Overview

Stats.CollectAzureChinaCDNLogs is a .NET Framework 4.7.2 console application that runs as a scheduled NuGet job. Its sole purpose is to bridge the gap between the China-region Azure Front Door CDN log format (newline-delimited JSON) and the tab-separated OutputLogLine format consumed by the rest of the NuGet download-statistics pipeline. Each execution leases up to 4 blob files from a China Azure Blob Storage container, parses each line as an AfdLogLine JSON object, maps the AFD-specific fields to canonical stats columns, and writes gzip-compressed output blobs to a global (non-China) Azure Blob destination.
The job defaults to a 4-hour execution timeout (14400 seconds). A secondary “force-stop” cancellation fires 60 seconds after the primary timeout to ensure the process terminates even if StreamReader/StreamWriter calls do not honour the cancellation token (a known limitation of .NET Framework, fixable on .NET 6+).

Role in the NuGetGallery System

Stats Pipeline Ingestion

Feeds China CDN download events into the global stats pipeline so that NuGet.org package download counts include traffic served through the Azure China sovereign cloud.

Cross-Cloud Bridge

Authenticates to China storage via SAS token and to global destination storage via Managed Identity, acting as a secure cross-cloud data relay.

Format Normalisation

Converts AFD JSON log lines into the standard W3C-style TSV OutputLogLine format shared by all other CDN log collectors in the stats subsystem.

Windows Service Deployment

Packaged as a NuGet .nuspec and installed as a Windows service via NSSM (Non-Sucking Service Manager) using PowerShell pre/post-deploy scripts.

Key Files and Classes

| File | Class / Type | Purpose |
|---|---|---|
| Program.cs | Program | Entry point; creates a Job instance and hands it to the shared JobRunner. |
| Job.cs | Job | Extends JsonConfigurationJob; wires up DI, creates AzureStatsLogSource and AzureStatsLogDestination, and runs ChinaStatsCollector.TryProcessAsync. |
| ChinaStatsCollector.cs | ChinaStatsCollector | Extends the shared Collector base; overrides TransformRawLogLine (JSON → OutputLogLine) and VerifyStreamAsync (validates the first 10 lines of a blob). |
| AfdLogLine.cs | AfdLogLine | Minimal POCO wrapping DateTime Time and AfdProperties Properties, deserialized from each JSON log line. |
| AfdProperties.cs | AfdProperties | Flat POCO with ~35 string properties representing all fields emitted by Azure Front Door access logs (HTTP method, status code, cache status, TLS details, client/origin IPs, etc.). |
| Configuration/CollectAzureChinaCdnLogsConfiguration.cs | CollectAzureChinaCdnLogsConfiguration | Settings POCO bound via IOptionsSnapshot<T>; controls source/destination connection strings, container names, file prefix, timeout, and output formatting flags. |
| Scripts/Functions.ps1 | | PowerShell helpers Install-NuGetService / Uninstall-NuGetService that wrap NSSM to register or remove the Windows service. |
| Stats.CollectAzureChinaCDNLogs.nuspec | | Packaging manifest that bundles the net472 binary output alongside the four deployment scripts. |

Dependencies

NuGet Packages (resolved transitively via project references)

| Package | Role |
|---|---|
| Azure.Storage.Blobs | Blob source/destination clients (BlobServiceClient). |
| Azure.Identity | ManagedIdentityCredential and DefaultAzureCredential for passwordless auth to global storage. |
| Autofac | IoC container wired up inside the NuGet.Jobs host framework. |
| Microsoft.Extensions.Configuration | JSON configuration loading via JsonConfigurationJob. |
| Microsoft.Extensions.DependencyInjection | Service registration in ConfigureJobServices. |
| Microsoft.Extensions.Logging | Structured logging throughout all classes. |
| Microsoft.Extensions.Options | IOptionsSnapshot<T> configuration binding. |

External Assembly Reference (not on NuGet)

| Assembly | Location | Role |
|---|---|---|
| ICSharpCode.SharpZipLib 0.86.0 | external/ICSharpCode.SharpZipLib.0.86.0/ | GZip compression for output blobs (used inside the shared Collector base). Referenced via HintPath. |

Internal Project References

| Project | Role |
|---|---|
| NuGet.Jobs.Common | Provides JsonConfigurationJob, JobRunner, StorageMsiConfiguration, and the ConfigureStorageMsi extension. |
| Stats.AzureCdnLogs.Common | Provides the Collector base class, ILogSource, ILogDestination, AzureStatsLogSource, AzureStatsLogDestination, OutputLogLine, ContentType, and AzureBlobLeaseManager. |

Configuration Reference

| Property | Type | Default | Description |
|---|---|---|---|
| AzureAccountConnectionStringSource | string | (required) | SAS-authenticated connection string to China Blob Storage. |
| AzureAccountConnectionStringDestination | string | (required) | Connection string (or blob endpoint URI) for the global destination storage. |
| AzureContainerNameSource | string | (required) | Source blob container holding raw AFD JSON log files. |
| AzureContainerNameDestination | string | (required) | Destination blob container for processed gzip output. |
| DestinationFilePrefix | string | (required) | Prefix prepended to every output filename. |
| ExecutionTimeoutInSeconds | int? | 14400 (4 h) | Primary cancellation deadline; process is force-killed 60 s later. |
| WriteOutputHeader | bool | true | Whether to write a TSV column-header line at the top of each output blob. |
| AddSourceFilenameColumn | bool | false | Appends the source blob filename as an extra column in the output. |
| RenameOutputFile | bool | false | When true, embeds the blob’s LastModified timestamp in the output filename. |
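
To make the effect of WriteOutputHeader and AddSourceFilenameColumn concrete, here is a minimal Python sketch. It is an illustration only: the job itself is C#, and the helper name `render_output`, the column names, and the `sourcefilename` header label are all hypothetical. It also assumes the header line gains the extra column when the flag is on, which the settings above do not confirm.

```python
import gzip


def render_output(rows, header, source_filename,
                  write_output_header=True,
                  add_source_filename_column=False):
    """Return gzip-compressed TSV bytes for the given rows (hypothetical helper)."""
    columns = list(header)
    if add_source_filename_column:
        columns.append("sourcefilename")  # hypothetical column label

    lines = []
    if write_output_header:
        # One header line at the top of the blob.
        lines.append("\t".join(columns))

    for row in rows:
        fields = list(row)
        if add_source_filename_column:
            # Extra trailing column carrying the source blob name.
            fields.append(source_filename)
        lines.append("\t".join(fields))

    text = "\n".join(lines) + "\n"
    return gzip.compress(text.encode("utf-8"))
```

Decompressing the result with `gzip.decompress` gives back the plain TSV text, which is a convenient way to inspect what each flag combination produces.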

Notable Patterns and Quirks

Azure SDK SAS token bug workaround. Both Job.cs and ValidateAzureBlobServiceClient explicitly replace the string SharedAccessSignature=? with SharedAccessSignature= to work around Azure SDK issue #44373. The fix appears in two places in the same method scope.
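
The SAS workaround amounts to a single string substitution. The Python sketch below is an illustration rather than the job's C# code. It assumes the replacement strips a leading `?` from the SAS value, so that `SharedAccessSignature=?` becomes `SharedAccessSignature=`. The function name and the sample connection string are hypothetical.

```python
def normalize_sas_connection_string(connection_string: str) -> str:
    """Drop the stray '?' that can directly follow 'SharedAccessSignature='."""
    return connection_string.replace(
        "SharedAccessSignature=?", "SharedAccessSignature=")


# Hypothetical connection string, for illustration only.
raw = "BlobEndpoint=https://example.invalid/;SharedAccessSignature=?sv=1&sig=abc"
clean = normalize_sas_connection_string(raw)
```

Because the substitution only matches the exact `=?` sequence, a connection string that is already well-formed passes through unchanged.
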
Dual authentication strategy. The source (China) always uses a SAS token connection string because Managed Identity is not available in the Azure China sovereign cloud. The destination (global) uses Managed Identity when UseManagedIdentity = true, falling back to SAS tokens otherwise. In DEBUG builds, DefaultAzureCredential is substituted to support developer workstations without a full MSI setup.
Status code format difference. Global CDN logs emit cacheStatus/httpStatusCode as a single combined field. China AFD logs provide these separately, so ChinaStatsCollector constructs the combined string manually: CacheStatus + "/" + HttpStatusCode. Several downstream stats fields (filesize, sport, rsbytes, timetaken) have no AFD equivalent and are hard-coded to "0" or "na".
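
The combined status field can be sketched in a few lines of Python. This is an illustration, not the C# in ChinaStatsCollector. The JSON key names (`properties`, `cacheStatus`, `httpStatusCode`) and the sample values are assumptions about how AfdLogLine and AfdProperties are serialized.

```python
import json


def combined_status_from_line(json_line: str) -> str:
    """Build the 'cacheStatus/httpStatusCode' string from one AFD JSON log line."""
    record = json.loads(json_line)
    props = record["properties"]  # assumed key name
    return f'{props["cacheStatus"]}/{props["httpStatusCode"]}'


# Hypothetical log line, for illustration only.
sample = '{"time": "2024-01-01T00:00:00Z", "properties": {"cacheStatus": "HIT", "httpStatusCode": "200"}}'
status = combined_status_from_line(sample)
```
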
Stream verification before processing. Before consuming a blob, ChinaStatsCollector.VerifyStreamAsync reads up to the first 10 non-empty lines and attempts JSON deserialization. If none parse successfully the blob is rejected, preventing corrupt or misrouted files from polluting the output pipeline.
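
The verification step can be approximated as follows. This Python sketch illustrates the idea and is not the VerifyStreamAsync implementation. The function name is hypothetical, and requiring a parsed line to be a JSON object is an added assumption. It returns on the first line that parses, which gives the same accept/reject outcome as checking all ten.

```python
import io
import json

MAX_LINES_TO_CHECK = 10  # matches the "first 10 non-empty lines" rule


def looks_like_afd_log(stream) -> bool:
    """Return True if any of the first 10 non-empty lines parses as a JSON object."""
    checked = 0
    for raw in stream:
        line = raw.strip()
        if not line:
            continue  # blank lines do not count towards the limit
        checked += 1
        try:
            if isinstance(json.loads(line), dict):  # assumption: must be an object
                return True
        except json.JSONDecodeError:
            pass  # not JSON; keep looking
        if checked >= MAX_LINES_TO_CHECK:
            break
    return False


# A TSV-looking blob is rejected; a JSON-lines blob is accepted.
rejected = looks_like_afd_log(io.StringIO("a\tb\tc\nd\te\tf\n"))
accepted = looks_like_afd_log(io.StringIO('\n{"time": "t", "properties": {}}\n'))
```
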
TLS metadata packed into xeccustom1. SSL protocol, cipher, and elliptic curve are concatenated into the xeccustom1 output column as "SSL-Protocol: ... SSL-Cipher: ... SSL-Curves: ...", mirroring the convention used by the global CDN collectors in Stats.AzureCdnLogs.Common.
CancellationToken propagation limitation. The Run() method documents that StreamReader.ReadLineAsync and StreamWriter.WriteLineAsync in .NET Framework 4.x do not accept a CancellationToken, making graceful mid-stream cancellation unreliable. The force-stop CTS (+60 s) exists as a safety net. This is acknowledged tech debt that would be resolved by migrating to .NET 6+.
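
The two-stage shutdown described above can be sketched in Python as a cooperative cancel signal followed by a grace period. This illustrates the pattern only and is not the job's C# code. The function name and return labels are hypothetical, and a real host would terminate the process where this sketch returns "force-stopped".

```python
import threading


def run_with_deadlines(work, timeout_s: float, grace_s: float) -> str:
    """Run work(cancel_event) on a thread with a primary and a force-stop deadline."""
    cancel = threading.Event()
    worker = threading.Thread(target=work, args=(cancel,), daemon=True)
    worker.start()

    worker.join(timeout_s)
    if not worker.is_alive():
        return "completed"        # finished before the primary timeout

    cancel.set()                  # primary timeout: ask the worker to stop
    worker.join(grace_s)
    if not worker.is_alive():
        return "cancelled"        # worker honoured the cancel signal

    return "force-stopped"        # worker ignored it; caller should kill the process


def cooperative(cancel):
    cancel.wait(5)                # returns as soon as cancellation is requested


def stubborn(cancel):
    threading.Event().wait(1)     # blocks without ever checking the cancel signal
```

With the job's defaults, the two deadlines would correspond to 14400 seconds and 60 seconds. The `stubborn` worker plays the role of a blocking read or write that cannot observe the cancellation signal.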