Stats.AzureCdnLogs.Common is the foundational shared library for NuGet’s CDN-log statistics pipeline. It provides everything a consumer needs to:
Parse raw W3C-format CDN access logs produced by Azure CDN edge servers.
Collect those log blobs from an Azure Storage source container.
Transform each log line by stripping PII (client IP), filtering out non-200 responses, and re-emitting sanitised lines.
Deliver the processed blobs to a destination Azure Storage container, with GZip compression support.
Lease-manage blobs during processing so that competing worker instances cannot double-process the same file.
The library targets net472 and is consumed by multiple stats worker jobs (e.g., Stats.CollectAzureCdnLogs, Stats.ImportAzureCdnStatistics) that form the download-count pipeline feeding nuget.org’s public statistics.
This library sits between raw CDN output and all downstream statistics consumers. It owns the wire format contract (W3C columns → CdnLogEntry) and the blob-lifecycle contract (acquire lease → read → write → archive/deadletter → release).
CdnLogEntry: POCO mapping every W3C column from a CDN log line (timestamps, IPs, byte counts, user-agent, custom field).
CdnLogEntryParser.cs (CdnLogEntryParser): Static parser that splits a space-delimited W3C line, populates a CdnLogEntry, filters out 404s and other non-2xx status codes, and converts Unix epoch timestamps.
CdnLogCustomFieldParser.cs (CdnLogCustomFieldParser): Regex-based parser for the x-ec_custom-1 column, which carries NuGet-specific request/response headers.
W3CParseUtils.cs (W3CParseUtils): Low-level tokeniser that splits a log line on spaces while respecting double-quoted fields; treats - and "-" as null sentinels.
PackageStatistics.cs (PackageStatistics): Domain model for a single package download event, produced after enriching a CdnLogEntry with custom-field data.
ToolStatistics.cs (ToolStatistics): Parallel model tracking which NuGet client tool/version was used in a download.
Collect/Collector.cs (Collector, abstract): Orchestrates the full collect loop: enumerate unlocked blobs, lock each, stream-transform, write output, then archive or deadletter. Subclasses implement TransformRawLogLine and VerifyStreamAsync.
ILogDestination: Interface with a single TryWriteAsync method that takes an input stream and a transform Action<Stream,Stream>.
Collect/AzureStatsLogSource.cs (AzureStatsLogSource): ILogSource over Azure Blob Storage; auto-creates -archive and -deadletter sibling containers for post-processing cleanup.
Collect/AzureStatsLogDestination.cs (AzureStatsLogDestination): ILogDestination over Azure Blob Storage; skips the write if the destination blob already exists; supports plain-text and GZip output.
AzureHelpers/AzureBlobLeaseManager.cs (AzureBlobLeaseManager): Acquires a 60-second renewable blob lease and spawns a background Task that renews it every 40 seconds until cancelled or released.
AzureHelpers/AzureBlobLockResult.cs (AzureBlobLockResult): Disposable result container that holds the BlobClient, lease ID, BlobProperties, and a linked CancellationTokenSource that fires if the renewal task fails.
Storage abstractions re-used by AzureBlobLeaseManager.
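The tokenising rule described for W3CParseUtils (split on spaces, keep double-quoted fields whole, treat - and "-" as null) can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the library's C# code; the function name tokenize_w3c_line and the sample line are hypothetical.

```python
from typing import List, Optional


def tokenize_w3c_line(line: str) -> List[Optional[str]]:
    """Split a W3C log line on spaces, keeping double-quoted fields intact.

    A field equal to - (quoted or not) is returned as None.
    Illustrative only: escaping inside quotes is not handled.
    """
    tokens: List[Optional[str]] = []
    current: List[str] = []
    in_quotes = False
    has_token = False  # True once the current field has seen any character

    def flush() -> None:
        nonlocal has_token
        if has_token:
            value = "".join(current)
            tokens.append(None if value == "-" else value)
        current.clear()
        has_token = False

    for ch in line:
        if ch == '"':
            # Quotes toggle quoted mode and are not part of the value.
            in_quotes = not in_quotes
            has_token = True
        elif ch == " " and not in_quotes:
            flush()
        else:
            current.append(ch)
            has_token = True
    flush()
    return tokens
```

For a line such as `1433257489 - "NuGet Command Line/2.8" "-" 200`, the quoted user-agent stays one token and both dash variants become None.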
ICSharpCode.SharpZipLib is referenced twice: once as a local hint-path DLL from external/ICSharpCode.SharpZipLib.0.86.0/ and once as a NuGet PackageReference to SharpZipLib. This dual-reference can cause version conflicts during builds if the hint-path and package versions diverge.
AzureBlobLeaseManager acquires a 60-second Azure blob lease and keeps it alive via a fire-and-forget background Task that renews every 40 seconds. If renewal fails, the linked CancellationTokenSource on AzureBlobLockResult is cancelled, propagating cancellation to the active read/write operation automatically.
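The renew-in-the-background pattern can be sketched roughly as follows. This is a Python illustration of the idea, not the library's implementation: LeaseKeeper, the renew callback, and the cancelled event (standing in for the linked CancellationTokenSource) are all hypothetical names, and the interval is a parameter rather than the fixed 40 seconds.

```python
import threading
from typing import Callable


class LeaseKeeper:
    """Renews a lease on a timer and signals `cancelled` if a renewal fails."""

    def __init__(self, renew: Callable[[], None], interval_seconds: float) -> None:
        self._renew = renew
        self._interval = interval_seconds
        self._stop = threading.Event()
        # Stand-in for the linked cancellation source: set when renewal fails.
        self.cancelled = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> "LeaseKeeper":
        self._thread.start()
        return self

    def _run(self) -> None:
        # Event.wait returns False on timeout (time to renew)
        # and True once release() has been called.
        while not self._stop.wait(self._interval):
            try:
                self._renew()
            except Exception:
                # Renewal failed: tell the worker to abandon the blob.
                self.cancelled.set()
                return

    def release(self) -> None:
        """Stop renewing and wait for the background thread to exit."""
        self._stop.set()
        self._thread.join()
```

A worker built on this sketch would check `keeper.cancelled.is_set()` between reads or writes and stop as soon as it fires, which mirrors how a cancelled token aborts the in-flight operation.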
Archive / deadletter pattern
After processing, AzureStatsLogSource.TryCleanAsync copies the source blob to either a -archive container (on success) or a -deadletter container (on failure), then deletes the original. Both sibling containers are created on demand.
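The routing decision can be sketched with in-memory dictionaries standing in for storage containers. This Python sketch is illustrative only: clean_source_blob is a hypothetical name, and it assumes the sibling container name is the source container name with the suffix appended.

```python
from typing import Dict

# Blob name -> content; a stand-in for a storage container.
Container = Dict[str, bytes]


def clean_source_blob(
    containers: Dict[str, Container],
    source: str,
    blob_name: str,
    succeeded: bool,
) -> str:
    """Copy the blob to the -archive or -deadletter sibling, then delete the original.

    Returns the name of the container the blob ended up in.
    """
    suffix = "-archive" if succeeded else "-deadletter"
    target = source + suffix  # assumed naming scheme
    # Sibling containers are created on demand.
    target_container = containers.setdefault(target, {})
    target_container[blob_name] = containers[source][blob_name]
    del containers[source][blob_name]
    return target
```

Copying before deleting means a failure partway through leaves the original in place, so the blob can be picked up again on a later run.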
PII scrubbing in Collector
Collector.GetParsedModifiedLogEntry replaces the client IP (c-ip) column with a literal dash before writing the output line. This is enforced in the base class and cannot be bypassed by subclasses.
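As a rough illustration of the scrubbing step, the Python sketch below overwrites one column with a dash. The column index is an assumption made for the example (the real position of c-ip depends on the log schema), scrub_client_ip is a hypothetical name, and the naive split only works when no quoted field containing spaces precedes the target column.

```python
from typing import List

# Hypothetical position of the c-ip column; not taken from the library.
C_IP_INDEX = 2


def scrub_client_ip(line: str, c_ip_index: int = C_IP_INDEX) -> str:
    """Replace the client IP column with a literal dash."""
    columns: List[str] = line.split(" ")
    if c_ip_index < len(columns):
        columns[c_ip_index] = "-"
    # Rejoining on the same delimiter leaves every other column untouched.
    return " ".join(columns)
```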
Abstract transform pipeline
Collector is abstract. Subclasses implement TransformRawLogLine(string) : OutputLogLine and VerifyStreamAsync(Stream) : Task<bool>, letting different jobs share blob-collection infrastructure with custom per-line logic.
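The shape of this template-method design can be sketched in Python as below. This is an analogy rather than the C# class: CollectorSketch, SkipCommentsCollector, and process are hypothetical names, streams are simplified to lists of lines, and returning None to drop a line is an assumption of the sketch.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, Optional


class CollectorSketch(ABC):
    """Shared loop in the base class; per-job logic in subclasses."""

    @abstractmethod
    def transform_raw_log_line(self, line: str) -> Optional[str]:
        """Return the output line, or None to drop the input line."""

    @abstractmethod
    def verify_stream(self, lines: List[str]) -> bool:
        """Return True if the input looks processable."""

    def process(self, lines: Iterable[str]) -> List[str]:
        raw = list(lines)
        if not self.verify_stream(raw):
            raise ValueError("input failed verification")
        output: List[str] = []
        for line in raw:
            transformed = self.transform_raw_log_line(line)
            if transformed is not None:
                output.append(transformed)
        return output


class SkipCommentsCollector(CollectorSketch):
    """Example subclass: drops W3C header lines that start with #."""

    def transform_raw_log_line(self, line: str) -> Optional[str]:
        return None if line.startswith("#") else line

    def verify_stream(self, lines: List[str]) -> bool:
        return len(lines) > 0
```

Each job supplies only the two overrides, while the enumeration, locking, and cleanup logic stays in one place.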
AzureStatsLogSource skips blobs whose lease status is not Unlocked but does not retry them in the same batch. A blob locked by a crashed worker will remain unavailable until Azure auto-expires the lease (up to 60 seconds). Design consumers to run on a recurring schedule to recover such blobs.
CdnLogEntryParser handles two historical CDN status-code formats:
Global CDN: TCP_MISS/200 (cache status + slash + HTTP code)
China CDN (legacy): bare HTTP code such as 200
Both formats are filtered to exclude non-2xx responses. If the format is unrecognised the entry is passed through rather than dropped, preserving statistics flow at the cost of a small error margin.
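A minimal Python sketch of this two-format handling follows. It illustrates the described behaviour and is not the parser's actual code; parse_http_status and should_keep are hypothetical names.

```python
from typing import Optional


def parse_http_status(field: str) -> Optional[int]:
    """Return the HTTP status from 'TCP_MISS/200' or '200'; None if unrecognised."""
    # Take the part after the last slash, or the whole field if there is none.
    candidate = field.rsplit("/", 1)[-1]
    if candidate.isascii() and candidate.isdigit():
        return int(candidate)
    return None


def should_keep(field: str) -> bool:
    """Keep 2xx entries; pass unrecognised formats through rather than drop them."""
    status = parse_http_status(field)
    if status is None:
        return True
    return 200 <= status < 300
```

With this sketch, `TCP_MISS/200` and `200` are kept, `TCP_HIT/404` is dropped, and a field that matches neither format is kept.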
The x-ec_custom-1 custom field is parsed by CdnLogCustomFieldParser using a single compiled regex that extracts key-value pairs. Duplicate keys in the CDN configuration are silently overwritten (last value wins) to avoid crashing the statistics job.
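The last-value-wins behaviour can be illustrated with the Python sketch below. The field layout is an assumption made for the example (space-separated `Key: value` pairs); the real pattern used by CdnLogCustomFieldParser is not shown in this document, and parse_custom_field and the sample header names are hypothetical.

```python
import re
from typing import Dict

# Assumed layout: "Key: value Key2: value2". A value runs until the next
# "Key: " or the end of the string, so values may contain spaces.
_PAIR = re.compile(r"(?P<key>[^\s:]+):\s(?P<value>.*?)(?=\s[^\s:]+:\s|$)")


def parse_custom_field(field: str) -> Dict[str, str]:
    """Extract key-value pairs; a repeated key overwrites the earlier value."""
    result: Dict[str, str] = {}
    for match in _PAIR.finditer(field):
        # Plain assignment (not an insert that raises on duplicates)
        # is what makes the last value win.
        result[match.group("key")] = match.group("value")
    return result
```

Using assignment instead of an add-or-throw operation is the design choice that keeps a misconfigured CDN rule from failing the whole job.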