Overview

ArchivePackages is a headless .NET Framework 4.7.2 console job that provides incremental backup of NuGet package binaries (.nupkg files). On each execution it:
  1. Reads a cursor timestamp from a cursor.json blob stored in the destination container.
  2. Queries the Gallery SQL database for all packages whose Published or LastEdited timestamp is newer than the cursor.
  3. For each such package, triggers a server-side blob copy from the primary packages container into an ng-backups container on the destination storage account.
  4. Advances the cursor to max(LastEdited, Published) across the batch.
The job optionally replicates the same set of packages to a secondary destination account, providing a redundant offsite backup.
The job only initiates each blob copy: it calls StartCopyAsync and does not wait for Azure Storage to complete the server-side transfer. Any process that relies on a fully consistent backup must account for copies that are still in flight.
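Steps 2 and 4 above can be sketched as follows. The job itself is C# and applies the filter in SQL; this Python version only illustrates the selection and the cursor arithmetic. The PackageRef fields mirror the data class listed under Key Files and Classes, while the function names, the strict "newer than" comparison, and the treatment of LastEdited as optional are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class PackageRef:
    id: str
    version: str
    hash: str
    published: datetime
    last_edited: Optional[datetime] = None  # assumption: may be absent


def select_newer(packages: List[PackageRef], cursor: datetime) -> List[PackageRef]:
    """Keep packages whose Published or LastEdited is newer than the cursor."""
    return [
        p for p in packages
        if p.published > cursor
        or (p.last_edited is not None and p.last_edited > cursor)
    ]


def advance_cursor(cursor: datetime, batch: List[PackageRef]) -> datetime:
    """Return max(LastEdited, Published) across the batch, never moving backwards."""
    latest = cursor
    for p in batch:
        stamp = max(p.published, p.last_edited or p.published)
        latest = max(latest, stamp)
    return latest
```

With an empty batch the cursor is returned unchanged, so a run that finds nothing new does not move the cursor.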

Role in System

Gallery SQL DB  ──►  ArchivePackages Job  ──►  Primary Blob Account (ng-backups)
                              └─────────────►  Secondary Blob Account (ng-backups)

Source

Azure Blob Storage — packages container. Blob name format: {id}.{version}.nupkg (lowercased).

Destination

Azure Blob Storage — ng-backups container. Blob name format: packages/{id}/{version}/{url-encoded-hash}.nupkg.
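The two name formats can be illustrated with a short Python sketch. In the job this formatting is done by StorageHelpers; the function names below are hypothetical. The source name is lowercased as documented above. Casing of the id and version in the destination name is not specified here, so the sketch passes them through unchanged, which is an assumption. urllib.parse.quote with no safe characters stands in for WebUtility.UrlEncode.

```python
from urllib.parse import quote


def source_blob_name(package_id: str, version: str) -> str:
    """{id}.{version}.nupkg, lowercased."""
    return f"{package_id}.{version}.nupkg".lower()


def backup_blob_name(package_id: str, version: str, package_hash: str) -> str:
    """packages/{id}/{version}/{url-encoded-hash}.nupkg."""
    # safe="" percent-encodes '/', '+', and '=' so the hash stays one path segment.
    encoded_hash = quote(package_hash, safe="")
    return f"packages/{package_id}/{version}/{encoded_hash}.nupkg"
```

For example, a hash of "ab+/c==" becomes "ab%2B%2Fc%3D%3D" in the destination name.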

Cursor

cursor.json, stored in the destination container. It contains a single cursorDateTime key holding an ISO 8601 timestamp, which advances after each successful batch.
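Reading and updating the cursor document could look like the sketch below. The job does this in C# with Newtonsoft.Json; the Python here is only illustrative. The cursorDateTime key comes from this document, while the function names and the exact timestamp text written back (UTC with an explicit offset) are assumptions.

```python
import json
from datetime import datetime, timezone


def read_cursor(blob_text: str) -> datetime:
    """Parse the cursorDateTime value out of the cursor.json text."""
    raw = json.loads(blob_text)["cursorDateTime"]
    # Older Python versions do not accept a trailing 'Z', so normalize it.
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


def write_cursor(blob_text: str, new_cursor: datetime) -> str:
    """Return updated cursor.json text; new_cursor must be timezone-aware."""
    doc = json.loads(blob_text)
    doc["cursorDateTime"] = new_cursor.astimezone(timezone.utc).isoformat()
    return json.dumps(doc)
```

Updating the parsed document in place, rather than building a new one, keeps any other keys that might be present in the blob.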

Deployment

Packaged as a NuGet package via .nuspec; installed as a Windows Service using NSSM via Octopus Deploy PowerShell scripts.

Key Files and Classes

| File | Class / Type | Purpose |
| --- | --- | --- |
| ArchivePackages.Job.cs | Job | Main job logic: reads cursor, queries DB, drives the archive loop for primary and optional secondary destinations |
| ArchivePackages.Program.cs | Program | Entry point; constructs Job and hands off to JobRunner.Run() |
| Configuration/InitializationConfiguration.cs | InitializationConfiguration | Strongly-typed config POCO bound from the Initialization JSON section |
| JobEventSource.cs | JobEventSource | ETW event source (Outercurve-NuGet-Jobs-ArchivePackages) emitting structured events for every major operation |
| PackageRef.cs | PackageRef | Plain data class representing a Dapper query row; holds Id, Version, Hash, LastEdited, Published |
| Scripts/Functions.ps1 | n/a | PowerShell helpers Install-NuGetService / Uninstall-NuGetService used by deploy scripts |

Dependencies

NuGet Package References

| Package | Purpose |
| --- | --- |
| WindowsAzure.Storage | Azure Blob Storage client: ICloudBlobContainer, StartCopyAsync, UploadFromStreamAsync |

Internal Project References

| Project | Key Contributions |
| --- | --- |
| NuGet.Jobs.Common | JsonConfigurationJob base class, JobRunner, StorageHelpers (blob name formatting), GalleryDbConfiguration |

Transitive / Shared Dependencies

| Library | Role |
| --- | --- |
| Autofac | DI container |
| Dapper | Micro-ORM for the SQL query against Packages / PackageRegistrations |
| Microsoft.Extensions.Configuration | JSON configuration loading with KeyVault secret injection |
| Newtonsoft.Json | Parsing and updating cursor.json via JObject |

Configuration Reference

{
  "Initialization": {
    "Source": "<storage-account-name>",
    "SourceContainerName": "packages",
    "PrimaryDestination": "<primary-backup-storage-account-name>",
    "SecondaryDestination": "<optional-secondary-backup-account>",
    "DestinationContainerName": "ng-backups",
    "CursorBlob": "cursor.json"
  }
}
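A small sketch of how this section could be checked before a run, and how the destination list is derived from it. This is not the job's own binding code (that is the InitializationConfiguration POCO). The key names come from the JSON above; the function name and the choice of which keys are treated as required are assumptions, with only SecondaryDestination documented as optional.

```python
import json
from typing import List

# Assumption: every key except SecondaryDestination must be present and non-empty.
REQUIRED_KEYS = (
    "Source",
    "SourceContainerName",
    "PrimaryDestination",
    "DestinationContainerName",
    "CursorBlob",
)


def destination_accounts(config_text: str) -> List[str]:
    """Validate the Initialization section and list the accounts to archive to."""
    init = json.loads(config_text)["Initialization"]
    missing = [key for key in REQUIRED_KEYS if not init.get(key)]
    if missing:
        raise ValueError("missing configuration keys: " + ", ".join(missing))
    accounts = [init["PrimaryDestination"]]
    if init.get("SecondaryDestination"):
        accounts.append(init["SecondaryDestination"])
    return accounts
```

The primary account always comes first, matching the order in which the document describes the two archive passes.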

Notable Patterns and Implementation Details

Cursor-based incremental processing. The cursor is a single cursorDateTime value persisted inside cursor.json in the destination container itself. A freshly created destination container requires a manually seeded cursor.json before the job will run successfully.
Parallel tuple construction, sequential copy dispatch. The Dapper result list is projected into (sourceBlobName, destBlobName) tuples using AsParallel().Select(...), but actual ArchivePackage calls are dispatched sequentially. The parallelism only benefits name-string computation, not I/O throughput.
Secondary destination re-queries the database independently. Each destination triggers a full separate Archive() call that re-reads the cursor and re-queries SQL. If packages are published between the two calls, the two archives may cover slightly different sets of packages.
No copy-completion verification. ArchivePackage calls StartCopyAsync(...) and returns immediately. Azure Storage executes the copy asynchronously. A storage outage or throttling event can leave copies in a pending state with no job-level alerting.
Blob name hash encoding. Destination blob names URL-encode the package hash (WebUtility.UrlEncode(hash)). This is necessary because SHA-512 hashes are stored as Base64 strings, which can contain +, /, and =. An unencoded / would be read as an extra path separator in the blob name, and + and = are reserved characters in URLs.
NSSM-based Windows Service deployment. nssm.exe (Non-Sucking Service Manager) is bundled in the NuGet package. PostDeploy.ps1 uses it to register the job with automatic restart-on-failure, ensuring the job recovers from transient crashes without manual intervention.
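Because the job does not verify copy completion, a consumer that needs a consistent backup could poll each destination blob's copy status itself. The following is a minimal sketch of such a check and is not part of the job. get_status is a hypothetical callable standing in for whatever storage client call returns the blob's copy state, and the lowercase strings mirror the pending, success, aborted, and failed copy states reported by Azure Storage.

```python
import time
from typing import Callable


def wait_for_copy(
    get_status: Callable[[], str],
    attempts: int = 10,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the copy succeeds; return False if it is still pending at the end."""
    for attempt in range(attempts):
        status = get_status()
        if status == "success":
            return True
        if status in ("failed", "aborted"):
            raise RuntimeError(f"blob copy ended with status {status!r}")
        # Still pending: wait before the next poll, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

A False result means the copy was still pending when the attempts ran out, which the caller can treat as a signal to alert or retry later.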