
Overview

NuGet.Jobs.Auxiliary2AzureSearch is a scheduled background job that keeps the NuGet.org Azure Search “search” index and its associated auxiliary files in sync with the current state of the Gallery database and the statistics pipeline. It runs as a long-lived singleton Windows service that repeats its work, and every execution performs three incremental update passes in sequence:
  1. Verified packages — updates verified-packages/verified-packages.v1.json in Blob Storage when the set of prefix-reserved package IDs changes.
  2. Downloads — fetches the latest downloads.v1.json from the statistics pipeline URL, diffs it against the previously indexed copy, applies popularity-transfer adjustments, and pushes changed download counts to the Azure Search “search” index in parallel batches.
  3. Owners — fetches the current owner map from Gallery DB, diffs it against the previously indexed copy, pushes changed owner arrays to the “search” index, and appends a timestamped change-history file to Blob Storage for audit purposes.
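The three passes run one after another inside a single orchestrator, which also reports end-to-end timing. The following is a minimal Python sketch of that control flow, not the job's C# code. Each pass is modeled as a hypothetical zero-argument callable, and the returned timing dictionary stands in for the telemetry event. In this sketch an exception in one pass stops the remaining passes; whether the real orchestrator behaves the same way is an assumption.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical model: each update pass is a zero-argument callable.
UpdatePass = Callable[[], None]


def run_update_passes(passes: List[Tuple[str, UpdatePass]]) -> Dict[str, float]:
    """Run the passes in order and return elapsed seconds per pass plus a total."""
    timings: Dict[str, float] = {}
    started = time.perf_counter()
    for name, update_pass in passes:
        pass_started = time.perf_counter()
        update_pass()  # in this sketch, an exception here skips the remaining passes
        timings[name] = time.perf_counter() - pass_started
    # The "total" entry stands in for the end-to-end telemetry event.
    timings["total"] = time.perf_counter() - started
    return timings
```

For example, `run_update_passes([("verified_packages", f1), ("downloads", f2), ("owners", f3)])` calls `f1`, `f2`, and `f3` in that order.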
This job is a singleton: only one instance may run against a given Azure Search resource at any time. Running multiple instances against the same resource causes conflicting writes and can corrupt data.

Role in the System

Search Subsystem

Feeds regularly refreshed download counts, owner lists, and verified-package flags into the Azure Search “search” index so that NuGet.Services.SearchService can surface accurate metadata without querying the Gallery database on every request.

Auxiliary File Store

Maintains the canonical copies of the four search auxiliary files in Azure Blob Storage (downloads.v2.json, owners.v2.json, verified-packages.v1.json, popularity-transfers.v1.json).

Statistics Bridge

Bridges the statistics pipeline (which produces downloads.v1.json at a public URL) into the Azure Search index, applying popularity-transfer scoring adjustments as configured.

Prerequisite: Db2AzureSearch

Requires NuGet.Jobs.Db2AzureSearch to have performed the initial full index population. Auxiliary2AzureSearch only handles incremental updates from that baseline forward.

Key Files and Classes

| File Path | Class | Purpose |
| --- | --- | --- |
| Program.cs | Program | Entry point; delegates to JobRunner.Run |
| Job.cs | Job | Registers DI services; binds Auxiliary2AzureSearchConfiguration and the DownloadsV1JsonClient |
| Auxiliary2AzureSearchCommand.cs | Auxiliary2AzureSearchCommand | Orchestrator; runs the three sub-commands in sequence and emits an end-to-end telemetry event |
| UpdateVerifiedPackagesCommand.cs | UpdateVerifiedPackagesCommand | Diffs old vs. new verified-package sets via symmetric-except; replaces the blob only when changed |
| UpdateDownloadsCommand.cs | UpdateDownloadsCommand | Core download-sync logic: fetches, cleans, diffs, applies popularity transfers, then batches and pushes to Azure Search |
| UpdateOwnersCommand.cs | UpdateOwnersCommand | Diffs owner sets, pushes changed owner arrays to Azure Search, persists a change-history blob, replaces owners.v2.json |
| DataSetComparer.cs | DataSetComparer | Generic two-pass set comparer used for owner and popularity-transfer diffs |
| DownloadSetComparer.cs | DownloadSetComparer | Compares old vs. new download counts; enforces the MaxDownloadCountDecreases guard so that corrupt statistics cannot wipe download history |
| Auxiliary2AzureSearchConfiguration.cs | Auxiliary2AzureSearchConfiguration | Configuration POCO; extends AzureSearchJobConfiguration with DownloadsV1JsonUrl, MinPushPeriod, and MaxDownloadCountDecreases |
| Scripts/Functions.ps1 | | PowerShell helpers for installing/uninstalling the job as a Windows service via NSSM |
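UpdateVerifiedPackagesCommand writes the blob only when the symmetric difference of the old and new sets is non-empty. The following is a minimal Python sketch of that check. `replace_blob` is a hypothetical callback standing in for the Blob Storage write. The sketch compares IDs with exact string equality, and whether the real comparer ignores case is not stated here.

```python
from typing import Callable, Set


def update_verified_packages(
    old_ids: Set[str],
    new_ids: Set[str],
    replace_blob: Callable[[Set[str]], None],  # hypothetical storage write
) -> bool:
    """Replace the stored set only when it changed; return True if a write happened."""
    # Symmetric difference: IDs that were added or removed since the last run.
    changed_ids = old_ids ^ new_ids
    if not changed_ids:
        return False  # identical sets, so the blob write is skipped
    replace_blob(new_ids)
    return True
```

Skipping the write when nothing changed keeps the blob's timestamp and ETag stable across runs in which no prefix reservations moved.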

Dependencies

Internal Project References

| Project | Purpose |
| --- | --- |
| NuGet.Jobs.Common | JobRunner, AzureSearchJob<T>, IAzureSearchCommand base infrastructure |
| NuGet.Services.AzureSearch | All Auxiliary2AzureSearch command implementations, auxiliary-file clients, IBatchPusher, telemetry service |

Key NuGet Package Dependencies

| Package | Purpose |
| --- | --- |
| NuGet.Packaging | PackageIdValidator, used to clean invalid IDs from download data |
| NuGet.Versioning | NuGetVersion parsing and normalization during download-data cleaning |
| Microsoft.Extensions.Options | IOptionsSnapshot<Auxiliary2AzureSearchConfiguration> for runtime config |
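The cleaning step relies on NuGet.Packaging to drop download entries with invalid package IDs. The following is a Python sketch of that filter. The regular expression and the 100-character limit approximate NuGet's package ID rules and are an assumption, not a copy of PackageIdValidator. Version normalization with NuGetVersion is left out.

```python
import re
from typing import Dict

# Assumed approximation of NuGet package ID rules: runs of word characters
# separated by single '.' or '-', at most 100 characters long.
_PACKAGE_ID = re.compile(r"\w+(?:[.-]\w+)*")
_MAX_ID_LENGTH = 100


def is_valid_package_id(package_id: str) -> bool:
    # fullmatch (rather than match with "$") also rejects a trailing newline.
    return len(package_id) <= _MAX_ID_LENGTH and _PACKAGE_ID.fullmatch(package_id) is not None


def drop_invalid_ids(downloads: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Keep only {package_id: {version: count}} entries whose ID passes the check."""
    # Version-string normalization (NuGetVersion in the real job) is omitted here.
    return {pid: versions for pid, versions in downloads.items() if is_valid_package_id(pid)}
```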
The project targets net472 and produces a console executable. It is deployed as a Windows service using NSSM (Non-Sucking Service Manager), bundled as Scripts/nssm.exe.

Configuration Reference

{
  "Auxiliary2AzureSearch": {
    "AzureSearchBatchSize": 1000,
    "MaxConcurrentBatches": 1,
    "MaxConcurrentVersionListWriters": 32,
    "SearchServiceName": "<azure-search-resource-name>",
    "SearchServiceApiKey": "<admin-key>",
    "SearchIndexName": "search-000",
    "HijackIndexName": "hijack-000",
    "StorageConnectionString": "<blob-storage-connection-string>",
    "StorageContainer": "v3-azuresearch-000",
    "DownloadsV1JsonUrl": "<statistics-pipeline-url>",
    "MinPushPeriod": "00:00:10",
    "MaxDownloadCountDecreases": 30000,
    "EnablePopularityTransfers": true
  }
}
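MinPushPeriod is a .NET TimeSpan string ("00:00:10" is ten seconds), and MaxDownloadCountDecreases falls back to 15,000 when omitted. The following is a small Python sketch that reads the section above under those rules. It parses only the "hh:mm:ss" form used in the sample. The choice of which keys are required is an assumption of the sketch.

```python
import json
from datetime import timedelta

# Default quoted in this document's notes on DownloadSetComparer.
DEFAULT_MAX_DOWNLOAD_COUNT_DECREASES = 15000


def parse_hms(value: str) -> timedelta:
    """Parse "hh:mm:ss"; .NET TimeSpan accepts more formats than this sketch does."""
    hours, minutes, seconds = value.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def load_job_settings(config_text: str) -> dict:
    section = json.loads(config_text)["Auxiliary2AzureSearch"]
    return {
        "batch_size": section["AzureSearchBatchSize"],
        "min_push_period": parse_hms(section["MinPushPeriod"]),
        "max_download_count_decreases": section.get(
            "MaxDownloadCountDecreases", DEFAULT_MAX_DOWNLOAD_COUNT_DECREASES
        ),
        "enable_popularity_transfers": section["EnablePopularityTransfers"],
    }
```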

Notable Patterns and Implementation Details

MaxDownloadCountDecreases (default: 15,000; recommended production value: 30,000) is a safety guard in DownloadSetComparer. If the number of packages whose download count would decrease exceeds this threshold, the job aborts the download update to prevent a corrupted statistics file from zeroing out counts across the catalog. Setting it too low causes false-positive aborts during legitimate statistics reprocessing.
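The guard counts packages whose download count would go down and refuses the whole update when that count exceeds the limit. The following is a minimal Python sketch of that rule over per-package totals. Two points are simplifications or assumptions rather than details confirmed here: the real comparer's granularity (package vs. version), and treating an ID missing from the new data as 0 downloads.

```python
from typing import Dict


class DownloadDecreaseLimitExceeded(Exception):
    """Raised when too many packages would lose downloads in one run."""


def compare_downloads(
    old: Dict[str, int], new: Dict[str, int], max_decreases: int
) -> Dict[str, int]:
    """Return {package_id: new_count} for changed packages, or raise if the guard trips."""
    changed: Dict[str, int] = {}
    decreases = 0
    for package_id in old.keys() | new.keys():
        before = old.get(package_id, 0)
        after = new.get(package_id, 0)  # sketch assumption: a missing ID counts as 0
        if after == before:
            continue
        if after < before:
            decreases += 1
        changed[package_id] = after
    if decreases > max_decreases:
        # Abort before anything is pushed, so a bad statistics file changes nothing.
        raise DownloadDecreaseLimitExceeded(
            f"{decreases} packages would decrease; limit is {max_decreases}"
        )
    return changed
```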
Popularity transfers are double-gated. UpdateDownloadsCommand checks both the EnablePopularityTransfers configuration property and the IFeatureFlagService runtime flag. If either is disabled, all transfers are treated as empty, effectively removing any previously applied transfer boosts on the next run.
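Disabling either gate removes existing boosts because the job diffs the previous transfers against an empty set, and every formerly transferred ID then shows up as changed. The following is a Python sketch of both pieces. The function names and the dictionary shape (source ID mapped to destination IDs) are hypothetical, and IDs are compared exactly for brevity.

```python
from typing import Dict, List, Set

Transfers = Dict[str, List[str]]  # hypothetical shape: source package ID -> destination IDs


def effective_transfers(config_enabled: bool, flag_enabled: bool, transfers: Transfers) -> Transfers:
    """Both gates must be open; otherwise behave as if no transfers exist."""
    if config_enabled and flag_enabled:
        return transfers
    return {}


def ids_to_recompute(old: Transfers, new: Transfers) -> Set[str]:
    """Sources whose entry differs, plus every destination on either side."""
    affected: Set[str] = set()
    for source in old.keys() | new.keys():
        old_dests = old.get(source, [])
        new_dests = new.get(source, [])
        if set(old_dests) != set(new_dests):  # exact comparison, for brevity
            affected.add(source)
            affected.update(old_dests)
            affected.update(new_dests)
    return affected
```

For example, if the previous run had `{"OldPkg": ["NewPkg"]}` and the feature flag is now off, `ids_to_recompute` returns both `OldPkg` and `NewPkg`, so both get their un-boosted counts pushed.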
UpdateDownloadsCommand uses a two-level batching strategy: it groups package IDs for parallel version list fetching (controlled by MaxConcurrentVersionListWriters), then groups resulting index actions into batches no larger than AzureSearchBatchSize. A MinPushPeriod delay between pushes throttles write pressure on the search service.
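The two batching levels can be sketched in Python as follows. The first level builds index actions for many package IDs concurrently, bounded by MaxConcurrentVersionListWriters. The second level slices the actions into AzureSearchBatchSize chunks and spaces the pushes by MinPushPeriod. `build_actions` and `push_batch` are hypothetical callbacks. This simplified sketch pushes batches one at a time rather than in parallel. It also treats MinPushPeriod as the minimum gap between the starts of consecutive pushes, which is an interpretation.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def push_in_batches(
    package_ids: Sequence[str],
    build_actions: Callable[[str], List[str]],  # hypothetical: fetch version list, build actions
    push_batch: Callable[[List[str]], None],  # hypothetical: one write to the search index
    max_concurrent_writers: int,
    batch_size: int,
    min_push_period: float,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    # Level 1: build index actions for many package IDs in parallel (order is preserved).
    with ThreadPoolExecutor(max_workers=max_concurrent_writers) as pool:
        per_id_actions = list(pool.map(build_actions, package_ids))
    actions = [action for group in per_id_actions for action in group]

    # Level 2: push batches no larger than batch_size, starting at least min_push_period apart.
    last_push_started = None
    for batch in chunked(actions, batch_size):
        if last_push_started is not None:
            remaining = min_push_period - (time.monotonic() - last_push_started)
            if remaining > 0:
                sleep(remaining)
        last_push_started = time.monotonic()
        push_batch(batch)
```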
DataSetComparer.CompareOwners uses ordinal string comparison for owner usernames (not ordinal-ignore-case) so that a username case change is correctly detected as a change and propagated to the index. Popularity-transfer comparisons use ordinal-ignore-case because casing changes in package IDs are considered insignificant there.
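The difference between the two comparison modes shows up when only the casing changes. The following is a Python sketch of a two-pass owner diff with exact (ordinal) username comparison, alongside a case-insensitive check for transfer targets. Python's `str.casefold()` stands in for OrdinalIgnoreCase. The two are not identical for every Unicode character, and package-ID keys are matched exactly here for brevity.

```python
from typing import Dict, Iterable, Set


def compare_owners(old: Dict[str, Set[str]], new: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Two passes over the data; usernames are compared exactly (ordinal)."""
    changes: Dict[str, Set[str]] = {}
    # Pass 1: IDs present before whose owner set changed or disappeared.
    for package_id, owners in old.items():
        current = new.get(package_id, set())
        if owners != current:  # exact string equality, so "alice" != "Alice"
            changes[package_id] = current
    # Pass 2: IDs that only exist in the new data.
    for package_id, owners in new.items():
        if package_id not in old:
            changes[package_id] = owners
    return changes


def same_transfer_targets(old: Iterable[str], new: Iterable[str]) -> bool:
    """Case-insensitive set comparison for package IDs."""
    return {pid.casefold() for pid in old} == {pid.casefold() for pid in new}
```

Renaming `alice` to `Alice` therefore produces an owner change for the package, while `NewPkg` vs. `newpkg` as a transfer target does not.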
Owner change-history files are written to owners/changes/TIMESTAMP.json with a path timestamp formatted as yyyy-MM-dd-HH-mm-ss-FFFFFFF (UTC). These files are append-only and exist solely for future forensic and auditing purposes. They intentionally omit owner usernames to comply with GDPR requirements.
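The blob path uses a .NET custom format in which each `F` specifier prints a fractional-second digit, with trailing zeros dropped. The following is a Python sketch that builds a matching path and a username-free body. The body layout (a sorted JSON array of changed package IDs) is hypothetical, and the real file may contain other fields. Python datetimes carry microseconds rather than .NET's 100 ns ticks, so the seventh fractional digit is always zero here.

```python
import json
from datetime import datetime, timezone
from typing import Iterable


def change_history_path(now: datetime) -> str:
    """Build owners/changes/<yyyy-MM-dd-HH-mm-ss-FFFFFFF>.json from an aware datetime."""
    utc = now.astimezone(timezone.utc)
    # Seven fractional digits as in .NET ticks; rstrip emulates "F" dropping trailing zeros.
    fraction = f"{utc.microsecond * 10:07d}".rstrip("0")
    return f"owners/changes/{utc:%Y-%m-%d-%H-%M-%S}-{fraction}.json"


def change_history_body(changed_package_ids: Iterable[str]) -> str:
    """Hypothetical body: which packages changed, deliberately without any owner usernames."""
    return json.dumps(sorted(changed_package_ids))
```

For example, 2024-05-06 07:08:09.1234 UTC maps to `owners/changes/2024-05-06-07-08-09-1234.json`.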
The ExcludedPackages.v1.json auxiliary file is not updated by this job. It is managed manually and only consumed during a full index rebuild by Db2AzureSearch. Any change to the excluded-packages list requires triggering a full rebuild.