Overview
NuGet.Services.AzureSearch is the central library that implements everything related to NuGet’s Azure AI Search integration. It defines the document models for two indexes (a search index for user-facing queries and a hijack index for legacy V2 look-ups), the pipelines that populate and update those indexes, and the service layer that executes queries against them. Consuming executable jobs import this library and wire it into their own entry points, but all of the indexing and query logic lives here.
The library supports three distinct data pipelines. Db2AzureSearch performs the initial full bootstrap of both indexes by reading all package registrations from the NuGet Gallery SQL database (or optionally from Kusto), creating or replacing the Azure Search indexes and the blob-based auxiliary data files, and writing the initial cursor position so that the incremental pipeline can pick up from the right point. Catalog2AzureSearch runs continuously as a catalog collector, reading NuGet catalog commits, converting each commit into a set of Azure Search index actions, and pushing those actions in batches; it maintains a durable cursor in blob storage to track progress. Auxiliary2AzureSearch runs as a periodic job that reads owner, download-count, and verified-package data from the Gallery database and the NuGet statistics pipeline, diffs each data set against the previously indexed snapshot stored in blob storage, and pushes only the changed documents to the search index.
A fourth subsystem, the SearchService namespace, provides the runtime query path used by the NuGet search web API. It translates incoming V2, V3, and autocomplete requests into Azure Search queries, applies NuGet-specific text analysis (camel-case splitting, separator tokenization, exact-match boosting), executes queries against the appropriate index, and maps the results back to the API response shapes. Download counts, verified status, and popularity-transfer adjustments are kept in memory via AuxiliaryDataCache and are applied at query time by SearchResponseBuilder.
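To make the text-analysis step more concrete, here is a minimal sketch of separator tokenization followed by camel-case splitting. It is written in Python for brevity (the library itself is C#), and both the separator set and the splitting rule are assumptions chosen for illustration; the real analyzers are registered on the Azure Search indexes and may behave differently.

```python
import re

# Assumed separator characters; the library's actual set is not listed in this document.
_SEPARATORS = re.compile(r"[.\-_\s]+")
# Assumed camel-case rule: split wherever a lowercase letter is followed by an uppercase one.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def tokenize(package_id: str) -> list:
    """Split a package ID on separators, then on camel-case boundaries, and lowercase."""
    tokens = []
    for piece in _SEPARATORS.split(package_id):
        if not piece:
            continue  # skip empty pieces produced by leading/trailing separators
        tokens.extend(part.lower() for part in _CAMEL_BOUNDARY.split(piece) if part)
    return tokens
```

With this rule set, an ID such as `Newtonsoft.Json` yields the tokens `newtonsoft` and `json`, so a query for either word alone can match the package.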
Role in System
BatchPusher queues index actions for both indexes, pushes them in configurable batches (default 1000 documents), and writes per-package-ID version list blobs to blob storage after all documents for that ID have been flushed.
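The batching behaviour, together with the halve-on-HTTP-413 fallback that this document attributes to BatchPusher, can be sketched as follows. This is an illustration in Python rather than the library's C#; `PayloadTooLarge`, `push_with_split`, `push_in_batches`, and `fake_send` are hypothetical names, and re-raising when a single document is still too large is an assumption.

```python
class PayloadTooLarge(Exception):
    """Stand-in for an HTTP 413 (Request Entity Too Large) response."""


def push_with_split(actions, send):
    """Send one batch; on a 413-style failure, halve it and retry each half."""
    if not actions:
        return
    try:
        send(actions)
    except PayloadTooLarge:
        if len(actions) == 1:
            raise  # assumption: a single oversized document cannot be split further
        mid = len(actions) // 2
        push_with_split(actions[:mid], send)
        push_with_split(actions[mid:], send)


def push_in_batches(actions, send, batch_size=1000):
    """Flush queued actions in fixed-size batches."""
    for start in range(0, len(actions), batch_size):
        push_with_split(actions[start:start + batch_size], send)


# Usage: a fake service that rejects any request carrying more than 3 documents.
sent = []


def fake_send(batch):
    if len(batch) > 3:
        raise PayloadTooLarge()
    sent.append(list(batch))


push_in_batches(list(range(10)), fake_send, batch_size=8)
```

In the usage example, the first batch of eight is rejected and ends up being delivered as four pairs, while the final batch of two goes through unchanged.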
Dual Azure Search Indexes
A search index holds one document per package ID per SearchFilters combination (stable, prerelease, SemVer 2) and is used for V3 and non-hijack V2 queries. A hijack index holds one document per package ID+version and is used for legacy V2 look-ups by exact version.
Three Indexing Pipelines
Db2AzureSearch bootstraps the indexes from scratch; Catalog2AzureSearch keeps them current via the NuGet catalog; Auxiliary2AzureSearch keeps download counts, owners, and verified status in sync from out-of-band data sources.
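The "diff against the previous snapshot, push only what changed" idea behind Auxiliary2AzureSearch can be illustrated with download counts. The sketch below is in Python rather than the library's C#; the function name, the dictionary shapes, and the choice to reset vanished IDs to zero are assumptions made for the example.

```python
def diff_download_counts(previous, current):
    """Return {package_id: new_count} for IDs whose count differs from the snapshot.

    Package IDs are compared case-insensitively by lowercasing them first.
    """
    old = {package_id.lower(): count for package_id, count in previous.items()}
    new = {package_id.lower(): count for package_id, count in current.items()}

    changes = {}
    for package_id, count in new.items():
        if old.get(package_id) != count:
            changes[package_id] = count

    # Assumption: an ID missing from the new data is treated as having zero downloads.
    for package_id in old.keys() - new.keys():
        if old[package_id] != 0:
            changes[package_id] = 0
    return changes
```

Only the entries in the returned dictionary would need an index update, which keeps the number of pushed documents proportional to the amount of change rather than to the size of the catalog.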
Blob-Backed Version Lists
For every package ID, a JSON blob in Azure Storage records the known versions and their properties. The VersionLists class uses this data to compute which version is the latest under each SearchFilters combination and to determine exactly which search index documents need to change when a version is added, updated, or deleted.
Auxiliary Data Cache
At query time the search service holds download counts, verified-package flags, and popularity-transfer data in memory via AuxiliaryDataCache. This avoids hitting storage on every query and is refreshed periodically by a background reloader.
Key Files and Classes
| File | Class / Type | Purpose |
|---|---|---|
| AzureSearchJob.cs | AzureSearchJob<T> | Abstract base job that starts feature-flag refreshing, enables SDK-level tracing, and dispatches to a typed IAzureSearchCommand. |
| Catalog2AzureSearch/AzureSearchCollectorLogic.cs | AzureSearchCollectorLogic | ICommitCollectorLogic that processes NuGet catalog commit batches: fetches catalog leaves in parallel, builds index actions per package ID, and pushes them via BatchPusher with up to three retries. |
| Catalog2AzureSearch/Catalog2AzureSearchCommand.cs | Catalog2AzureSearchCommand | Outer command that initializes indexes/containers if needed, resolves the dependency cursor, and drives the catalog collector loop. |
| Catalog2AzureSearch/CatalogIndexActionBuilder.cs | CatalogIndexActionBuilder | Converts catalog leaves for a given package ID into the full set of IndexActions for both indexes. |
| Db2AzureSearch/Db2AzureSearchCommand.cs | Db2AzureSearchCommand | Full-bootstrap command: creates indexes and blob container, runs a producer/consumer pipeline to push all package registrations, writes auxiliary data blobs, and writes the initial catalog cursor. |
| Db2AzureSearch/NewPackageRegistrationFromDbProducer.cs | NewPackageRegistrationFromDbProducer | Reads all package registrations from the Gallery SQL database and produces work items for the bootstrap pipeline. |
| Db2AzureSearch/NewPackageRegistrationFromKustoProducer.cs | NewPackageRegistrationFromKustoProducer | Alternative producer that reads from Kusto instead of SQL; selected at runtime when KustoConnectionString is configured. |
| Auxiliary2AzureSearch/Auxiliary2AzureSearchCommand.cs | Auxiliary2AzureSearchCommand | Orchestrates UpdateVerifiedPackagesCommand, UpdateDownloadsCommand, and UpdateOwnersCommand sequentially. |
| Auxiliary2AzureSearch/UpdateDownloadsCommand.cs | UpdateDownloadsCommand | Reads download counts from the statistics pipeline (downloads.v1.json) and from blob storage, diffs them, applies popularity transfers, then pushes changes to the search index. |
| Auxiliary2AzureSearch/UpdateOwnersCommand.cs | UpdateOwnersCommand | Reads owner data from the Gallery database and from blob storage, diffs them, and pushes changed owner arrays to the search index. |
| Auxiliary2AzureSearch/UpdateVerifiedPackagesCommand.cs | UpdateVerifiedPackagesCommand | Diffs the set of verified package IDs and updates the corresponding search documents. |
| BatchPusher.cs | BatchPusher | Core flushing engine: queues IndexDocumentsAction objects for both indexes, flushes them in batches, automatically halves batches on HTTP 413 errors, and writes per-package-ID version list blobs concurrently after each batch. |
| Models/SearchDocument.cs | SearchDocument (static class) | Defines the document models for the search index: Full, UpdateLatest, UpdateVersionList, UpdateOwners, UpdateDownloadCount, and related interfaces. |
| Models/HijackDocument.cs | HijackDocument (static class) | Defines the document models for the hijack index: Full and Latest. |
| Models/BaseMetadataDocument.cs | BaseMetadataDocument | Shared base for both index document types; contains all standard package metadata fields (title, description, authors, tags, versions, etc.). |
| VersionList/VersionLists.cs | VersionLists | In-memory representation of a package’s version history, segmented by SearchFilters. Computes which version is latest per filter and produces the exact IndexChanges needed when versions are added, changed, or deleted. |
| VersionList/VersionListDataClient.cs | VersionListDataClient | Reads and writes per-package-ID version list JSON blobs in Azure Blob Storage; uses ETags for optimistic concurrency. |
| SearchService/AzureSearchService.cs | AzureSearchService | Main ISearchService implementation: routes V2 (search or hijack index), V3, and autocomplete requests; resolves each to a document get or a full text search. |
| SearchService/SearchTextBuilder.cs | SearchTextBuilder | Translates parsed NuGet query syntax into Azure Search query text, handling field-scoped terms, separator/camelCase tokenization, prefix search for autocomplete, and exact-match boosting. |
| SearchService/SearchParametersBuilder.cs | SearchParametersBuilder | Builds the SearchOptions (filters, sorting, facets, scoring profile) to accompany each query. |
| SearchService/SearchResponseBuilder.cs | SearchResponseBuilder | Maps Azure Search result objects to V2SearchResponse, V3SearchResponse, and AutocompleteResponse, merging in auxiliary data (download counts, verified status) from AuxiliaryDataCache. |
| SearchService/AuxiliaryDataCache.cs | AuxiliaryDataCache | Singleton in-memory cache for download data, verified packages, and popularity transfers; supports conditional reload using ETags to avoid unnecessary blob reads. |
| SearchService/SearchStatusService.cs | SearchStatusService | Exposes health-check information: document counts, last commit timestamps, warm-query durations for both indexes, plus server metadata. |
| ScoringProfiles/DefaultScoringProfile.cs | DefaultScoringProfile | Builds the Azure Search scoring profile (nuget_scoring_profile) that boosts results by a configurable DownloadScoreBoost magnitude function applied to the log-scaled download score field. |
| IndexBuilder.cs | IndexBuilder | Creates, recreates, and deletes the search and hijack Azure Search indexes, including registering the custom analyzers and the default scoring profile. |
| AzureSearchTelemetryService.cs | AzureSearchTelemetryService | Centralises all Application Insights telemetry for index pushes, query durations, auxiliary file reloads, and job outcomes. |
| DependencyInjectionExtensions.cs | DependencyInjectionExtensions | Extension methods (AddAzureSearch) that register all services, keyed Autofac registrations for search/hijack index clients and blob clients, and choose between SQL and Kusto producers at runtime. |
| AzureSearchConfiguration.cs | AzureSearchConfiguration | Root configuration class: Azure Search service name, API key or managed identity client ID, index names, blob storage connection string, container name, and flat-container URL. |
| AzureSearchJobConfiguration.cs | AzureSearchJobConfiguration | Extends AzureSearchConfiguration with job-level tunables: AzureSearchBatchSize (default 1000), MaxConcurrentBatches (default 4), MaxConcurrentVersionListWriters (default 8), scoring configuration, and popularity-transfer enable flag. |
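The VersionLists entry in the table above computes the latest version per SearchFilters value. A minimal sketch of that selection follows, in Python rather than the library's C#. The `Version` type, its pre-parsed `sort_key`, and the four filter names are assumptions for illustration; real NuGet version ordering (including prerelease labels) is more involved than the tuple comparison used here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Version:
    sort_key: Tuple[int, ...]  # hypothetical pre-parsed ordering key
    text: str
    is_prerelease: bool = False
    is_semver2: bool = False


# Assumed filter names, each mapped to (include prerelease, include SemVer 2).
FILTERS = {
    "Default": (False, False),
    "IncludePrerelease": (True, False),
    "IncludeSemVer2": (False, True),
    "IncludePrereleaseAndSemVer2": (True, True),
}


def latest_per_filter(versions):
    """Return {filter name: latest matching version text, or None if nothing matches}."""
    result = {}
    for name, (allow_prerelease, allow_semver2) in FILTERS.items():
        candidates = [
            v for v in versions
            if (allow_prerelease or not v.is_prerelease)
            and (allow_semver2 or not v.is_semver2)
        ]
        best = max(candidates, key=lambda v: v.sort_key, default=None)
        result[name] = best.text if best is not None else None
    return result
```

A filter that maps to `None` has no qualifying version, which is why a package ID can have fewer than four documents in the search index.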
Dependencies
NuGet Package References
| Package | Purpose |
|---|---|
| Azure.Identity | Provides ManagedIdentityCredential and DefaultAzureCredential for authenticating to Azure Search without API keys. |
| Azure.Search.Documents | Official Azure AI Search SDK; used for all index management, document push, and query operations. |
| Microsoft.Azure.Kusto.Data | Kusto (Azure Data Explorer) client used by NewPackageRegistrationFromKustoProducer as an alternative data source during bootstrap. |
| Microsoft.Rest.ClientRuntime | Provides ServiceClientTracing integration wired up via ServiceClientTracingLogger. |
| System.Text.Encodings.Web | Used in JSON serialization of index documents. |
| System.Text.Json | Used alongside the Azure SDK serializer for document model serialization. |
Internal Project References
| Project | Purpose |
|---|---|
| NuGet.Services.V3 | Provides ICommitCollectorLogic, CommitCollectorUtility, catalog cursor infrastructure, V3 search request/response types (V2SearchRequest, V3SearchResponse, AutocompleteRequest, etc.), and auxiliary-file client contracts (IDownloadsV1JsonClient, IOwnerDataClient, IDownloadDataClient, etc.). |
Notable Patterns and Implementation Details
Two-index architecture. Every package version gets a document in the hijack index (one document per ID+version), while the search index holds one document per package ID per SearchFilters combination. The hijack index exists exclusively to support legacy V2 queries that ask for a specific version; the search index supports all other query types. AzureSearchService routes requests to the correct index based on the IgnoreFilter flag on the V2 request.
Version list blobs as shared state. Because a single package ID can map to up to four search-index documents (one per SearchFilters), the indexing pipeline needs to know the complete ordered version list to compute which document represents the “latest” version for each filter. This state is kept as a JSON blob per package ID in Azure Blob Storage and read/written by VersionListDataClient using ETag-based optimistic concurrency. VersionLists.ApplyChanges recomputes which index documents need to change for both indexes whenever a version is added, updated, or deleted.
Batch splitting on HTTP 413. BatchPusher.IndexAsync catches RequestFailedException with status 413 (Request Entity Too Large) and recursively splits the batch in half, retrying each half independently. This handles cases where document payloads (e.g. packages with very long version lists or descriptions) make a full batch exceed Azure Search’s request size limit.
Download score pre-computation. Rather than sorting by raw download count (which would require a large boosting range in the scoring profile), DocumentUtilities.GetDownloadScore computes a normalized log-scale score at index time. The DefaultScoringProfile applies a MagnitudeScoringFunction against this DownloadScore field with a configurable DownloadScoreBoost multiplier, keeping the effective boost range manageable while still heavily favouring popular packages.
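To illustrate why a log-scale score keeps the boost range manageable, here is a small Python sketch (the library itself is C#). The formula and the `assumed_max` ceiling are assumptions; this document does not give the actual expression used by DocumentUtilities.GetDownloadScore.

```python
import math


def download_score(total_downloads, assumed_max=1_000_000_000):
    """Map a raw download count onto [0, 1] using a log scale.

    Both the formula and assumed_max are hypothetical, chosen only to show the effect.
    """
    clamped = min(max(total_downloads, 0), assumed_max)
    return math.log1p(clamped) / math.log1p(assumed_max)
```

Under these assumptions, a package with a thousand times more downloads (one million versus one thousand) receives only about twice the score, so a magnitude function over the score needs a far narrower input range than one over raw counts.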