
Overview

PackageLagMonitor is a .NET console job (Monitoring.PackageLag) that measures search index lag — the elapsed time between when a package event (create, update, or delete) is recorded in the NuGet V3 catalog and when the change becomes visible in the NuGet search service. It runs as a periodic job, processes catalog leaves, polls search endpoints until the expected state appears, then records the measured delay as Application Insights metrics. Two distinct lag values are tracked per package event:
  • Package Creation Lag (PackageCreationLagInSeconds) — time from a package’s Created timestamp to when it appears in search (skipped for list/unlist operations).
  • V3 Lag (V3LagInSeconds) — time from the LastEdited (or Created if never edited) timestamp to when the change is reflected in search.
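The two lag definitions reduce to simple timestamp arithmetic. The job itself is C#; the Python sketch below only illustrates that arithmetic. The function name compute_lags and the visible_at parameter are hypothetical. visible_at stands for whatever time the job treats as the moment the change became visible in search.

```python
from datetime import datetime
from typing import Optional, Tuple


def compute_lags(created: datetime,
                 last_edited: Optional[datetime],
                 visible_at: datetime,
                 is_list_operation: bool) -> Tuple[Optional[float], float]:
    """Return (creation_lag_seconds, v3_lag_seconds) for one package event."""
    # V3 lag is measured from LastEdited, falling back to Created when the
    # package has never been edited.
    v3_reference = last_edited if last_edited is not None else created
    v3_lag = (visible_at - v3_reference).total_seconds()

    # Creation lag is skipped (None) for list/unlist operations.
    creation_lag = None
    if not is_list_operation:
        creation_lag = (visible_at - created).total_seconds()
    return creation_lag, v3_lag
```

Suppose a package is created at 12:00:00 and first seen in search at 12:01:30. Both lags are then 90 seconds. For an unlist recorded at 12:01:00 and seen at 12:01:30, only a V3 lag of 30 seconds is reported.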

Role in System

Catalog Consumer

Reads NuGet V3 catalog leaves via NuGet.Protocol.Catalog to discover new and changed packages in near-real-time.
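The following sketch is for illustration only. It filters a batch of catalog page items against a cursor and routes each one by type. The type names nuget:PackageDetails and nuget:PackageDelete are the catalog's own item types. The CatalogItem class, the select_leaves function and the strict "committed after the cursor" comparison are assumptions, not the NuGet.Protocol.Catalog API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class CatalogItem:
    item_type: str             # e.g. "nuget:PackageDetails" or "nuget:PackageDelete"
    commit_timestamp: datetime
    package_id: str
    version: str


def select_leaves(items: Iterable[CatalogItem],
                  cursor: datetime) -> List[Tuple[str, CatalogItem]]:
    """Return (kind, item) pairs committed after the cursor, oldest first."""
    selected = []
    for item in sorted(items, key=lambda i: i.commit_timestamp):
        if item.commit_timestamp <= cursor:
            continue  # already covered by the cursor
        if item.item_type == "nuget:PackageDetails":
            selected.append(("details", item))
        elif item.item_type == "nuget:PackageDelete":
            selected.append(("delete", item))
        # Any other item type is ignored in this sketch.
    return selected
```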

Search Prober

Actively queries configured AzureSearch endpoints with ignorefilter=true&semverlevel=2.0.0 to detect when a package version becomes visible.
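The next sketch is hypothetical and shows one way a probe URL could be assembled. Only ignorefilter=true and semverlevel=2.0.0 come from this document. The q syntax (packageid:<id> version:<version>) and the name build_probe_url are assumptions.

```python
from urllib.parse import urlencode


def build_probe_url(query_base_url: str, package_id: str, version: str) -> str:
    params = {
        # Assumed query syntax for matching a single id/version pair.
        "q": f"packageid:{package_id} version:{version}",
        # The two flags named in this document.
        "ignorefilter": "true",
        "semverlevel": "2.0.0",
    }
    # urlencode escapes ':' as %3A and spaces as '+'.
    return f"{query_base_url}?{urlencode(params)}"
```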

Telemetry Emitter

Pushes PackageCreationLagInSeconds and V3LagInSeconds metrics to Application Insights, tagged by region, instance index, package id, and version.
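The Python sketch below shows how the two metrics and their property bag could be emitted. The metric names come from this document. RecordingTelemetryClient is a hypothetical stand-in for ITelemetryClient, and the property key names ("Region", "Instance", "PackageId", "Version") are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

CREATION_LAG_METRIC = "PackageCreationLagInSeconds"
V3_LAG_METRIC = "V3LagInSeconds"


@dataclass
class RecordingTelemetryClient:
    """Stand-in for a telemetry client that records metrics instead of sending them."""
    metrics: List[Tuple[str, float, Dict[str, str]]] = field(default_factory=list)

    def track_metric(self, name: str, value: float, properties: Dict[str, str]) -> None:
        self.metrics.append((name, value, properties))


def track_lags(client: RecordingTelemetryClient, region: str, instance_index: int,
               package_id: str, version: str,
               creation_lag: Optional[float], v3_lag: float) -> None:
    # Property keys are illustrative; the real job's key names may differ.
    properties = {
        "Region": region,
        "Instance": str(instance_index),
        "PackageId": package_id,
        "Version": version,
    }
    # Creation lag is absent for list/unlist operations, so skip it when None.
    if creation_lag is not None:
        client.track_metric(CREATION_LAG_METRIC, creation_lag, dict(properties))
    client.track_metric(V3_LAG_METRIC, v3_lag, dict(properties))
```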

Ops Health Signal

Provides the data behind SLA dashboards and alerts that track search pipeline health across deployment regions.

Key Files and Classes

| File Path | Class / Type | Purpose |
| --- | --- | --- |
| Program.cs | Program | Entry point; delegates immediately to JobRunner.Run. |
| Job.cs | Job | Extends JsonConfigurationJob; wires DI, bootstraps the catalog processor loop, and drives the main run cycle. |
| PackageLagCatalogLeafProcessor.cs | PackageLagCatalogLeafProcessor | ICatalogLeafProcessor implementation; fans out lag-computation tasks per leaf across all search instances. |
| SearchServiceClient.cs | SearchServiceClient | HTTP client for the search /query and /diag endpoints; normalises AzureSearch diagnostic responses into a common shape. |
| ISearchServiceClient.cs | ISearchServiceClient | Interface contract for the search client (enables testing). |
| PackageLagMonitorConfiguration.cs | PackageLagMonitorConfiguration | Config POCO bound from the MonitorConfiguration section: service index URL, retry settings, and per-region info. |
| RegionInformation.cs | RegionInformation | Per-region config: resource group, service name, base URL, and ServiceType. |
| Instance.cs | Instance | Immutable runtime descriptor for a single search endpoint (slot, index, diag URL, query URL, region, service type). |
| ServiceType.cs | ServiceType | Enum; currently only AzureSearch is defined (legacy LuceneSearch was removed). |
| SearchResultResponse.cs | SearchResultResponse / SearchResult | DTOs for deserialising search query responses. |
| SearchDiagnosticResponse.cs | SearchDiagnosticResponse / CommitUserData | Normalised diagnostic DTO holding the last index reload time and the commit timestamp. |
| AzureSearchDiagnosticResponse.cs | AzureSearchDiagnosticResponse / IndexInformation | Raw DTO for the AzureSearch /diag JSON shape. |
| Telemetry/IPackageLagTelemetryService.cs | IPackageLagTelemetryService | Interface for the two telemetry track methods. |
| Telemetry/PackageLagTelemetryService.cs | PackageLagTelemetryService | Calls ITelemetryClient.TrackMetric with structured property bags for both lag metric names. |
| HttpWrappers/IHttpClientWrapper.cs | IHttpClientWrapper | Thin testable wrapper around HttpClient.GetAsync. |
| HttpResponseException.cs | HttpResponseException | Custom exception that carries the HTTP status code and reason phrase for failed search/diag requests. |

Dependencies

Internal Project References

| Project | Role |
| --- | --- |
| NuGet.Jobs.Common | Provides JsonConfigurationJob, JobRunner, FileCursor, logging infrastructure, and ITelemetryClient / TelemetryClientWrapper. |
| NuGet.Protocol.Catalog | Provides ICatalogClient, CatalogProcessor, ICatalogLeafProcessor, PackageDetailsCatalogLeaf, PackageDeleteCatalogLeaf, and CatalogProcessorSettings. |

NuGet Packages (resolved transitively via project refs)

| Package | Usage |
| --- | --- |
| Autofac | DI container used by the JsonConfigurationJob base class. |
| Microsoft.ApplicationInsights | TelemetryClient and the Application Insights pipeline. |
| Microsoft.Extensions.Configuration | Configuration binding from JSON. |
| Microsoft.Extensions.DependencyInjection | IServiceCollection service registration. |
| Microsoft.Extensions.Logging | Structured logging throughout. |
| Microsoft.Extensions.Options | IOptionsSnapshot<T> binding for the configuration POCO. |
| Newtonsoft.Json | JSON deserialisation of search and diagnostic responses. |

Deployment Artifacts (nuspec)

| Artifact | Description |
| --- | --- |
| Scripts/Functions.ps1 | Shared PowerShell helpers used by the pre- and post-deploy scripts. |
| Scripts/PreDeploy.ps1 | Pre-deployment step (service stop/uninstall via NSSM). |
| Scripts/PostDeploy.ps1 | Post-deployment step (service install/start via NSSM). |
| Scripts/nssm.exe | Non-Sucking Service Manager binary; registers the job as a Windows service. |

Notable Patterns and Implementation Details

Catalog cursor bootstrap: Rather than resuming from a cursor persisted by a prior run, Job.Run first queries every configured search instance for its current commit timestamp, takes the maximum, and sets the FileCursor to that maximum plus one tick. Each job invocation therefore processes only catalog leaves committed after the most advanced search instance's latest commit, which avoids duplicate lag measurements across restarts.
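.NET DateTime values count 100-nanosecond ticks, so "+1 tick" is easiest to show on integer tick counts. The Python sketch below uses bootstrap_cursor as a hypothetical name. Raising an error on empty input is a choice made for the sketch, not documented job behaviour.

```python
from typing import Iterable

# One .NET tick is 100 nanoseconds, so a second holds ten million ticks.
TICKS_PER_SECOND = 10_000_000


def bootstrap_cursor(commit_ticks: Iterable[int]) -> int:
    """Return the starting cursor, in ticks, from each instance's latest commit."""
    commits = list(commit_ticks)
    if not commits:
        # Sketch choice: with no instances there is nothing to anchor the cursor to.
        raise ValueError("at least one search instance is required")
    # One tick past the most advanced instance, so leaves it has already
    # indexed are not measured again.
    return max(commits) + 1
```

Consider two instances whose latest commits are at 100 s and 250 s. bootstrap_cursor([100 * TICKS_PER_SECOND, 250 * TICKS_PER_SECOND]) returns 2_500_000_001.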
Fan-out lag computation: PackageLagCatalogLeafProcessor immediately returns true from ProcessPackageDetailsAsync / ProcessPackageDeleteAsync without awaiting; the actual work is queued into _packageProcessTasks. WaitForProcessing() drains all tasks with Task.WhenAll before the job run completes. This allows the catalog processor to batch-read leaves without blocking on per-package polling loops.
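The fire-and-collect pattern can be sketched with asyncio, where asyncio.gather plays the role of Task.WhenAll. The class and method names below are hypothetical Python analogues of ProcessPackageDetailsAsync and WaitForProcessing. compute_lag stands in for the per-package polling work.

```python
import asyncio
from typing import Awaitable, Callable, List


class FanOutLeafProcessor:
    """Sketch of queuing per-leaf work and draining it at the end of a run."""

    def __init__(self, compute_lag: Callable[[str], Awaitable[float]]):
        self._compute_lag = compute_lag
        self._tasks: List[asyncio.Task] = []

    async def process_leaf(self, leaf_id: str) -> bool:
        # Schedule the polling work and return at once so the caller can
        # keep reading catalog leaves.
        self._tasks.append(asyncio.ensure_future(self._compute_lag(leaf_id)))
        return True

    async def wait_for_processing(self) -> List[float]:
        # Wait for every queued task; results come back in queue order.
        results = await asyncio.gather(*self._tasks)
        self._tasks.clear()
        return list(results)
```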
AzureSearch reload time approximation: The AzureSearch /diag endpoint does not expose a true “last index reload” timestamp. SearchServiceClient.ConvertAzureSearchResponse substitutes DateTimeOffset.UtcNow for LastIndexReloadTime. As a result, reported lag values include the wall-clock time between the package becoming visible and the moment the job first observes it — slightly overstating true lag. See the linked engineering issue (NuGet/Engineering#2651) in the source comment.
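The effect of substituting the current time for the reload time is shown in the hypothetical Python sketch below. It assumes that lag is taken as the reload time minus the event timestamp, as the paragraph implies. The now parameter is injectable so the overstatement can be shown deterministically.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class NormalisedDiagnostic:
    last_index_reload_time: datetime  # mirrors LastIndexReloadTime
    last_commit_timestamp: datetime   # mirrors the commit timestamp in CommitUserData


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


def convert_azure_search_diag(last_commit_timestamp: datetime,
                              now: Callable[[], datetime] = utc_now) -> NormalisedDiagnostic:
    # AzureSearch /diag has no real reload time, so the observation time stands in for it.
    return NormalisedDiagnostic(last_index_reload_time=now(),
                                last_commit_timestamp=last_commit_timestamp)


def reported_lag_seconds(diag: NormalisedDiagnostic, event_time: datetime) -> float:
    # Measured against the substituted reload time, lag includes any delay
    # between the change becoming visible and the job observing it.
    return (diag.last_index_reload_time - event_time).total_seconds()
```

Take a change recorded at 12:00:00 that became visible at 12:00:30. If the diag call ran at 12:00:45, this sketch reports 45 seconds rather than 30.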
Retry semantics differ for delete vs. create: During the polling loop in ComputeLagForQueries, a delete is considered “pending” as long as TotalHits > 0 (package still visible), while a create/edit is “pending” when TotalHits == 0 OR when Data[0].LastEdited < lastEdited. The retry logic is asymmetric and is easy to misread when modifying the polling loop.
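The asymmetric "pending" rule fits in a single predicate. In the Python sketch below, the SearchHit and QueryResult shapes are simplified, hypothetical stand-ins for SearchResultResponse and SearchResult.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class SearchHit:
    last_edited: datetime


@dataclass(frozen=True)
class QueryResult:
    total_hits: int
    data: List[SearchHit]


def is_pending(result: QueryResult, is_delete: bool, last_edited: datetime) -> bool:
    if is_delete:
        # A delete is still pending while the package remains visible.
        return result.total_hits > 0
    # A create/edit is pending until the package appears with a LastEdited
    # value at least as new as the catalog leaf's.
    return result.total_hits == 0 or result.data[0].last_edited < last_edited
```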
List vs. create distinction: If a package is already present in search at the first poll attempt (retryCount == 0 and resultCount > 0), the job infers that the event is a list/unlist operation rather than a first-time publish and sets isListOperation = true. Creation lag is then suppressed for that leaf and only V3 lag is emitted, so listing or unlisting an existing package does not produce an inflated creation lag metric.
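The Python sketch below models the create/edit polling path, including the first-attempt list/unlist inference. It is hypothetical. query() returns a (result_count, still_pending) pair for one search request. max_retries and the wait hook stand in for the retry settings in PackageLagMonitorConfiguration, whose exact semantics are an assumption here.

```python
from typing import Callable, Tuple


def poll_for_visibility(query: Callable[[], Tuple[int, bool]],
                        max_retries: int,
                        wait: Callable[[], None] = lambda: None) -> Tuple[bool, bool]:
    """Return (became_visible, is_list_operation)."""
    is_list_operation = False
    for retry_count in range(max_retries):
        result_count, still_pending = query()
        if retry_count == 0 and result_count > 0:
            # Already in search on the first attempt: treat as list/unlist,
            # so creation lag will be suppressed for this leaf.
            is_list_operation = True
        if not still_pending:
            return True, is_list_operation
        wait()  # stands in for the configured delay between retries
    return False, is_list_operation
```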
ServiceType enum is forward-looking but currently single-valued: The enum and all switch statements are structured to support multiple service types, but only AzureSearch is implemented. The old LuceneSearch path has been removed. Any RegionInformation with an unknown ServiceType will throw NotImplementedException from GetSearchEndpoints.
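The single-valued enum and its fail-loudly dispatch are sketched in Python below. Only the AzureSearch member and the NotImplementedException behaviour come from this document. The field subset, the use of list position as the instance index, the region value taken from service_name, and the /query and /diag path construction are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence, Union


class ServiceType(Enum):
    AZURE_SEARCH = "AzureSearch"  # the only member, as described above


@dataclass(frozen=True)
class RegionInformation:
    resource_group: str
    service_name: str
    base_url: str
    service_type: Union[ServiceType, str]  # str allows an unknown value for the demo


@dataclass(frozen=True)
class Instance:
    index: int
    diag_url: str
    query_url: str
    region: str
    service_type: ServiceType


def get_search_endpoints(regions: Sequence[RegionInformation]) -> List[Instance]:
    instances = []
    for index, region in enumerate(regions):
        if region.service_type == ServiceType.AZURE_SEARCH:
            base = region.base_url.rstrip("/")
            instances.append(Instance(index=index,
                                      diag_url=f"{base}/diag",    # placeholder path
                                      query_url=f"{base}/query",  # placeholder path
                                      region=region.service_name,
                                      service_type=ServiceType.AZURE_SEARCH))
        else:
            # Any service type without a branch fails loudly.
            raise NotImplementedError(f"Unknown service type: {region.service_type}")
    return instances
```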
Target framework: The project targets net472 (full .NET Framework 4.7.2), not .NET Core or .NET 5+. This is consistent with other jobs in the NuGetGallery monorepo that are deployed as Windows services via NSSM.