
Validation.ContentScan.Core

Overview

Validation.ContentScan.Core is a small, focused shared library that provides the messaging contracts and Service Bus enqueuer for the NuGet content scanning validation pipeline. It does not perform any scanning itself — instead, it defines the message types, serialization logic, and the enqueuer abstraction that other services use to trigger and poll content scans. The library lives in the NuGet.Jobs.Validation.ContentScan namespace and targets net472. It is versioned alongside other NuGet Jobs packages via the $(JobsPackageVersion) MSBuild property.

Role in the System

Within the NuGet Gallery validation ecosystem, package validation is orchestrated by NuGet.Services.Validation.Orchestrator. When the orchestrator needs to verify that a package’s content is safe, it delegates to ContentScanValidator, which uses IContentScanEnqueuer (defined here) to send work to an external content-scanning service over Azure Service Bus.

Orchestrator (ContentScanValidator)
        │  EnqueueContentScanAsync(validationStepId, contentUri)
        ▼
Validation.ContentScan.Core (ContentScanEnqueuer)
        │  Serialized brokered message → Azure Service Bus Topic
        ▼
External Content-Scanning Worker (reads & processes)
        │  Worker updates validation state in DB
        ▼
Orchestrator polls via EnqueueContentScanStatusCheckAsync(validationStepId)

This library is consumed exclusively by NuGet.Services.Validation.Orchestrator. There is no worker-side code in this library; the consumer of the Service Bus messages is a separate service outside this repository.

Key Files and Classes

| File | Class / Type | Purpose |
| --- | --- | --- |
| IContentScanEnqueuer.cs | IContentScanEnqueuer | Primary interface: enqueue a scan start or a status-check message, with optional delivery delay override |
| ContentScanEnqueuer.cs | ContentScanEnqueuer | Concrete implementation: serializes a ContentScanData message, applies scheduled delivery time, and sends via ITopicClient |
| ContentScanData.cs | ContentScanData | Discriminated-union-style message envelope; constructed only via static factory methods NewStartContentScanData / NewCheckContentScanStatus |
| ContentScanOperationType.cs | ContentScanOperationType | Enum with two values: StartScan and CheckStatus |
| StartContentScanData.cs | StartContentScanData | Payload for scan-start messages; carries ValidationStepId, BlobUri, and optional ContentType |
| ContentScanStatusMessage.cs | CheckContentScanStatusData | Payload for status-poll messages; carries only ValidationStepId |
| ContentScanMessageSerializer.cs | ContentScanMessageSerializer | IBrokeredMessageSerializer<ContentScanData> implementation; routes on schema name StartContentScanData (v1) or CheckContentScanStatusData (v1) |
| ContentScanEnqueuerConfiguration.cs | ContentScanEnqueuerConfiguration | Single-property config POCO: MessageDelay (TimeSpan?), the default visibility delay for outbound Service Bus messages |

Dependencies

Internal Project References

| Project | Role |
| --- | --- |
| NuGet.Services.ServiceBus | Supplies ITopicClient, IBrokeredMessageSerializer<T>, BrokeredMessageSerializer<T>, IReceivedBrokeredMessage, and the [Schema] attribute used by the serializer |

Implicit Framework / Transitive Dependencies

| Package | Purpose |
| --- | --- |
| Microsoft.Extensions.Logging.Abstractions | ILogger<T> for structured logging in ContentScanEnqueuer |
| Microsoft.Extensions.Options | IOptionsSnapshot<ContentScanEnqueuerConfiguration> for live configuration reload |

There are no direct NuGet package <PackageReference> entries in the .csproj; all external dependencies (logging, options, Service Bus abstractions) are pulled in transitively through NuGet.Services.ServiceBus.

Notable Patterns and Implementation Details

Discriminated Union Message

ContentScanData acts as a tagged union: the Type property (ContentScanOperationType) determines which payload property is non-null. The constructor enforces exactly one non-null payload and throws ArgumentException if both or neither are provided.
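
The tagged-union shape described here can be sketched roughly as follows. This is a simplified illustration, not the library's source: the Guid type for step IDs, the payload property names StartContentScan and CheckContentScanStatus, and the factory parameter lists are assumptions, and the optional ContentType is left out.

```csharp
using System;

// Simplified, hypothetical stand-ins for the real types. Guid step IDs and the
// payload property names are assumptions made for this sketch.
public enum ContentScanOperationType { StartScan, CheckStatus }

public sealed class StartContentScanData
{
    public StartContentScanData(Guid validationStepId, Uri blobUri)
    {
        ValidationStepId = validationStepId;
        BlobUri = blobUri ?? throw new ArgumentNullException(nameof(blobUri));
    }

    public Guid ValidationStepId { get; }
    public Uri BlobUri { get; }
}

public sealed class CheckContentScanStatusData
{
    public CheckContentScanStatusData(Guid validationStepId)
    {
        ValidationStepId = validationStepId;
    }

    public Guid ValidationStepId { get; }
}

public sealed class ContentScanData
{
    // Internal: code outside the assembly must use the factory methods below.
    internal ContentScanData(
        ContentScanOperationType type,
        StartContentScanData startContentScan,
        CheckContentScanStatusData checkContentScanStatus)
    {
        // Exactly one payload may be set; both or neither is an error.
        if ((startContentScan == null) == (checkContentScanStatus == null))
        {
            throw new ArgumentException("Exactly one payload must be provided.");
        }

        Type = type;
        StartContentScan = startContentScan;
        CheckContentScanStatus = checkContentScanStatus;
    }

    public ContentScanOperationType Type { get; }
    public StartContentScanData StartContentScan { get; }
    public CheckContentScanStatusData CheckContentScanStatus { get; }

    public static ContentScanData NewStartContentScanData(Guid validationStepId, Uri blobUri)
        => new ContentScanData(
            ContentScanOperationType.StartScan,
            new StartContentScanData(validationStepId, blobUri),
            null);

    public static ContentScanData NewCheckContentScanStatus(Guid validationStepId)
        => new ContentScanData(
            ContentScanOperationType.CheckStatus,
            null,
            new CheckContentScanStatusData(validationStepId));
}
```

Keeping the constructor internal means the only way to build a message from another assembly is through a factory that pairs the Type tag with the matching payload, so a mismatched tag cannot be constructed by accident.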

Versioned Schema Serialization

ContentScanMessageSerializer uses private inner classes annotated with [Schema(Name = ..., Version = 1)]. Deserialization dispatches on the schema name read from the brokered message header, enabling future schema versions without breaking existing messages in flight.
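
To illustrate the idea of dispatching on a schema name and version, here is a self-contained sketch. It does not use the NuGet.Services.ServiceBus types: SketchSchemaAttribute, SketchEnvelope, the wire classes, and SchemaDispatchSketch are all hypothetical names invented for this example, and the real serializer's header format and body encoding may differ.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for the [Schema] attribute mentioned above.
[AttributeUsage(AttributeTargets.Class)]
public sealed class SketchSchemaAttribute : Attribute
{
    public string Name { get; set; }
    public int Version { get; set; }
}

// Hypothetical wire-format classes, one per schema name and version.
[SketchSchema(Name = "StartContentScanData", Version = 1)]
public sealed class StartScanWire1
{
    public Guid ValidationStepId { get; set; }
    public string BlobUri { get; set; }
}

[SketchSchema(Name = "CheckContentScanStatusData", Version = 1)]
public sealed class CheckStatusWire1
{
    public Guid ValidationStepId { get; set; }
}

// Hypothetical envelope: schema metadata travels next to the body.
public sealed class SketchEnvelope
{
    public string SchemaName { get; set; }
    public int SchemaVersion { get; set; }
    public object Body { get; set; }
}

public static class SchemaDispatchSketch
{
    // Stamp the outgoing message with the schema declared on the wire class.
    public static SketchEnvelope Wrap<T>(T body) where T : class
    {
        var schema = typeof(T).GetCustomAttribute<SketchSchemaAttribute>()
            ?? throw new InvalidOperationException("Wire type has no schema attribute.");

        return new SketchEnvelope
        {
            SchemaName = schema.Name,
            SchemaVersion = schema.Version,
            Body = body,
        };
    }

    // Route an incoming message on schema name and version. A future version 2
    // would be handled by adding another case, leaving version 1 untouched.
    public static string ResolveOperation(SketchEnvelope message)
    {
        switch (message.SchemaName)
        {
            case "StartContentScanData" when message.SchemaVersion == 1:
                return "StartScan";
            case "CheckContentScanStatusData" when message.SchemaVersion == 1:
                return "CheckStatus";
            default:
                throw new FormatException(
                    "Unknown schema: " + message.SchemaName + " v" + message.SchemaVersion);
        }
    }
}
```

Because the version is part of the routing key, messages already in flight under an older version keep deserializing through their original case while new producers move to a newer one.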

Scheduled Delivery

Every outbound message is sent with a ScheduledEnqueueTimeUtc. The delay resolves in order of precedence: per-call messageDeliveryDelayOverride → configured MessageDelay → TimeSpan.Zero. Negative delay overrides are rejected with ArgumentOutOfRangeException.
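
The fallback chain for the delivery delay can be sketched as a small pure function. DeliveryDelaySketch and its method names are hypothetical; only the precedence order and the rejection of negative overrides come from the description above, and whether a negative configured MessageDelay is also validated is not stated, so the sketch leaves it unchecked.

```csharp
using System;

public static class DeliveryDelaySketch
{
    // Precedence: per-call override, then configured MessageDelay, then zero.
    public static TimeSpan ResolveDelay(
        TimeSpan? messageDeliveryDelayOverride,
        TimeSpan? configuredMessageDelay)
    {
        // The lifted comparison is false when the override is null.
        if (messageDeliveryDelayOverride < TimeSpan.Zero)
        {
            throw new ArgumentOutOfRangeException(
                nameof(messageDeliveryDelayOverride),
                "The delivery delay override cannot be negative.");
        }

        return messageDeliveryDelayOverride ?? configuredMessageDelay ?? TimeSpan.Zero;
    }

    // The scheduled enqueue time is the current UTC time plus the resolved delay.
    public static DateTimeOffset ResolveScheduledEnqueueTime(
        DateTimeOffset utcNow,
        TimeSpan? messageDeliveryDelayOverride,
        TimeSpan? configuredMessageDelay)
    {
        return utcNow + ResolveDelay(messageDeliveryDelayOverride, configuredMessageDelay);
    }
}
```

Note that an explicit override of TimeSpan.Zero is still an override: it wins over the configured delay and schedules the message for immediate delivery.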

URL Redaction in Logs

Before logging the blob URL, ContentScanEnqueuer rebuilds the URI with the query string replaced by "REDACTED" to prevent SAS token leakage into log streams.

ContentScanData’s constructor is internal, not public. External callers must use the static factory methods ContentScanData.NewStartContentScanData(...) and ContentScanData.NewCheckContentScanStatus(...). Direct construction from a referencing project will not compile.

StartContentScanData has two constructors: one omitting ContentType and one including it. The StartContentScanData1 wire schema (version 1) declares a ContentType field, but the deserializer currently uses the two-argument constructor, so any ContentType value received from Service Bus is silently discarded on inbound messages.
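
The query-string redaction described under URL Redaction in Logs could look something like the sketch below. It assumes System.UriBuilder is used to rebuild the URI; the source only says the query is replaced by "REDACTED", so the actual mechanism may differ. UriRedactionSketch and RedactQuery are hypothetical names.

```csharp
using System;

public static class UriRedactionSketch
{
    // Returns a copy of the URI whose query string is replaced by "REDACTED",
    // so SAS tokens carried in the query never reach the logs.
    public static Uri RedactQuery(Uri uri)
    {
        if (uri == null)
        {
            throw new ArgumentNullException(nameof(uri));
        }

        // UriBuilder keeps scheme, host, and path; only the query is swapped.
        var builder = new UriBuilder(uri) { Query = "REDACTED" };
        return builder.Uri;
    }
}
```

A caller would log UriRedactionSketch.RedactQuery(blobUri).AbsoluteUri instead of the original URI. The scheme, host, and path stay visible for diagnostics while the signature is dropped.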

Message Flow Summary

// Trigger a new content scan
await enqueuer.EnqueueContentScanAsync(validationStepId, new Uri("https://...blob..."));

// Poll for completion (typically re-queued by the orchestrator on a timer loop)
await enqueuer.EnqueueContentScanStatusCheckAsync(validationStepId);

// Both calls resolve to:
//   ContentScanMessageSerializer.Serialize(ContentScanData)
//   → IBrokeredMessage with ScheduledEnqueueTimeUtc applied
//   → ITopicClient.SendAsync(message)