Introduction

Cupel Specification Version 1.0.0

This document is the language-agnostic specification for the Cupel context selection algorithm. Cupel selects, scores, and orders context items for inclusion in a language model’s context window, subject to a token budget.

Purpose

Large language model applications must decide which pieces of context (messages, documents, tool outputs, memories) to include in a finite context window. Cupel defines a deterministic pipeline that takes a set of candidate context items and a token budget, and produces an ordered list of selected items that fits within the budget.

This specification enables interoperable implementations across programming languages. An implementation that conforms to this specification will produce the same selected items in the same order as any other conforming implementation, given identical inputs.

Scope

This specification defines:

  • The data model: ContextItem, ContextBudget, ScoredItem, and supporting enumerations
  • The pipeline: a fixed 6-stage transformation (Classify, Score, Deduplicate, Sort, Slice, Place)
  • All scorer algorithms: RecencyScorer, PriorityScorer, KindScorer, TagScorer, FrequencyScorer, ReflexiveScorer, CompositeScorer, ScaledScorer
  • All slicer strategies: GreedySlice, KnapsackSlice, QuotaSlice
  • All placer strategies: ChronologicalPlacer, UShapedPlacer
  • Overflow handling: Throw, Truncate, Proceed strategies

This specification does not define:

  • Streaming or asynchronous pipeline execution (implementation-specific)
  • Diagnostics, tracing, or observability infrastructure
  • Dependency injection or builder APIs
  • Serialization formats (JSON, etc.)
  • Named policies or preset configurations
  • Tokenizer implementations (the caller provides token counts)

Conformance Model

A conforming implementation must exhibit behavioral equivalence: given the same input items, budget, scorer, slicer, and placer configuration, a conforming implementation must select the same items in the same order as the reference behavior defined in this specification.

Behavioral equivalence does not require bit-exact floating-point scores. Intermediate score values may differ due to floating-point evaluation order, provided the final selection and ordering are identical. Conformance tests compare selected item sets and their ordering, not exact score values. Where individual scores are verified (e.g., per-stage scorer tests), an epsilon tolerance of 1e-9 is used.

See Conformance Levels for the full conformance requirements.

Numeric Precision

All scoring computations MUST use IEEE 754 64-bit double-precision floating-point arithmetic. This includes:

  • Individual scorer outputs
  • Composite score aggregation (weighted averages)
  • Score-based comparisons in sorting, deduplication, and slicing
  • Density calculations in slicer algorithms

Implementations MUST NOT use 32-bit floats, fixed-point, or arbitrary-precision arithmetic for scoring. Integer arithmetic is permitted for token counts and budget calculations.

Notation Conventions

Algorithms in this specification use CLRS-style pseudocode with the following conventions:

ConventionMeaning
bold lowercaseKeywords: for, if, else, return, while
monospaceVariables, field names, function names
<-Assignment (not =, to avoid confusion with equality)
=Equality comparison
[i]0-based array/list indexing
//Comments
length(x)Number of elements in array or list x

Pipeline Overview

The Cupel pipeline is a fixed sequence of six stages. Every execution follows this order; stages cannot be reordered or skipped.

flowchart LR
    A[Candidate Items] --> B[Classify]
    B --> C[Score]
    C --> D[Deduplicate]
    D --> E[Sort]
    E --> F[Slice]
    F --> G[Place]
    G --> H[Ordered Context Window]
StageInputOutputPurpose
ClassifyCandidate itemsPinned list + Scoreable listPartition items; exclude invalid
ScoreScoreable itemsScored itemsCompute relevance scores
DeduplicateScored itemsUnique scored itemsRemove duplicate content
SortUnique scored itemsSorted scored itemsOrder by score descending
SliceSorted scored itemsBudget-fitting itemsSelect items within token budget
PlaceSliced + pinned itemsOrdered itemsDetermine final presentation order

The pipeline operates on the principle of ordinal-only scoring: scorers assign relevance scores (rank), slicers select items within budget (drop), and placers determine presentation order (position). Each concern is strictly separated.

Data Model

The Cupel data model consists of three primary types and three supporting enumerations. These types flow through every stage of the pipeline.

Primary Types

  • ContextItem — An immutable record representing a single piece of context (a message, document, tool output, etc.). Every scorer, slicer, and placer operates on ContextItem instances.

  • ContextBudget — The budget constraint model that controls how many tokens the pipeline can select. Validated at construction time so that no invalid budget can exist at runtime.

  • ScoredItem — A pair of (ContextItem, Score) produced by the scoring stage and consumed by subsequent pipeline stages.

Enumerations

  • ContextKind — An extensible string enumeration classifying the type of a context item (Message, Document, ToolOutput, Memory, SystemPrompt).

  • ContextSource — An extensible string enumeration identifying the origin of a context item (Chat, Tool, Rag).

  • OverflowStrategy — A closed enumeration controlling behavior when selected items exceed the token budget (Throw, Truncate, Proceed).

Immutability

All data model types are immutable. Once constructed, a ContextItem or ContextBudget cannot be modified. Pipeline stages produce new collections rather than mutating inputs. This invariant simplifies reasoning about pipeline behavior and enables safe concurrent access.

ContextItem

A ContextItem is an immutable record representing a single piece of context in the pipeline. Every scorer, slicer, and placer operates on ContextItem instances.

Fields

FieldTypeRequiredDefaultDescription
contentstringYesThe textual content of the item. Non-null and non-empty.
tokensintegerYesThe token count for this context item, as provided by the caller. Items with negative token counts are excluded during classification.
kindContextKindNo"Message"The kind of context item. Used by KindScorer and QuotaSlice.
sourceContextSourceNo"Chat"The origin of this context item.
priorityinteger or nullNonullOptional priority override. Higher values indicate greater importance. Used by PriorityScorer.
tagslist of stringsNoempty listDescriptive tags for filtering and scoring. May be empty. Tag comparison is case-insensitive (ASCII fold). Used by TagScorer and FrequencyScorer.
metadatamap of string to anyNoempty mapArbitrary key-value metadata. Opaque to the pipeline — not read or modified by any pipeline stage. Preserved on output for caller use.
timestampdatetime (UTC) or nullNonullWhen this context item was created or observed. Used by RecencyScorer and ChronologicalPlacer.
futureRelevanceHintfloat64 or nullNonullHint for future relevance scoring, conventionally in the range [0.0, 1.0]. Used by ReflexiveScorer.
pinnedbooleanNofalseWhether this item is pinned. Pinned items bypass scoring and slicing — they are always included in the output regardless of score or budget.
originalTokensinteger or nullNonullThe original token count before any external summarization or truncation. Not used by the pipeline; preserved for caller diagnostics.

Constraints

  1. content MUST be a non-null, non-empty string. Implementations SHOULD reject construction of a ContextItem with null or empty content.

  2. tokens is caller-provided. The pipeline does not tokenize content; it trusts the caller’s token count. The pipeline excludes items with tokens < 0 during classification (see Stage 1: Classify).

  3. tags uses case-insensitive comparison. When comparing tags (e.g., in TagScorer or FrequencyScorer), implementations MUST use case-insensitive ASCII comparison. "Important" and "important" are considered equal.

  4. metadata is opaque. The pipeline MUST NOT read, interpret, or modify metadata values. Implementations MUST preserve metadata through the pipeline so that selected items retain their original metadata on output.

  5. timestamp represents a point in time with UTC semantics. Implementations may use any datetime representation that preserves UTC instant precision (e.g., ISO 8601 with timezone, Unix epoch milliseconds). Comparison is by temporal ordering.

  6. Immutability. Once constructed, a ContextItem MUST NOT be modified. Pipeline stages MUST NOT mutate any field of any ContextItem they receive.

ContextBudget

A ContextBudget defines the token budget constraints that control how much context the pipeline can select. All fields are validated at construction time — no invalid budget can exist at runtime.

Fields

FieldTypeRequiredDefaultConstraints
maxTokensintegerYes>= 0. Hard ceiling: the model’s context window size.
targetTokensintegerYes>= 0, <= maxTokens. Soft goal: the slicer aims for this token count.
outputReserveintegerNo0>= 0, <= maxTokens. Tokens reserved for model output generation, subtracted from available budget.
reservedSlotsmap of ContextKind to integerNoempty mapMinimum guaranteed items per kind. Each value >= 0. Used by QuotaSlice.
estimationSafetyMarginPercentfloat64No0.0>= 0.0, <= 100.0. Percentage buffer for token estimation error.

Validation Rules

A conforming implementation MUST enforce these validation rules at construction time and reject invalid budgets:

  1. maxTokens >= 0 — Negative maximum tokens are invalid.
  2. targetTokens >= 0 — Negative target tokens are invalid.
  3. targetTokens <= maxTokens — The soft target cannot exceed the hard ceiling.
  4. outputReserve >= 0 — Negative output reserve is invalid.
  5. outputReserve <= maxTokens — The output reserve cannot exceed the context window.
  6. estimationSafetyMarginPercent >= 0.0 AND <= 100.0 — Must be a valid percentage.
  7. Each value in reservedSlots >= 0 — Negative slot reservations are invalid.

Effective Budget

The pipeline computes an effective budget for the slicing stage by subtracting tokens already committed and applying any configured safety margin:

reservedTokens  = sum of all values in reservedSlots
effectiveMax    = max(0, maxTokens - outputReserve - pinnedTokens - reservedTokens)
effectiveTarget = max(0, targetTokens - pinnedTokens - reservedTokens)
effectiveTarget = min(effectiveTarget, effectiveMax)

if estimationSafetyMarginPercent > 0:
    multiplier      = 1.0 - estimationSafetyMarginPercent / 100.0
    effectiveMax    = floor(effectiveMax * multiplier)
    effectiveTarget = floor(effectiveTarget * multiplier)
    effectiveTarget = min(effectiveTarget, effectiveMax)

Where:

  • pinnedTokens is the sum of tokens for all pinned items.
  • reservedTokens is the sum of all values in reservedSlots, subtracted alongside outputReserve and pinnedTokens to reserve capacity for per-kind guarantees.
  • The safety margin is applied after all subtractions as a multiplicative reduction. Both effectiveMax and effectiveTarget use floor (integer truncation toward zero) when converting from the floating-point product.

See Stage 5: Slice for the full pseudocode.

Semantics

  • maxTokens is the absolute ceiling. The total tokens of all selected items (pinned + sliced) MUST NOT exceed maxTokens - outputReserve, except when pinned items alone exceed this value (which is an error reported during classification).

  • targetTokens is the soft goal. The slicer aims to select items whose total tokens are at most targetTokens. The pipeline checks for overflow against targetTokens after merging pinned and sliced items.

  • outputReserve carves out tokens for the model’s response. It reduces the effective budget available for context items.

  • reservedSlots guarantees minimum representation per ContextKind. The sum of reserved slot values is subtracted from the effective budget during pipeline budget computation, reducing the token ceiling available for non-reserved items. Additionally, QuotaSlice uses reserved slots to guarantee minimum representation per kind.

  • estimationSafetyMarginPercent provides a buffer for callers whose token counts are estimates rather than exact. It is applied as a multiplicative reduction to the effective budget after reserved slots and other subtractions. A value of 10.0 reduces the effective budget to 90% of its post-subtraction value.

Enumerations

ContextKind

ContextKind is an extensible string enumeration that classifies the type of a context item. Implementations MUST support arbitrary string values beyond the well-known set.

Well-Known Values

ValueDescription
"Message"A conversational message (default)
"Document"A document or file content
"ToolOutput"Output from a tool invocation
"Memory"A stored memory or fact
"SystemPrompt"A system-level instruction

Comparison Semantics

ContextKind comparison MUST be case-insensitive using ASCII case folding. The following are all considered equal:

  • "Message", "message", "MESSAGE", "mEsSaGe"

This case-insensitivity applies everywhere ContextKind is compared:

Construction

A ContextKind value MUST be a non-null, non-whitespace-only string. Implementations SHOULD reject empty or whitespace-only values at construction time.


ContextSource

ContextSource is an extensible string enumeration that identifies the origin of a context item. Implementations MUST support arbitrary string values beyond the well-known set.

Well-Known Values

ValueDescription
"Chat"From a user chat interaction (default)
"Tool"From a tool or function call
"Rag"From a retrieval-augmented generation source

Comparison Semantics

ContextSource comparison MUST be case-insensitive using ASCII case folding. The following are all considered equal:

  • "Chat", "chat", "CHAT"

Construction

A ContextSource value MUST be a non-null, non-whitespace-only string.


OverflowStrategy

OverflowStrategy is a closed enumeration (not extensible) that controls pipeline behavior when selected items exceed the token budget after merging pinned and sliced items.

Values

ValueBehavior
ThrowRaise an error (exception) when selected items exceed targetTokens. This is the default strategy.
TruncateRemove lowest-priority items from the selection until the total fits within targetTokens. Pinned items are never removed by truncation. Items are removed from the tail of the merged list (lowest-scored non-pinned items first).
ProceedAccept the over-budget selection and report the overflow to an observer. No items are removed.

Overflow Detection

Overflow is detected after the Place stage merges pinned items (assigned score 1.0) with sliced items. The total tokens of the merged set are compared against targetTokens:

mergedTokens = sum of tokens for all items in merged set
if mergedTokens > targetTokens:
    apply OverflowStrategy

See Stage 6: Place for the full overflow handling algorithm.


ScoredItem

ScoredItem is a value type (pair/tuple) that associates a ContextItem with its computed relevance score.

Fields

FieldTypeDescription
itemContextItemThe context item
scorefloat64The computed relevance score, conventionally in the range [0.0, 1.0]

Semantics

  • The score field is an IEEE 754 64-bit double.
  • Scores are conventionally in [0.0, 1.0] but this is not enforced by the type. Individual scorers may produce values outside this range (e.g., KindScorer with custom weights).
  • ScoredItem is produced by the Score stage and consumed by Deduplicate, Sort, Slice, and Place.
  • Pinned items are assigned a score of 1.0 when merged into the scored item list during the Place stage.

Pipeline

The Cupel pipeline is a fixed 6-stage transformation that takes a set of candidate context items and a token budget, and produces an ordered list of selected items that fits within the budget.

Invariants

  1. Fixed stage order. The pipeline always executes stages in this order: Classify, Score, Deduplicate, Sort, Slice, Place. Stages cannot be reordered, skipped, or inserted between.

  2. Ordinal-only scoring. Scorers assign relevance scores (rank). Slicers select items within budget (drop). Placers determine presentation order (position). Each concern is strictly separated — a scorer never drops items, a slicer never reorders, a placer never scores.

  3. Immutable items. No stage modifies the ContextItem instances it receives. Stages produce new collections; they never mutate inputs.

Data Flow

flowchart TD
    Input["Candidate Items\n(list of ContextItem)"] --> Classify
    Classify --> |"pinned items\n(list of ContextItem)"| Place
    Classify --> |"scoreable items\n(list of ContextItem)"| Score
    Score --> |"scored items\n(list of ScoredItem)"| Deduplicate
    Deduplicate --> |"unique scored items\n(list of ScoredItem)"| Sort
    Sort --> |"sorted scored items\n(list of ScoredItem)"| Slice
    Slice --> |"budget-fitting items\n(list of ContextItem)"| Place
    Place --> Output["Ordered Context Window\n(list of ContextItem)"]

Stage Summary

#StageInputOutputKey Behavior
1ClassifyCandidate itemsPinned + Scoreable listsPartition; exclude negative-token items
2ScoreScoreable itemsScored itemsInvoke scorer; produce (item, score) pairs
3DeduplicateScored itemsUnique scored itemsRemove duplicate content (byte-exact)
4SortUnique scored itemsSorted scored itemsStable sort by score descending
5SliceSorted scored itemsBudget-fitting itemsSelect items within effective token budget
6PlaceSliced + Pinned itemsOrdered itemsMerge, handle overflow, determine final order

The pipeline operates on the principle of ordinal-only scoring: scorers assign relevance scores (rank), slicers select items within budget (drop), and placers determine presentation order (position). Each concern is strictly separated.

Error Conditions

The pipeline may raise errors in two situations:

  1. Pinned items exceed available budget. If the total tokens of pinned items exceeds maxTokens - outputReserve, the pipeline MUST raise an error during the Classify stage. This is not recoverable — the caller must reduce pinned items or increase the budget.

  2. Overflow after merge. If the total tokens of merged items (pinned + sliced) exceeds targetTokens, the pipeline applies the configured OverflowStrategy. With the Throw strategy (default), this raises an error.

Stage 1: Classify

The Classify stage partitions input items into two groups — pinned items that bypass scoring and slicing, and scoreable items that enter the scoring pipeline — while excluding invalid items.

Overview

Classification is the entry point of the pipeline. It performs three tasks:

  1. Exclude invalid items. Items with tokens < 0 are excluded with reason NegativeTokens. These items do not appear in any subsequent stage or in the output.
  2. Partition by pinned status. Items with pinned = true are placed in the pinned list; all others are placed in the scoreable list.
  3. Validate pinned budget. The total tokens of all pinned items must fit within maxTokens - outputReserve. If they do not, the pipeline raises an error.

Pinned items flow directly to the Place stage, skipping Score, Deduplicate, Sort, and Slice entirely. Scoreable items continue to the Score stage.

Algorithm

CLASSIFY(items, budget):
    pinned    <- empty list
    scoreable <- empty list

    for i <- 0 to length(items) - 1:
        item <- items[i]
        if item.tokens < 0:
            // Exclude: NegativeTokens
            continue
        if item.pinned:
            APPEND(pinned, item)
        else:
            APPEND(scoreable, item)

    // Validate pinned budget
    pinnedTokens <- 0
    for i <- 0 to length(pinned) - 1:
        pinnedTokens <- pinnedTokens + pinned[i].tokens

    availableForPinned <- budget.maxTokens - budget.outputReserve
    if pinnedTokens > availableForPinned:
        ERROR("Pinned items require {pinnedTokens} tokens, but only
               {availableForPinned} are available")

    return (pinned, scoreable)

Edge Cases

  • Empty input. If items is empty, both pinned and scoreable are empty lists. This is not an error.
  • All items pinned. If every item is pinned, scoreable is empty. The Score, Deduplicate, Sort, and Slice stages operate on an empty list (producing empty output). The Place stage receives only pinned items.
  • All items excluded. If every item has tokens < 0, both lists are empty.
  • Zero-token items. Items with tokens = 0 are valid and are classified normally (into pinned or scoreable). They are not excluded.
  • Pinned item with tokens < 0. The negative-token check runs before the pinned check. A pinned item with tokens < 0 is excluded, not pinned.

Conformance Notes

  • The exclusion check (tokens < 0) MUST be evaluated before the pinned partition. A pinned item with negative tokens is excluded.
  • The pinned budget validation MUST compare against maxTokens - outputReserve, not targetTokens.
  • Input order within each partition MUST be preserved. If items A, B, C are scoreable (in that input order), they appear as A, B, C in the scoreable list.

Stage 2: Score

The Score stage computes a relevance score for each scoreable item by invoking the configured scorer, producing a list of ScoredItem pairs.

Overview

Scoring is the primary ranking mechanism in the pipeline. For each scoreable item, the scorer receives both the individual item and the full list of all scoreable items. This dual-argument design enables both absolute scorers (that examine only the item) and relative scorers (that compare the item against its peers).

The output is a list of ScoredItem pairs, each containing the original ContextItem and its computed score. The order of the output list matches the order of the input list (input position is preserved).

Algorithm

SCORE(scoreable, scorer):
    scored <- new array of length(scoreable)

    for i <- 0 to length(scoreable) - 1:
        score <- scorer.Score(scoreable[i], scoreable)
        scored[i] <- ScoredItem(scoreable[i], score)

    return scored

Scorer Interface

A scorer implements a single function:

Score(item: ContextItem, allItems: list of ContextItem) -> float64
  • item is the individual item being scored.
  • allItems is the complete list of scoreable items (including item itself).
  • The return value is an IEEE 754 64-bit double.

See the Scorers chapter for all scorer algorithm specifications.

Score Semantics

  • Scores are conventionally in the range [0.0, 1.0], where 0.0 indicates lowest relevance and 1.0 indicates highest relevance.
  • The [0.0, 1.0] range is a convention, not an enforced constraint. Individual scorers (e.g., KindScorer with custom weights) may produce values outside this range.
  • All scoring computations MUST use IEEE 754 64-bit double-precision floating-point arithmetic (see Introduction).

Edge Cases

  • Empty scoreable list. If scoreable is empty, the output is an empty array. The scorer is never invoked.
  • Single item. The scorer receives a one-element allItems list. Relative scorers (e.g., RecencyScorer, PriorityScorer) handle this by returning a defined value (typically 1.0 for the sole item with a non-null field).
  • NaN or infinite scores. This specification does not mandate rejection of NaN or infinite scores at the Score stage. However, such values may produce implementation-defined behavior in subsequent stages (Sort, Slice). Well-behaved scorers produce finite values.

Conformance Notes

  • The scorer MUST receive the full scoreable list as the second argument, not a subset or the original input list.
  • Output order MUST match input order: scored[i] corresponds to scoreable[i].
  • The scorer is invoked exactly once per item per pipeline execution.

Stage 3: Deduplicate

The Deduplicate stage removes items with duplicate content, keeping only the highest-scoring instance of each unique content string. Deduplication is enabled by default but can be disabled.

Overview

Deduplication prevents the same content from appearing multiple times in the context window. This is important because retrieval-augmented generation (RAG) sources and tool outputs may produce identical content from different sources or at different times.

Identity semantics: Two items are considered duplicates if and only if their content fields are byte-exact equal. This specification mandates ordinal (byte-exact) comparison with no Unicode normalization, no case folding, and no whitespace normalization. Items that differ by a single byte — including trailing whitespace, BOM markers, or normalization form — are considered distinct.

Pitfall (P3): Deduplication Identity. Implementations MUST use byte-exact ordinal comparison for content equality. Using Unicode-normalized or case-insensitive comparison would produce different deduplication results and fail conformance tests.

Algorithm

DEDUPLICATE(scored, enabled):
    if not enabled or length(scored) = 0:
        return scored

    // Map: content string -> index of best item in scored array
    bestByContent <- empty map (ordinal string keys)

    for i <- 0 to length(scored) - 1:
        content <- scored[i].item.content
        if content IN bestByContent:
            existingIndex <- bestByContent[content]
            if scored[i].score > scored[existingIndex].score:
                bestByContent[content] <- i
            // Tiebreak: when scores are equal, keep the lower index
            // (the existing entry already has the lower index, so no action needed)
        else:
            bestByContent[content] <- i

    // Collect survivors in original order
    deduped <- empty list
    for i <- 0 to length(scored) - 1:
        if bestByContent[scored[i].item.content] = i:
            APPEND(deduped, scored[i])

    return deduped

Tiebreaking

When two duplicate items have the same score, the item with the lower original index (earlier in the input) is kept. This is achieved naturally by the algorithm: the first occurrence is stored in the map, and subsequent items with equal scores do not replace it.

Edge Cases

  • Deduplication disabled. When disabled, the input is returned unchanged. No content comparison occurs.
  • Empty input. Returns an empty list.
  • No duplicates. All items are unique; the output is identical to the input.
  • All items identical. All items have the same content. Only one survives: the one with the highest score (or lowest index if scores are equal).
  • Duplicate content, different metadata. Only content is compared. Items with the same content but different kind, source, tags, or other fields are considered duplicates. The surviving item retains all its original fields.

Conformance Notes

  • Content comparison MUST be byte-exact ordinal. No Unicode normalization (NFC, NFD, NFKC, NFKD), no case folding, no whitespace normalization.
  • The survivor for each content group is the item with the highest score. On score ties, the item with the lowest original index (position in the scored input array) wins.
  • The output MUST preserve the relative order of surviving items from the input. If items at indices 0, 3, and 7 survive, they appear in that order in the output.

Stage 4: Sort

The Sort stage orders scored items by descending score using a stable sort with a prescribed tiebreaking rule.

Overview

After deduplication, items must be sorted so that the highest-scored items appear first. This ordering is consumed by the Slice stage, which greedily or optimally selects items from the front of the sorted list.

Sort stability is critical: when two items have the same score, their relative order from the input must be preserved. This ensures deterministic, reproducible output across all conforming implementations.

Pitfall (P1): Sort Stability. Implementations MUST use a stable sort algorithm or equivalently sort by the composite key (score descending, originalIndex ascending). An unstable sort (e.g., quicksort without index tiebreaking) can produce different orderings for equal-scored items, causing conformance test failures.

Algorithm

SORT(deduped):
    // Build composite sort keys
    keys <- new array of (score: float64, index: int) of length(deduped)
    for i <- 0 to length(deduped) - 1:
        keys[i] <- (deduped[i].score, i)

    // Sort by score descending, then by index ascending (tiebreak)
    SORT-BY(keys, comparator):
        comparator(a, b):
            if a.score != b.score:
                return COMPARE-DESCENDING(a.score, b.score)
            else:
                return COMPARE-ASCENDING(a.index, b.index)

    // Reconstruct sorted array
    sorted <- new array of length(deduped)
    for i <- 0 to length(keys) - 1:
        sorted[i] <- deduped[keys[i].index]

    return sorted

Implementation Pattern

The algorithm builds an array of (score, index) pairs, sorts this auxiliary array using the composite comparator, then reconstructs the ScoredItem array by following the index references. This pattern works with any sort algorithm (stable or unstable) because the index component of the composite key provides a deterministic tiebreak.

Alternatively, implementations may use a natively stable sort and sort only by score descending, relying on the sort’s stability guarantee to preserve input order for equal scores. Both approaches produce identical results.

Edge Cases

  • Empty input. Returns an empty array.
  • Single item. Returns a single-element array (no sorting needed).
  • All equal scores. Output preserves input order entirely (the index tiebreak produces ascending index order).
  • NaN scores. Behavior with NaN scores is implementation-defined. IEEE 754 comparisons involving NaN return false for all relational operators, which may cause items with NaN scores to appear in an unpredictable position. Conforming implementations are not required to handle NaN scores in any particular way. Well-behaved scorers do not produce NaN.

Conformance Notes

  • The sort MUST be stable, or equivalently MUST use the composite key (score descending, originalIndex ascending).
  • “Original index” refers to the item’s position in the input to the Sort stage (i.e., the deduplication output), not the item’s position in the original candidate list.
  • Score comparison uses standard IEEE 754 floating-point ordering (total order for non-NaN values).

Stage 5: Slice

The Slice stage selects items from the sorted list that fit within the effective token budget. It delegates to a configurable slicer strategy.

Overview

Slicing is where the token budget is enforced. The pipeline computes an effective budget that accounts for tokens already committed (pinned items and output reserve), then delegates to the configured slicer strategy to select which items to include.

The slicer receives items already sorted by score descending (from the Sort stage), so it can make informed decisions about which items to drop when the budget is tight.

Effective Budget Computation

Before invoking the slicer, the pipeline computes the adjusted budget:

COMPUTE-EFFECTIVE-BUDGET(budget, pinnedTokens):
    reservedTokens  <- sum(budget.reservedSlots.values)
    effectiveMax    <- max(0, budget.maxTokens - budget.outputReserve - pinnedTokens - reservedTokens)
    effectiveTarget <- max(0, budget.targetTokens - pinnedTokens - reservedTokens)
    effectiveTarget <- min(effectiveTarget, effectiveMax)

    if budget.estimationSafetyMarginPercent > 0:
        multiplier      <- 1.0 - budget.estimationSafetyMarginPercent / 100.0
        effectiveMax    <- floor(effectiveMax * multiplier)
        effectiveTarget <- floor(effectiveTarget * multiplier)
        effectiveTarget <- min(effectiveTarget, effectiveMax)

    return ContextBudget(
        maxTokens:    effectiveMax,
        targetTokens: effectiveTarget
    )

Where:

  • pinnedTokens is the sum of tokens for all pinned items (computed during Classify).
  • reservedTokens is the sum of all values in budget.reservedSlots, reserving token capacity for per-kind guarantees.
  • effectiveMax is the hard ceiling for scoreable items after reserving space for output, pinned items, and reserved slots.
  • effectiveTarget is the soft goal for the slicer, clamped to effectiveMax.
  • The safety margin reduces the effective budget by the configured percentage to account for token estimation error. Both values use floor (integer truncation) when converting from the floating-point product.

Algorithm

SLICE(sorted, budget, pinnedTokens, slicer):
    adjustedBudget <- COMPUTE-EFFECTIVE-BUDGET(budget, pinnedTokens)

    // Delegate to slicer strategy
    slicedItems <- slicer.Slice(sorted, adjustedBudget)

    return slicedItems

Slicer Interface

A slicer implements a single function:

Slice(sorted: list of ScoredItem, budget: ContextBudget) -> list of ContextItem
  • sorted is the score-descending sorted list of ScoredItem pairs.
  • budget is the effective budget (already adjusted for pinned items and output reserve).
  • The return value is the list of selected ContextItem instances (without scores).

See the Slicers chapter for all slicer algorithm specifications.

Edge Cases

  • Empty sorted list. The slicer receives an empty list and returns an empty list.
  • Zero effective budget. If effectiveMax = 0 or effectiveTarget = 0, the slicer should select no items (or only zero-token items, depending on the slicer strategy).
  • All items fit. If the total tokens of all sorted items is within effectiveTarget, the slicer may select all items.
  • Pinned items consume entire budget. If pinnedTokens >= targetTokens, then effectiveTarget = 0. The slicer selects no items (or only zero-token items), and the final output consists of pinned items only.
  • Reserved slots exceed remaining budget. If the sum of reservedSlots values exceeds maxTokens - outputReserve - pinnedTokens, then effectiveMax = 0 and the slicer selects no items (or only zero-token items).

Conformance Notes

  • The effective budget computation MUST match the formulas above exactly. In particular, effectiveTarget is clamped to effectiveMax (not the other way around).
  • The slicer receives a fresh ContextBudget with only maxTokens and targetTokens set. The effects of outputReserve, reservedSlots, and estimationSafetyMarginPercent are incorporated into the maxTokens and targetTokens values of the adjusted budget; the original fields are not forwarded.
  • The slicer output is a list of ContextItem (not ScoredItem). Scores are re-associated by the pipeline after slicing, by matching items back to their scored counterparts using reference identity.

Stage 6: Place

The Place stage merges pinned items with sliced items, detects and handles token budget overflow, and delegates to a configurable placer strategy for final ordering.

Overview

Placing is the final pipeline stage. It has three responsibilities:

  1. Merge pinned items (from Classify) with sliced items (from Slice) into a single scored item list. Pinned items are assigned a score of 1.0.
  2. Detect and handle overflow — if the merged total exceeds targetTokens, apply the configured OverflowStrategy.
  3. Order the merged items by delegating to the configured placer strategy.

Algorithm

PLACE(pinned, slicedScored, budget, overflowStrategy, placer):
    // Step 1: Merge pinned items with sliced scored items
    merged <- new array of length(pinned) + length(slicedScored)
    for i <- 0 to length(pinned) - 1:
        merged[i] <- ScoredItem(pinned[i], 1.0)
    for i <- 0 to length(slicedScored) - 1:
        merged[length(pinned) + i] <- slicedScored[i]

    // Step 2: Overflow detection
    mergedTokens <- 0
    for i <- 0 to length(merged) - 1:
        mergedTokens <- mergedTokens + merged[i].item.tokens

    if mergedTokens > budget.targetTokens:
        merged <- HANDLE-OVERFLOW(merged, budget.targetTokens, overflowStrategy)

    // Step 3: Delegate to placer for final ordering
    placed <- placer.Place(merged)

    return placed

Overflow Handling

HANDLE-OVERFLOW(merged, targetTokens, strategy):
    if strategy = Throw:
        ERROR("Selected items require {mergedTokens} tokens,
               exceeding target budget of {targetTokens}")

    if strategy = Truncate:
        // Keep items from front (highest-scored), skip from back
        // Pinned items are never removed
        kept <- empty list
        currentTokens <- 0
        for i <- 0 to length(merged) - 1:
            if merged[i].item.pinned:
                APPEND(kept, merged[i])
                currentTokens <- currentTokens + merged[i].item.tokens
            else if currentTokens + merged[i].item.tokens <= targetTokens:
                APPEND(kept, merged[i])
                currentTokens <- currentTokens + merged[i].item.tokens
            // else: excluded (BudgetExceeded or PinnedOverride)
        return kept

    if strategy = Proceed:
        // Accept over-budget selection, notify observer
        NOTIFY-OVERFLOW(mergedTokens - targetTokens, merged)
        return merged

Placer Interface

A placer implements a single function:

Place(merged: list of ScoredItem) -> list of ContextItem
  • merged is the list of ScoredItem pairs (pinned items with score 1.0, sliced items with their computed scores).
  • The return value is the final ordered list of ContextItem instances.

See the Placers chapter for all placer algorithm specifications.

Pinned Item Score

Pinned items are assigned a score of 1.0 when merged into the scored item list. This score value represents the highest conventional relevance — pinned items are always-included by definition. The 1.0 score affects placer behavior (e.g., UShapedPlacer places highest-scored items at the edges).

Edge Cases

  • No pinned items. The merged list contains only sliced items. Overflow detection and placement proceed normally.
  • No sliced items. The merged list contains only pinned items (all with score 1.0). This occurs when the scoreable list was empty or the entire effective budget was zero.
  • Truncation cannot reach target. If pinned items alone exceed targetTokens, truncation removes all non-pinned items but the total still exceeds the target. This is a best-effort situation — pinned items are never removed.
  • Empty merged list. Both pinned and sliced lists are empty. The placer receives an empty list and returns an empty list.

Conformance Notes

  • Pinned items MUST appear before sliced items in the merged array (indices [0, length(pinned)) for pinned, [length(pinned), end) for sliced).
  • Overflow detection compares against budget.targetTokens (the original target, not the effective target used for slicing).
  • The Truncate strategy iterates the merged array in order (pinned first, then sliced items in score-descending order). Pinned items are always kept; non-pinned items are kept while they fit within targetTokens.
  • The placer receives the post-overflow merged list (which may be shorter than the pre-overflow list if Truncate was applied).

Scorers

Scorers compute relevance scores for context items. They are the primary ranking mechanism in the Cupel pipeline, invoked during Stage 2: Score.

Scorer Interface

A scorer implements a single pure function:

Score(item: ContextItem, allItems: list of ContextItem) -> float64
  • item is the individual ContextItem being scored.
  • allItems is the complete list of scoreable items (including item itself). This enables relative scoring — scorers that rank an item against its peers.
  • The return value is an IEEE 754 64-bit double-precision floating-point number.

Contract

  1. Pure function. Scorers MUST NOT mutate state between calls, perform I/O, or produce side effects. Given identical inputs, a scorer MUST return the same output.
  2. Conventional range. Scores are conventionally in the range [0.0, 1.0], where 0.0 indicates lowest relevance and 1.0 indicates highest relevance. This range is a convention, not an enforced constraint — custom KindScorer weights may produce values outside this range.
  3. IEEE 754 arithmetic. All scoring computations MUST use 64-bit double-precision floating-point arithmetic (see Numeric Precision).

Scorer Summary

ScorerInput Fields UsedOutput RangeDescription
RecencyScorertimestamp[0.0, 1.0]Rank-based: more recent items score higher
PriorityScorerpriority[0.0, 1.0]Rank-based: higher priority values score higher
KindScorerkind[0.0, max weight]Dictionary lookup by item kind
TagScorertags[0.0, 1.0]Weighted tag matching, normalized
FrequencyScorertags[0.0, 1.0]Proportion of peers sharing a tag
ReflexiveScorerfutureRelevanceHint[0.0, 1.0]Passthrough of caller-provided hint
CompositeScorer(delegates to children)[0.0, 1.0]Weighted average of child scorers
ScaledScorer(delegates to inner)[0.0, 1.0]Min-max normalization of an inner scorer
MetadataTrustScorermetadata["cupel:trust"][0.0, 1.0]Passthrough of caller-provided trust value from metadata
DecayScorertimestamp[0.0, 1.0]Time-based decay with configurable curve (Exponential, Step, Window)
MetadataKeyScorermetadata[key]{1.0, boost}Multiplicative boost for items where metadata key matches a configured value

Scorer Categories

Absolute Scorers

Absolute scorers examine only the item being scored; the allItems parameter is ignored. These scorers produce the same score for an item regardless of what other items are present.

  • KindScorer — dictionary lookup
  • TagScorer — weighted tag matching
  • ReflexiveScorer — passthrough of futureRelevanceHint
  • MetadataTrustScorer — passthrough of caller-provided trust value from metadata["cupel:trust"]
  • DecayScorer — time-based decay from item timestamp against a reference time (Exponential, Step, or Window curve)
  • MetadataKeyScorer — multiplicative boost for items where a metadata key matches a configured value

Relative Scorers

Relative scorers compare the item against its peers. The same item may receive different scores depending on the composition of allItems.

  • RecencyScorer — rank among timestamped items
  • PriorityScorer — rank among prioritized items
  • FrequencyScorer — proportion of peers sharing a tag

Composite Scorers

Composite scorers delegate to other scorers and transform or combine their outputs.

  • CompositeScorer — weighted average of multiple child scorers
  • ScaledScorer — min-max normalization wrapper for any inner scorer

RecencyScorer

The RecencyScorer assigns higher scores to more recent items based on their relative timestamp rank among all scoreable items.

Overview

RecencyScorer is a relative scorer — it compares each item’s timestamp against all other items’ timestamps. The score represents the item’s position in the temporal ordering: the most recent item scores 1.0, the oldest scores 0.0, and items in between are linearly interpolated.

Items without a timestamp receive a score of 0.0.

Fields Used

FieldSourcePurpose
timestampContextItemTemporal ordering key

Algorithm

For each item, count how many other timestamped items have a strictly earlier timestamp. This count (the item’s rank) is divided by the number of timestamped items minus one to produce a normalized score.

RECENCY-SCORE(item, allItems):
    // Items without a timestamp always score 0.0
    if item.timestamp = null:
        return 0.0

    countWithTimestamp <- 0
    rank <- 0

    for i <- 0 to length(allItems) - 1:
        if allItems[i].timestamp = null:
            continue
        countWithTimestamp <- countWithTimestamp + 1
        if allItems[i].timestamp < item.timestamp:
            rank <- rank + 1

    // Single timestamped item is the "most recent" by definition
    if countWithTimestamp <= 1:
        return 1.0

    return rank / (countWithTimestamp - 1)

Score Distribution

Given N items with timestamps, all with distinct timestamps:

  • The most recent item has rank N - 1, scoring 1.0.
  • The oldest item has rank 0, scoring 0.0.
  • Intermediate items score rank / (N - 1), producing evenly-spaced values.

Tied Timestamps

When multiple items share the same timestamp, they receive the same rank (the count of items strictly older than them). This means tied items receive identical scores.

Edge Cases

ConditionResult
item.timestamp is null0.0
All items have null timestampsAll score 0.0
Only one item has a timestampThat item scores 1.0
All items share the same timestampAll score 0.0 (rank = 0 for each, since no item is strictly older)

Note: The single-item case returns 1.0 (not 0.0) because a lone timestamped item is trivially the most recent.

Complexity

  • Time: O(N) per item, O(N^2) total across all items in a scoring pass.
  • Space: O(1) auxiliary per invocation.

Conformance Notes

  • Timestamp comparison is by temporal ordering (the underlying instant), not by string representation.
  • The comparison allItems[i].timestamp < item.timestamp is strict — items with equal timestamps do not increment the rank.
  • The denominator is countWithTimestamp - 1, not length(allItems) - 1. Only items that have a timestamp participate in the ranking.

PriorityScorer

The PriorityScorer assigns higher scores to items with higher numeric priority values, based on their relative rank among all scoreable items.

Overview

PriorityScorer is a relative scorer with the same rank-based algorithm as RecencyScorer, but applied to the priority field instead of timestamp. Higher numeric priority values produce higher scores.

Items without a priority receive a score of 0.0.

Fields Used

FieldSourcePurpose
priorityContextItemNumeric importance ranking key

Algorithm

For each item, count how many other prioritized items have a strictly lower priority value. This count (the item’s rank) is divided by the number of prioritized items minus one to produce a normalized score.

PRIORITY-SCORE(item, allItems):
    // Items without a priority always score 0.0
    if item.priority = null:
        return 0.0

    countWithPriority <- 0
    rank <- 0

    for i <- 0 to length(allItems) - 1:
        if allItems[i].priority = null:
            continue
        countWithPriority <- countWithPriority + 1
        if allItems[i].priority < item.priority:
            rank <- rank + 1

    // Single prioritized item scores 1.0
    if countWithPriority <= 1:
        return 1.0

    return rank / (countWithPriority - 1)

Score Distribution

Given N items with priorities, all with distinct priority values:

  • The highest-priority item has rank N - 1, scoring 1.0.
  • The lowest-priority item has rank 0, scoring 0.0.
  • Intermediate items score rank / (N - 1), producing evenly-spaced values.

Tied Priorities

When multiple items share the same priority value, they receive the same rank (the count of items with strictly lower priority). Tied items receive identical scores.

Edge Cases

ConditionResult
item.priority is null0.0
All items have null prioritiesAll score 0.0
Only one item has a priorityThat item scores 1.0
All items share the same priorityAll score 0.0 (no item has strictly lower priority)

Complexity

  • Time: O(N) per item, O(N^2) total across all items in a scoring pass.
  • Space: O(1) auxiliary per invocation.

Conformance Notes

  • Higher numeric priority values produce higher scores. priority = 10 scores higher than priority = 5.
  • The comparison allItems[i].priority < item.priority is strict — items with equal priority do not increment the rank.
  • The denominator is countWithPriority - 1, not length(allItems) - 1. Only items that have a priority participate in the ranking.
  • The rank-based algorithm is identical to RecencyScorer, substituting priority for timestamp.

KindScorer

The KindScorer assigns a score to each item based on its ContextKind, using a configurable weight map.

Overview

KindScorer is an absolute scorer — it examines only the item’s kind field. The allItems parameter is ignored. The score is a direct lookup in a weight dictionary: if the item’s kind has a configured weight, that weight is returned; otherwise, the score is 0.0.

Fields Used

FieldSourcePurpose
kindContextItemDictionary lookup key

Default Weights

When no custom weights are provided, the following defaults are used:

ContextKindWeight
"SystemPrompt"1.0
"Memory"0.8
"ToolOutput"0.6
"Document"0.4
"Message"0.2

These defaults reflect a common heuristic: system prompts and memories are typically more important to preserve than messages.

Algorithm

KIND-SCORE(item, allItems, weights):
    // weights is a map of ContextKind -> float64

    if weights contains item.kind:    // ContextKind equality is case-insensitive (see enumerations.md)
        return weights[item.kind]
    else:
        return 0.0

Case-insensitivity is a property of ContextKind equality as defined in Enumerations, not an ad-hoc behavior of the weight dictionary.

Construction Validation

Custom weight maps MUST be validated at construction time:

  1. Each weight value MUST be non-negative (>= 0.0).
  2. Each weight value MUST be finite (not NaN, not positive/negative infinity).
  3. Weight values are not required to be in [0.0, 1.0] — custom weights may exceed 1.0.
VALIDATE-KIND-WEIGHTS(weights):
    for each (kind, weight) in weights:
        if weight < 0.0:
            ERROR("Weight for kind must be non-negative")
        if weight is not finite:
            ERROR("Weight for kind must be finite")

Edge Cases

ConditionResult
Item’s kind is in the weight mapThe corresponding weight value
Item’s kind is not in the weight map0.0
Custom weight map is emptyAll items score 0.0
Custom weight exceeds 1.0The weight value is returned as-is (no clamping)

Complexity

  • Time: O(1) per item (hash map lookup).
  • Space: O(K) where K is the number of configured kinds.

Conformance Notes

  • ContextKind comparison MUST be case-insensitive using ASCII case folding (see ContextKind). A weight configured for "Message" matches items with kind "message", "MESSAGE", etc.
  • The default weight map MUST contain exactly the five well-known ContextKind values listed above, with the specified weights. Implementations MAY use a different internal representation (e.g., frozen dictionary, immutable map) but MUST produce the same lookup behavior.
  • Custom weights are not clamped to [0.0, 1.0]. An implementation using KindScorer with weights > 1.0 should be aware that composite scoring may produce scores outside the conventional range.

TagScorer

The TagScorer assigns a score based on how well an item’s tags match a configured set of weighted tags.

Overview

TagScorer is an absolute scorer — it examines only the item’s tags field against a preconfigured tag-weight map. The allItems parameter is ignored. The score is the sum of matched tag weights divided by the total configured weight, clamped to [0.0, 1.0].

Fields Used

FieldSourcePurpose
tagsContextItemTags to match against configured weights

Algorithm

TAG-SCORE(item, allItems, tagWeights, totalWeight):
    // tagWeights: map of string -> float64
    // totalWeight: sum of all values in tagWeights (precomputed at construction)

    if totalWeight = 0.0 or length(item.tags) = 0:
        return 0.0

    matchedSum <- 0.0
    for i <- 0 to length(item.tags) - 1:
        if tagWeights contains item.tags[i]:    // case-sensitive lookup (default)
            matchedSum <- matchedSum + tagWeights[item.tags[i]]

    return min(matchedSum / totalWeight, 1.0)

Construction

TagScorer is constructed with a tag-weight map. At construction time:

  1. The totalWeight is precomputed as the sum of all weight values.
  2. Each weight MUST be non-negative (>= 0.0).
  3. Each weight MUST be finite (not NaN, not positive/negative infinity).
CONSTRUCT-TAG-SCORER(tagWeights):
    totalWeight <- 0.0
    for each (tag, weight) in tagWeights:
        if weight < 0.0:
            ERROR("Tag weight must be non-negative")
        if weight is not finite:
            ERROR("Tag weight must be finite")
        totalWeight <- totalWeight + weight
    // Store tagWeights and totalWeight for use in scoring

Score Interpretation

  • A score of 1.0 means the item’s matched tags account for 100% or more of the total configured weight.
  • A score of 0.0 means no tags matched (or the item has no tags, or no weights are configured).
  • The min(..., 1.0) clamp prevents scores from exceeding 1.0 when an item matches all configured tags or when duplicate tag entries would cause the sum to exceed the total.

Edge Cases

ConditionResult
Item has no tags0.0
totalWeight is 0.0 (all weights are zero)0.0
Item has tags but none match configured weights0.0
Item matches all configured tagsmin(sum of matched / total, 1.0) = 1.0 if exactly all matched
Item has duplicate tags matching the same weightEach occurrence adds to matchedSum; result clamped to 1.0

Complexity

  • Time: O(T) per item, where T is the number of tags on the item (hash map lookup per tag).
  • Space: O(W) where W is the number of configured tag weights.

Conformance Notes

  • Tag key lookup in the weight map uses the string comparer provided at construction, defaulting to ordinal (case-sensitive) comparison. This is distinct from the case-insensitive tag comparison used by FrequencyScorer.
  • The totalWeight denominator is the sum of all configured weights, not just the weights of matched tags.
  • The result is clamped to min(result, 1.0). It is never clamped to a minimum of 0.0 because matchedSum can only be non-negative (all weights are non-negative).

FrequencyScorer

The FrequencyScorer scores an item based on the proportion of peer items that share at least one tag with it.

Overview

FrequencyScorer is a relative scorer — it compares the item’s tags against every other item’s tags to determine how “connected” the item is within the candidate set. Items that share tags with many peers score higher, reflecting thematic relevance within the context.

Fields Used

FieldSourcePurpose
tagsContextItemTag overlap detection with peers

Algorithm

FREQUENCY-SCORE(item, allItems):
    if length(item.tags) = 0 or length(allItems) <= 1:
        return 0.0

    matchingItems <- 0

    for i <- 0 to length(allItems) - 1:
        // Skip self (by identity, not by value)
        if allItems[i] is item:
            continue
        if length(allItems[i].tags) = 0:
            continue
        if SHARES-ANY-TAG(item.tags, allItems[i].tags):
            matchingItems <- matchingItems + 1

    return matchingItems / (length(allItems) - 1)

Tag Overlap Detection

SHARES-ANY-TAG(tagsA, tagsB):
    for i <- 0 to length(tagsA) - 1:
        for j <- 0 to length(tagsB) - 1:
            if CASE-INSENSITIVE-EQUAL(tagsA[i], tagsB[j]):
                return true
    return false

Score Interpretation

  • A score of 1.0 means every other item in the list shares at least one tag with this item.
  • A score of 0.0 means no other item shares any tag with this item (or the item has no tags, or it is the only item).
  • Intermediate values represent the fraction of peers with overlapping tags.

Edge Cases

ConditionResult
Item has no tags0.0
Only one item in allItems0.0
No peer shares any tag0.0
All peers share at least one tag1.0
Peer has no tagsThat peer does not count as matching

Self-Exclusion

The item being scored is excluded from the peer count using reference identity (object identity), not value equality. This means:

  • If the same ContextItem instance appears multiple times in allItems, only the reference-identical instance is skipped; other copies with identical content are counted as peers.
  • The denominator is always length(allItems) - 1, regardless of how many items are skipped.

Complexity

  • Time: O(N * T_a * T_b) per item in the worst case, where N is the number of items and T_a, T_b are tag list lengths. O(N^2 * T^2) total across all items.
  • Space: O(1) auxiliary per invocation.

Conformance Notes

  • Tag comparison in SHARES-ANY-TAG MUST be case-insensitive using ASCII case folding. "Important" and "important" are considered matching tags.
  • Self-exclusion MUST use reference identity (the is check), not structural equality. This matches the ContextItem immutability contract — items are compared by identity throughout the pipeline.
  • The denominator is length(allItems) - 1, which accounts for the self-exclusion. It is not reduced further for tagless peers.

ReflexiveScorer

The ReflexiveScorer passes through the item’s futureRelevanceHint as the score, clamped to [0.0, 1.0].

Overview

ReflexiveScorer is an absolute scorer — it examines only the item’s futureRelevanceHint field. The allItems parameter is ignored. This scorer enables callers (or upstream systems like an LLM) to inject relevance signals directly into the scoring pipeline.

Fields Used

FieldSourcePurpose
futureRelevanceHintContextItemCaller-provided relevance signal

Algorithm

REFLEXIVE-SCORE(item, allItems):
    if item.futureRelevanceHint = null:
        return 0.0

    value <- item.futureRelevanceHint

    if value is not finite:    // NaN, +infinity, -infinity
        return 0.0

    return clamp(value, 0.0, 1.0)

Where clamp(x, lo, hi) returns lo if x < lo, hi if x > hi, and x otherwise.

Score Interpretation

  • The score is a direct passthrough of the caller-provided hint, clamped to the conventional [0.0, 1.0] range.
  • A hint of 0.0 means the caller considers the item irrelevant for future context.
  • A hint of 1.0 means the caller considers the item highly relevant for future context.
  • Values outside [0.0, 1.0] are clamped, not rejected.

Edge Cases

ConditionResult
futureRelevanceHint is null0.0
futureRelevanceHint is NaN0.0
futureRelevanceHint is +infinity0.0
futureRelevanceHint is -infinity0.0
futureRelevanceHint is 0.50.5
futureRelevanceHint is -0.30.0 (clamped)
futureRelevanceHint is 1.71.0 (clamped)

Complexity

  • Time: O(1) per item.
  • Space: O(1) auxiliary.

Conformance Notes

  • Non-finite values (NaN, positive infinity, negative infinity) MUST return 0.0, not be clamped to a range boundary.
  • The check for finiteness MUST occur before the clamp. A NaN value must not pass through clamp (which may have implementation-defined behavior for NaN in some languages).
  • The clamping order is: null check, then finiteness check, then clamp to [0.0, 1.0].

CompositeScorer

The CompositeScorer combines multiple child scorers into a single score using a weighted average.

Overview

CompositeScorer is a composite scorer — it delegates to a list of child scorers, each with an associated weight, and returns the weighted average of their outputs. This enables building rich scoring strategies from simple building blocks.

CompositeScorer is the primary mechanism for combining multiple scoring signals (e.g., “60% recency + 30% priority + 10% kind”) into a single relevance score.

Algorithm

Scoring

COMPOSITE-SCORE(item, allItems, scorers, normalizedWeights):
    // scorers: array of Scorer
    // normalizedWeights: array of float64, sum = 1.0

    result <- 0.0
    for i <- 0 to length(scorers) - 1:
        result <- result + scorers[i].Score(item, allItems) * normalizedWeights[i]
    return result

Construction

At construction time, the raw weights are normalized so they sum to 1.0:

CONSTRUCT-COMPOSITE(entries):
    // entries: list of (scorer, weight) pairs

    if length(entries) = 0:
        ERROR("At least one scorer entry is required")

    totalWeight <- 0.0
    for i <- 0 to length(entries) - 1:
        (scorer, weight) <- entries[i]
        if scorer = null:
            ERROR("Scorer must not be null")
        if weight <= 0.0:
            ERROR("Weight must be positive")
        if weight is not finite:
            ERROR("Weight must be finite")
        totalWeight <- totalWeight + weight

    normalizedWeights <- new array of length(entries)
    for i <- 0 to length(entries) - 1:
        normalizedWeights[i] <- entries[i].weight / totalWeight

    DETECT-CYCLES(entries)

    self.scorers           <- [entries[i].scorer for i in 0..length(entries)]
    self.normalizedWeights <- normalizedWeights

Weight Normalization

Weights are relative, not absolute. The following configurations produce identical results:

  • [(RecencyScorer, 3.0), (PriorityScorer, 1.0)]
  • [(RecencyScorer, 0.75), (PriorityScorer, 0.25)]

Both normalize to weights [0.75, 0.25].

Cycle Detection

CompositeScorer supports nesting — a CompositeScorer may contain other CompositeScorer or ScaledScorer instances as children. However, circular references are invalid and MUST be detected at construction time.

DETECT-CYCLES(entries):
    visited <- empty set (by reference identity)
    inPath  <- empty set (by reference identity)
    for i <- 0 to length(entries) - 1:
        DETECT-CYCLES-DFS(entries[i].scorer, visited, inPath)

DETECT-CYCLES-DFS(node, visited, inPath):
    if node is in visited:
        if node is in inPath:
            ERROR("Cycle detected: scorer appears in its own dependency graph")
        return
    add node to visited
    add node to inPath

    if node is a CompositeScorer:
        for each child in node.children:
            DETECT-CYCLES-DFS(child, visited, inPath)
    else if node is a ScaledScorer:
        DETECT-CYCLES-DFS(node.inner, visited, inPath)

    remove node from inPath

The cycle detection traverses the scorer graph using depth-first search with reference identity comparison. A cycle exists when a scorer appears in its own ancestor path.

Edge Cases

ConditionResult
Empty entries listConstruction error
Single entryNormalized weight is 1.0; effectively delegates to the child
All children return 0.0Composite returns 0.0
All children return 1.0Composite returns 1.0 (weights sum to 1.0)
Nested CompositeScorerValid if acyclic; each child is scored recursively
Self-referential cycleConstruction error

Complexity

  • Construction: O(V + E) for cycle detection, where V is the number of distinct scorer nodes and E is the number of parent-child edges.
  • Scoring: O(C * cost_per_child) per item, where C is the number of child scorers. For a flat composite with simple children, this is O(C) per item.
  • Space: O(C) for normalized weight storage, O(V) for cycle detection.

Conformance Notes

  • Weights MUST be positive (strictly greater than 0.0) and finite. Zero weights are rejected at construction time.
  • Weight normalization MUST divide each weight by the sum of all weights. The normalized weights MUST sum to approximately 1.0 (subject to floating-point precision).
  • Cycle detection MUST use reference identity, not structural equality. Two distinct CompositeScorer instances with identical configurations are not considered cyclic.
  • Child scorers are invoked in order (index 0, 1, 2, …). While the weighted sum is mathematically commutative, the invocation order is fixed for determinism.

ScaledScorer

The ScaledScorer applies min-max normalization to an inner scorer’s output across all items, ensuring the resulting scores span the full [0.0, 1.0] range.

Overview

ScaledScorer is a composite scorer — it wraps a single inner scorer and normalizes its output using the formula (raw - min) / (max - min), where min and max are the inner scorer’s minimum and maximum outputs across all items in allItems.

This is useful when an inner scorer produces scores clustered in a narrow range (e.g., [0.3, 0.5]). ScaledScorer spreads them to [0.0, 1.0], maximizing the discriminative power of the scorer.

Fields Used

ScaledScorer uses whatever fields its inner scorer uses.

Algorithm

SCALED-SCORE(item, allItems, inner):
    if length(allItems) = 0:
        return 0.5

    rawScore <- NaN    // sentinel: not yet computed
    min <- +infinity
    max <- -infinity

    // Single pass: score all items, find min/max, capture this item's raw score
    for i <- 0 to length(allItems) - 1:
        s <- inner.Score(allItems[i], allItems)
        if s < min:
            min <- s
        if s > max:
            max <- s
        if allItems[i] is item:    // reference identity
            rawScore <- s

    // Degenerate case: all scores equal
    if max = min:
        return 0.5

    return (rawScore - min) / (max - min)

Score Distribution

After scaling:

  • The item(s) with the inner scorer’s minimum score receive 0.0.
  • The item(s) with the inner scorer’s maximum score receive 1.0.
  • All other items are linearly interpolated between 0.0 and 1.0.

Degenerate Case

When all items receive the same score from the inner scorer (max = min), the normalization formula would divide by zero. In this case, ScaledScorer returns 0.5 — the midpoint of the [0.0, 1.0] range. This represents “no information” — all items are equally relevant according to the inner scorer.

Similarly, when allItems is empty, the result is 0.5.

Edge Cases

ConditionResult
Empty allItems0.5
Single item in allItems0.5 (max = min)
All items receive the same inner score0.5 (max = min)
Two items with distinct inner scoresOne gets 0.0, the other gets 1.0
Inner scorer returns values outside [0.0, 1.0]Scaling still produces [0.0, 1.0] output

Complexity

  • Time: O(N) per item (scores all N items to find min/max). O(N^2) total across all items in a scoring pass, since each item triggers a full scan.
  • Space: O(1) auxiliary per invocation.

Performance note: Each call to SCALED-SCORE invokes the inner scorer N times. Across a full scoring pass of N items, the inner scorer is invoked N^2 times total. Implementations MAY cache the min/max computation across invocations within a single scoring pass, but caching is not required by this specification. When ScaledScorer instances are deeply nested — a ScaledScorer wrapping another ScaledScorer — the cost compounds to O(N^(depth+1)) per scoring pass. Callers with multiple scale-normalization requirements should prefer a flat CompositeScorer structure over nested ScaledScorer instances.

Conformance Notes

  • The item is identified within allItems by reference identity (the is check), not by structural equality.
  • The inner scorer is invoked for every item in allItems on every call. There is no specification requirement to cache or memoize these results (but implementations are free to do so as an optimization, provided the output is identical).
  • The degenerate return value MUST be exactly 0.5, not 0.0 or 1.0.
  • ScaledScorer participates in CompositeScorer cycle detection — a CompositeScorer containing a ScaledScorer will traverse through the ScaledScorer to its inner scorer.

MetadataTrustScorer

The MetadataTrustScorer passes through a caller-provided trust value stored in metadata["cupel:trust"] as the score, clamped to [0.0, 1.0].

Overview

MetadataTrustScorer is an absolute scorer — it examines only the item’s metadata map. The allItems parameter is ignored. This scorer enables callers to inject a trust signal into the scoring pipeline by setting a well-known metadata key at item construction time.

Fields Used

FieldSourcePurpose
metadataContextItemKey-value map; the scorer reads the "cupel:trust" key

Metadata Namespace Reservation

All ContextItem.metadata keys with the "cupel:" prefix are reserved for library-defined conventions. Callers MUST NOT use this prefix for application-specific keys. This reservation is enforced at the spec level only — there is no runtime validation of key prefixes.

Conventions

cupel:trust

cupel:trust is a caller-computed trust value for the item, represented as a float64 in [0.0, 1.0].

  • Storage format: In Rust, metadata is HashMap<String, String> — the trust value MUST be stored as a decimal string (e.g., "0.85"). The scorer parses this string to a float64 at scoring time.
  • In .NET, metadata is IReadOnlyDictionary<string, object?>, which supports heterogeneous value types. Callers MAY store the trust value as a double directly. Implementations MUST handle both string and double (or boxed numeric) values in the dictionary lookup. The canonical wire format for serialization interop is the decimal string representation.
  • The value is caller-computed. The pipeline does not set or modify this key.
  • The scorer reads cupel:trust as a scoring input only. It does not use the value as a filter or gate — items are never excluded based on this score.

cupel:source-type

cupel:source-type is an open string convention for labeling the origin of a context item. The following values are RECOMMENDED:

ValueMeaning
"user"An end-user message
"tool"A tool or function call result
"external"Retrieved or injected context (e.g., RAG results)
"system"A system prompt fragment

Callers MAY use other values. Values are not validated at construction time. This convention is a labeling aid for caller logic and is not used by any scorer in the library.

Algorithm

METADATA-TRUST-SCORE(item, allItems, config):
    if item.metadata does not contain "cupel:trust":
        return config.defaultScore

    raw <- item.metadata["cupel:trust"]   // string in Rust; string or double in .NET

    value <- parse_float64(raw)           // or cast if already float64

    if parse failed:
        return config.defaultScore

    if value is not finite:               // NaN, +infinity, -infinity
        return config.defaultScore

    return clamp(value, 0.0, 1.0)

Where clamp(x, lo, hi) returns lo if x < lo, hi if x > hi, and x otherwise. config.defaultScore is a float64 in [0.0, 1.0] set at construction time.

Score Interpretation

  • The score is a direct passthrough of the caller-provided trust value, clamped to the conventional [0.0, 1.0] range.
  • A score of 0.0 reflects that the caller considers the item untrustworthy or low-trust.
  • A score of 1.0 reflects that the caller considers the item fully trusted.
  • Values outside [0.0, 1.0] are clamped, not rejected.
  • When the key is absent or the value is unparseable, config.defaultScore is returned. This default is set at scorer construction time and MUST be in [0.0, 1.0].

Edge Cases

ConditionResult
cupel:trust key absentconfig.defaultScore
cupel:trust value is unparseable (e.g., "high", "")config.defaultScore
cupel:trust value is "NaN"config.defaultScore
cupel:trust value is "+Infinity" or "-Infinity"config.defaultScore
cupel:trust value is "0.0"0.0
cupel:trust value is "0.75"0.75
cupel:trust value is "1.0"1.0
cupel:trust value is "-0.1" (below range)0.0 (clamped)
cupel:trust value is "1.5" (above range)1.0 (clamped)

Conformance Vector Outlines

The following scenarios outline the expected behavior for conformance testing. Full TOML conformance vectors require a metadata field extension to the test vector format (out of scope for this chapter); these outlines are narrative descriptions of the required behavior.

  1. Present and valid: An item with cupel:trust = "0.85" — the scorer returns 0.85.

  2. Key absent: An item whose metadata map does not contain the cupel:trust key — the scorer returns config.defaultScore (e.g., 0.5 if that is the configured default).

  3. Unparseable value: An item with cupel:trust = "high" (not a valid decimal float) — the scorer returns config.defaultScore, not an error.

  4. Out-of-range value (clamped high): An item with cupel:trust = "1.5" — the scorer returns 1.0 (clamped to the upper bound).

  5. Non-finite value: An item with cupel:trust = "NaN" or cupel:trust = "Infinity" — the scorer returns config.defaultScore, not a clamped boundary value.

Complexity

  • Time: O(1) per item (hash map lookup and float parse).
  • Space: O(1) auxiliary.

Conformance Notes

  • The clamping order is: key-missing → config.defaultScore; parse-failure → config.defaultScore; non-finite → config.defaultScore; then clamp to [0.0, 1.0].
  • Non-finite values (NaN, positive infinity, negative infinity) MUST return config.defaultScore, not be clamped to a range boundary.
  • The finiteness check MUST occur before the clamp. Non-finite values MUST NOT pass through clamp (which may have implementation-defined behavior for NaN in some languages).
  • In Rust, the metadata value at key "cupel:trust" MUST be parsed as a decimal string using standard float64 parsing (e.g., str::parse::<f64>()). A value that fails to parse MUST be treated as missing.
  • In .NET, implementations MUST handle both string and double (or other boxed numeric type) when reading the value from IReadOnlyDictionary<string, object?>. If the stored value is already a double, it MUST be used directly without string conversion. If it is a string, it MUST be parsed. If it is any other type, it MUST be treated as a parse failure and return config.defaultScore.
  • config.defaultScore MUST be in [0.0, 1.0]. Implementations SHOULD reject construction of a MetadataTrustScorer with a defaultScore outside this range.

DecayScorer

The DecayScorer assigns a score based on how much time has elapsed since an item’s timestamp, measured against a caller-supplied reference time. Older items score lower; fresher items score higher.

Overview

DecayScorer is an absolute scorer — it computes a score for each item independently using only that item’s timestamp and the injected reference time. The allItems parameter is ignored.

Contrast with RecencyScorer: RecencyScorer is a relative scorer — it ranks items against each other (most recent scores 1.0, oldest scores 0.0, with linear interpolation). DecayScorer is an absolute-decay scorer — each item’s score is derived solely from the elapsed time between its timestamp and a caller-supplied reference time, with no dependence on the other items in the set.

Fields Used

FieldSourcePurpose
timestampContextItemTemporal reference point for computing item age

TimeProvider

DecayScorer requires the caller to supply a time provider at construction. There is no default — the caller must inject one explicitly (D042).

This makes the time dependency visible, prevents stale time values in long-lived pipelines, and enables deterministic testing without special test modes.

.NET

Use System.TimeProvider (BCL since .NET 8; available in net10.0 without NuGet). Pass the system clock as TimeProvider.System. For testing, supply a FakeTimeProvider or any subclass of TimeProvider.

new DecayScorer(TimeProvider.System, curve: Curve.Exponential(halfLife: TimeSpan.FromHours(24)));

Rust

#![allow(unused)]
fn main() {
pub trait TimeProvider: Send + Sync {
    fn now(&self) -> DateTime<Utc>;
}
}

A zero-sized unit struct SystemTimeProvider implements TimeProvider via Utc::now(). The scorer stores the provider as Box<dyn TimeProvider + Send + Sync>, which is sufficient because DecayScorer is not Clone (D047).

#![allow(unused)]
fn main() {
new DecayScorer(Box::new(SystemTimeProvider), curve: Curve::exponential(half_life));
}

Algorithm

DECAY-SCORE(item, allItems, config):
    // allItems is ignored — DecayScorer is an absolute scorer
    if item.timestamp = null:
        return config.nullTimestampScore

    age <- max(duration_zero, config.timeProvider.now() - item.timestamp)
    // Negative age (future-dated item: timestamp > now) clamps to duration_zero.
    // The item is treated as maximally fresh — age zero feeds directly into the curve.

    return APPLY-CURVE(age, config.curve)

Where duration_zero is the zero-duration constant (i.e., Duration::ZERO in Rust / TimeSpan.Zero in .NET).

Curve Factories

A curve is a function from a non-negative duration (the item’s age) to a score in [0.0, 1.0]. DecayScorer ships three built-in curve factories.

Exponential(halfLife)

Returns 2^(−age / halfLife). At age zero the score is 1.0; at age equal to halfLife the score is 0.5; as age approaches infinity the score approaches 0.0 (but is never negative).

Precondition: halfLife > Duration::ZERO / halfLife > TimeSpan.Zero. Throw ArgumentException (.NET) / return Err(...) (Rust) at construction if the precondition is violated; the error message MUST name the halfLife parameter.

EXPONENTIAL-CURVE(age, halfLife):
    return pow(2.0, −duration_to_seconds(age) / duration_to_seconds(halfLife))

Where duration_to_seconds converts a duration to a non-negative float64 count of seconds.

Step(windows)

windows is an ordered list of (maxAge: Duration, score: double) pairs, sorted from youngest (smallest maxAge) to oldest (largest maxAge). The scorer walks the list and returns the score of the first entry where window.maxAge > age (strict greater-than). If age is greater than or equal to every boundary, the last entry’s score applies.

STEP-CURVE(age, windows):
    for each window in windows (youngest to oldest):
        if window.maxAge > age:
            return window.score
    // age exceeded all boundaries — fall through to last entry
    return windows[last].score

Preconditions:

  • windows MUST contain at least one entry. Throw at construction if the list is empty.
  • No entry may have maxAge = duration_zero (zero-width windows are forbidden). Throw at construction if any window.maxAge is zero.

The strict > comparison means that an item whose age equals a boundary falls into the next window (older side). For example, with windows = [(1h, 0.9), (24h, 0.5), (72h, 0.1)]:

agefirst window where maxAge > agereturned score
0h1h window0.9
1h24h window (1h is not > 1h)0.5
24h72h window0.1
72hnone — fall through0.1 (last entry)

Window(maxAge)

Binary score. Returns 1.0 when age < maxAge (half-open interval [0, maxAge)); returns 0.0 when age >= maxAge.

WINDOW-CURVE(age, maxAge):
    if age < maxAge:
        return 1.0
    return 0.0

Precondition: maxAge > Duration::ZERO / maxAge > TimeSpan.Zero. Throw at construction if violated.

The boundary is exclusive on the right: an item whose age is exactly maxAge returns 0.0.

Configuration

ParameterTypeRequiredDefaultNotes
timeProviderSystem.TimeProvider (.NET) / Box<dyn TimeProvider> (Rust)YesCaller must inject; no default (D042)
curveCurve (one of Exponential, Step, Window)YesDecay shape
nullTimestampScorefloat64, range [0.0, 1.0]No0.5Score returned for items with no timestamp. Default 0.5 is neutral: it neither rewards nor penalizes missing timestamps.

nullTimestampScore MUST be in [0.0, 1.0]. Implementations SHOULD reject construction if the value is outside this range.

Edge Cases

ConditionResult
item.timestamp is nullconfig.nullTimestampScore
Item is future-dated (timestamp > now)Age clamps to duration_zero; score = APPLY-CURVE(0, config.curve)
Age is exactly zero (freshest possible)Exponential: 1.0; Step: first window’s score (if window.maxAge > 0); Window: 1.0 (since 0 < maxAge)
age == maxAge for Window curve0.0 — boundary is half-open [0, maxAge)
Step curve: age exceeds all boundariesLast entry’s score
Exponential with very large ageApproaches 0.0; never negative

Conformance Vector Outlines

All scenarios use a fixed referenceTime = 2025-01-01T12:00:00Z (set via the injected TimeProvider).

  1. Exponential, one-half-life age: An item with timestamp = 2024-12-31T12:00:00Z (age = 24 h) and curve = Exponential(halfLife: 24h) — the scorer returns 0.5 (2^(−1) = 0.5).

  2. Future-dated item: An item with timestamp = 2025-01-02T00:00:00Z (timestamp is after referenceTime; age clamps to zero) and curve = Exponential(halfLife: 24h) — the scorer returns the same score as an age-zero item, i.e., 1.0.

  3. Null timestamp: An item with no timestamp and nullTimestampScore = 0.5 — the scorer returns 0.5 regardless of the configured curve.

  4. Step curve, second window: An item with timestamp = 2025-01-01T06:00:00Z (age = 6 h) and curve = Step(windows: [(1h, 0.9), (24h, 0.5), (72h, 0.1)]) — age 6h is not less than 1h (1h > 6h is false), but is less than 24h (24h > 6h is true) — the scorer returns 0.5 (second window’s score).

  5. Window curve, age exactly at maxAge: An item with timestamp = 2025-01-01T06:00:00Z (age = 6 h) and curve = Window(maxAge: 6h) — since age >= maxAge (6h >= 6h), the scorer returns 0.0.

Complexity

  • Time: O(1) per item for Exponential and Window; O(W) per item for Step, where W is the number of windows.
  • Space: O(1) auxiliary per invocation.

Conformance Notes

  • The allItems parameter MUST be accepted by the scorer interface but MUST be ignored. Implementations MUST NOT iterate or inspect allItems.
  • Negative age (future-dated items) MUST be clamped to duration_zero before being passed to the curve. The clamp MUST occur before the curve computation, not inside each curve.
  • nullTimestampScore defaults to 0.5. The neutral default is intentional: it avoids penalizing items that legitimately have no timestamp (e.g., synthetic items injected at pipeline construction time) while also not rewarding them above fresher real items.
  • Curve constructors MUST validate preconditions at construction time, not at scoring time. A scorer with an invalid curve configuration MUST NOT be created.
  • In .NET, halfLife and maxAge are TimeSpan; comparisons use TimeSpan.Zero. In Rust, they are Duration; comparisons use Duration::ZERO.
  • The Rust TimeProvider trait MUST be Send + Sync to allow DecayScorer to be used across thread boundaries. The stored provider is Box<dyn TimeProvider + Send + Sync>.

MetadataKeyScorer

The MetadataKeyScorer returns a configurable multiplier (boost) when a metadata key matches a configured value, and 1.0 (a neutral multiplier) otherwise.

Overview

MetadataKeyScorer is an absolute scorer — it examines only the item’s metadata map. The allItems parameter is ignored. This scorer enables callers to boost items that carry a specific metadata signal by returning a multiplier value rather than an absolute score.

Multiplicative semantics: Unlike MetadataTrustScorer (which is an absolute passthrough returning a score directly), MetadataKeyScorer returns a multiplier value intended for use in a CompositeScorer. A match returns config.boost (e.g., 1.5); a non-match returns 1.0 (the neutral multiplier). Returned values are NOT clamped to [0.0, 1.0].

Fields Used

FieldSourcePurpose
metadataContextItemKey-value map; the scorer checks the configured key for the configured value

Metadata Namespace Reservation

All ContextItem.metadata keys with the "cupel:" prefix are reserved for library-defined conventions. Callers MUST NOT use this prefix for application-specific keys. This reservation is enforced at the spec level only — there is no runtime validation of key prefixes.

Conventions

cupel:priority

cupel:priority is a caller-provided priority signal for the item.

  • Key: "cupel:priority"
  • RECOMMENDED values: "high", "normal" (or "low")
  • Open string: implementations MUST NOT validate or reject unknown values. Callers MAY use any string value. Values are not validated at construction time or scoring time.
  • Typical usage: MetadataKeyScorer("cupel:priority", "high", 1.5) boosts high-priority items 1.5× in a composite scorer, leaving all other items at 1.0 (neutral multiplier).

Other Reserved Conventions

See MetadataTrustScorer for the cupel:trust and cupel:source-type conventions, which are the other reserved metadata keys in this library.

Configuration

Constructor Parameters

ParameterTypeConstraintDescription
keystring(none)The metadata key to match
valuestring(none)The exact value to match (string-to-string comparison)
boostfloat64> 0.0The multiplier returned for matching items

Validation

  • boost MUST be greater than 0.0.
  • A boost value of 0.0 MUST be rejected at construction time with CupelError::ScorerConfig (Rust) / ArgumentException (not ArgumentOutOfRangeException) (.NET).
  • A negative boost value MUST be rejected at construction time with CupelError::ScorerConfig (Rust) / ArgumentException (.NET).
  • A non-finite boost value (NaN or infinity) MUST be rejected at construction time with CupelError::ScorerConfig (Rust) / ArgumentException (.NET).
  • Validation occurs at construction time, not scoring time.

Algorithm

METADATA-KEY-SCORE(item, allItems, config):
    if item.metadata does not contain config.key:
        return config.defaultMultiplier  // 1.0

    raw <- item.metadata[config.key]
    if raw != config.value:
        return config.defaultMultiplier  // 1.0

    return config.boost

config.defaultMultiplier is a fixed constant of 1.0. It is NOT a constructor parameter — the neutral multiplier value is always 1.0. The scorer does NOT receive item.score as input; it returns a multiplier value used by the downstream scoring stage.

Score Interpretation

MetadataKeyScorer uses multiplicative semantics: it returns a multiplier, not an absolute score.

  • A matching item returns config.boost (e.g., 1.5).
  • A non-matching item returns 1.0 (neutral — has no effect when multiplied).
  • When used in a CompositeScorer, the composite multiplies the contributions of its child scorers. A MetadataKeyScorer with boost=1.5 effectively increases the composite weight for matching items by 1.5× relative to non-matching items.
  • Returned values are NOT clamped to [0.0, 1.0].

Edge Cases

ConditionResult
Key absent from item.metadata1.0 (default multiplier)
Key present, value matches config.valueconfig.boost
Key present, value does not match config.value1.0 (default multiplier)
boost = 0.0 at constructionConstruction error
boost < 0.0 at constructionConstruction error
boost is NaN or Infinity at constructionConstruction error

Conformance Vector Outlines

The following scenarios outline the expected behavior for conformance testing.

  1. Match → boost applied: An item with metadata["cupel:priority"] = "high", scorer configured with key="cupel:priority", value="high", boost=1.5 — the scorer returns 1.5.

  2. No-match → neutral: An item with metadata["cupel:priority"] = "normal", scorer configured with key="cupel:priority", value="high", boost=1.5 — the scorer returns 1.0.

  3. Missing key → neutral: An item with no cupel:priority key in its metadata, scorer configured for key="cupel:priority" — the scorer returns 1.0 regardless of boost.

  4. Zero boost → construction error: Constructing MetadataKeyScorer("cupel:priority", "high", 0.0) MUST fail with CupelError::ScorerConfig (Rust) / ArgumentException (.NET); the scorer MUST NOT be usable.

  5. Negative boost → construction error: Constructing MetadataKeyScorer("cupel:priority", "high", -1.0) MUST fail with CupelError::ScorerConfig (Rust) / ArgumentException (.NET); the scorer MUST NOT be usable.

Complexity

  • Time: O(1) per item (hash map lookup, string comparison).
  • Space: O(1) auxiliary.

Conformance Notes

  • Value comparison is string-to-string. The config.value is compared directly to the raw string stored at config.key in item.metadata. No parsing, normalization, or type coercion is applied.
  • In .NET, metadata values stored as non-string types SHOULD be converted to string for comparison. The comparison target is always config.value, which is a string.
  • config.defaultMultiplier is always 1.0 and is NOT a constructor parameter. It is a fixed constant of the scorer, not a configurable default.
  • boost validation occurs at construction time. Scoring-time inputs (key absence, value mismatch) never produce errors — they return 1.0.
  • The construction error type is ArgumentException in .NET — NOT ArgumentOutOfRangeException (D178).

Slicers

Slicers select a subset of scored items that fits within the token budget. They are invoked during Stage 5: Slice.

Slicer Interface

A slicer implements a single function:

Slice(scoredItems: list of ScoredItem, budget: ContextBudget) -> list of ContextItem

Contract

  1. Subset selection. The slicer MUST return a subset of the input items. It MUST NOT create, modify, or duplicate items.
  2. Budget compliance. The total tokens of selected items SHOULD NOT exceed budget.targetTokens. (Some strategies like KnapsackSlice may slightly under-fill due to discretization.)
  3. Zero-token items. Items with tokens = 0 are conventionally always included, as they consume no budget. Individual slicer implementations define this behavior.
  4. Empty input. If scoredItems is empty or budget.targetTokens <= 0, the slicer MUST return an empty list.

Slicer Summary

SlicerAlgorithmComplexityUse Case
GreedySliceValue-density greedy fillO(N log N)Fast, good-enough selection for most workloads
KnapsackSlice0/1 knapsack dynamic programmingO(N * C)Optimal selection when budget is tight
QuotaSlicePartitioned delegation by kindO(N + per-kind cost)Budget fairness across context kinds
CountQuotaSliceCount-based require/cap per kindO(N log N + inner cost)Absolute item-count guarantees per kind
CountConstrainedKnapsackSliceCount-require/cap + knapsack-optimal selectionO(N log N + N × C)Count guarantees with globally optimal token packing

Where N is the number of scored items and C is the discretized budget capacity.

StreamSlice (Implementation-Optional)

StreamSlice is an asynchronous/streaming slicer that processes items as they arrive rather than requiring the full sorted list upfront. It is documented here for completeness but is implementation-optional — the synchronous pipeline defined by this specification does not require streaming support.

Implementations that support asynchronous or streaming pipelines MAY provide a StreamSlice that conforms to the same behavioral contract (subset selection within budget) using a streaming evaluation model. The synchronous conformance test suite does not test StreamSlice.

GreedySlice

GreedySlice selects items by value density (score per token), greedily filling the budget from highest-density items first.

Overview

GreedySlice is the simplest and fastest slicer strategy. It computes a “value density” for each item — the ratio of relevance score to token cost — then iterates items in descending density order, including each item if it fits within the remaining budget. Zero-token items have infinite density and are always included.

This is the fractional knapsack heuristic applied to a 0/1 selection problem. It does not guarantee optimal selection (the KnapsackSlice provides that), but it is fast and produces good results in practice.

Algorithm

GREEDY-SLICE(scoredItems, budget):
    if length(scoredItems) = 0 or budget.targetTokens <= 0:
        return []

    // Step 1: Compute value densities
    densities <- new array of (density: float64, index: integer) of length(scoredItems)
    for i <- 0 to length(scoredItems) - 1:
        tokens <- scoredItems[i].item.tokens
        if tokens = 0:
            densities[i] <- (MAX_FLOAT, i)
        else:
            densities[i] <- (scoredItems[i].score / tokens, i)

    // Step 2: Sort by density descending, tiebreak by index ascending
    STABLE-SORT(densities, by density descending, then by index ascending)

    // Step 3: Greedy fill
    remaining <- budget.targetTokens
    selected <- empty list

    for i <- 0 to length(densities) - 1:
        originalIndex <- densities[i].index
        item <- scoredItems[originalIndex].item
        tokens <- item.tokens

        if tokens = 0:
            APPEND(selected, item)
        else if tokens <= remaining:
            APPEND(selected, item)
            remaining <- remaining - tokens

    return selected

Value Density

Value density measures how much relevance score an item provides per token consumed:

density = score / tokens
  • Items with tokens = 0 have density MAX_FLOAT (the largest representable finite double), ensuring they sort first and are always included. Among zero-token items, all share the same density MAX_FLOAT. The sort tiebreak is index only — score values are irrelevant for zero-token items.
  • Items with high scores and low token counts are preferred over items with moderate scores and high token counts.

Deterministic Tie-Break Contract

When two items have equal value density, the item with the lower original index in the input list MUST be preferred (original-index ascending). This is achieved either by using a stable sort or by including the original index as an explicit secondary sort key (density descending, then original index ascending).

This contract guarantees that GreedySlice produces identical output for identical inputs across repeated invocations. Budget-simulation methods (GetMarginalItems, FindMinBudgetFor) depend on this determinism to produce meaningful diff and binary-search results.

Among zero-token items — all sharing density MAX_FLOAT — score values MUST NOT affect relative order. The tiebreak is original index only.

Since the input is pre-sorted by score descending (from the Sort stage), this original-index tiebreaking preserves score order among equal-density items as a natural consequence.

Edge Cases

ConditionResult
Empty scoredItemsEmpty list
budget.targetTokens <= 0Empty list
All items are zero-tokenAll items selected
Single item fits within budgetThat item is selected
Single item exceeds budgetEmpty list (only zero-token items, if any)
All items exceed budget individuallyOnly zero-token items are selected
All items fit within budgetAll items selected

Complexity

  • Time: O(N log N) — dominated by the sort. The greedy fill pass is O(N).
  • Space: O(N) for the density array.

Conformance Notes

  • The density for zero-token items MUST be treated as the maximum possible value, ensuring they are always sorted first and always included.
  • Among zero-token items, tiebreaking is by original index only; score values MUST NOT affect the relative order of zero-token items.
  • The sort MUST use original-index-ascending tiebreaking for equal densities. Given items A (density 0.5, index 0) and B (density 0.5, index 3), A must precede B. This is the deterministic tie-break contract that budget simulation depends on.
  • Items that do not fit (tokens > remaining) are skipped, not deferred. The algorithm makes a single pass through the sorted densities — it does not backtrack or attempt smaller items after skipping a large one.
  • The output list order is determined by the greedy iteration order (density-descending), not the original input order. The Place stage handles final ordering.

KnapsackSlice

KnapsackSlice selects items using 0/1 knapsack dynamic programming, finding the combination of items that maximizes total score within the token budget.

Overview

KnapsackSlice provides optimal or near-optimal item selection by solving the classic 0/1 knapsack problem. Items are the “objects” with token counts as weights and relevance scores as values. The algorithm finds the subset with maximum total value that fits within the budget.

To make the DP feasible for large budgets, KnapsackSlice discretizes both item weights and budget capacity using a configurable bucket size. This trades precision for performance — the result is near-optimal rather than exactly optimal.

Algorithm

KNAPSACK-SLICE(scoredItems, budget, bucketSize):
    if length(scoredItems) = 0 or budget.targetTokens <= 0:
        return []

    // Step 1: Pre-filter zero-token items (always included)
    zeroTokenItems <- empty list
    candidates <- empty list
    for i <- 0 to length(scoredItems) - 1:
        if scoredItems[i].item.tokens = 0:
            APPEND(zeroTokenItems, scoredItems[i].item)
        else if scoredItems[i].item.tokens > 0:
            APPEND(candidates, scoredItems[i])

    if length(candidates) = 0:
        return zeroTokenItems

    n <- length(candidates)

    // Step 2: Build parallel arrays
    weights <- new array of integer[n]      // original token counts
    values  <- new array of integer[n]      // scaled integer scores
    items   <- new array of ContextItem[n]

    for i <- 0 to n - 1:
        weights[i] <- candidates[i].item.tokens
        values[i]  <- max(0, floor(candidates[i].score * 10000))
        items[i]   <- candidates[i].item

    // Step 3: Discretize capacity and weights
    capacity <- floor(budget.targetTokens / bucketSize)
    if capacity = 0:
        return zeroTokenItems

    discretizedWeights <- new array of integer[n]
    for i <- 0 to n - 1:
        discretizedWeights[i] <- ceil(weights[i] / bucketSize)

    // Step 4: DP with 1D value array + 2D keep table
    dp   <- new array of integer[capacity + 1], all zeros
    keep <- new 2D array of boolean[n][capacity + 1], all false

    for i <- 0 to n - 1:
        dw <- discretizedWeights[i]
        dv <- values[i]
        for w <- capacity downto dw:
            withItem <- dp[w - dw] + dv
            if withItem > dp[w]:
                dp[w] <- withItem
                keep[i][w] <- true

    // Step 5: Reconstruct solution
    selected <- empty list
    remainingCapacity <- capacity
    for i <- n - 1 downto 0:
        if keep[i][remainingCapacity]:
            APPEND(selected, items[i])
            remainingCapacity <- remainingCapacity - discretizedWeights[i]

    // Step 6: Combine zero-token items with selected items
    result <- empty list
    for i <- 0 to length(zeroTokenItems) - 1:
        APPEND(result, zeroTokenItems[i])
    for i <- 0 to length(selected) - 1:
        APPEND(result, selected[i])

    return result

Discretization

Score Scaling

Scores (float64 in [0.0, 1.0]) are converted to integers for the DP table:

integerValue = max(0, floor(score * 10000))

Implementation note: For non-negative scores (which all Cupel scores are, since scorers return values in [0.0, 1.0]), floor and truncation-toward-zero (C-style integer truncation) produce identical results. Either operation is conformant.

This provides 4 decimal digits of score precision. Items with very small positive scores (< 0.0001) are treated as value 0.

Weight Discretization

Item token counts and budget capacity are discretized using the bucket size:

  • Item weights use ceiling division: ceil(tokens / bucketSize). This ensures the DP never selects items that would exceed the original budget.
  • Budget capacity uses floor division: floor(targetTokens / bucketSize). This ensures the discretized capacity does not exceed the original budget.

The asymmetry (ceiling for weights, floor for capacity) is deliberate — it guarantees that any solution feasible in the discretized problem is also feasible in the original problem. The tradeoff is that the discretized problem may under-fill the budget.

Bucket Size

The bucketSize parameter (default: 100) controls the granularity of discretization:

  • Smaller bucket size = more precise but larger DP table (more memory and time).
  • Larger bucket size = faster but coarser approximation.
  • bucketSize MUST be a positive integer (> 0).

Edge Cases

ConditionResult
Empty scoredItemsEmpty list
budget.targetTokens <= 0Empty list
All items are zero-tokenAll items selected
capacity = 0 after discretizationOnly zero-token items
All items exceed discretized capacityOnly zero-token items
Single candidate fitsThat candidate plus zero-token items

Complexity

  • Time: O(N * C) where N is the number of non-zero-token candidates and C = floor(targetTokens / bucketSize).
  • Space: O(N * C) for the keep table. The 1D DP array uses O(C) additional space.

Precision Caveat (P5)

The score scaling factor (10000) and bucket size are implementation-defined parameters. This specification defines them as defaults for the reference implementation, but conforming implementations MAY use different values provided:

  1. The behavioral contract holds: the slicer returns a subset of items that fits within the budget.
  2. The selection is optimal or near-optimal for the discretization granularity used.
  3. Conformance tests use score differences of at least 0.01 (100 integer units at scale factor 10000) to avoid precision-dependent divergence.

Implementations that change the score scaling factor or bucket size SHOULD document the deviation.

Conformance Notes

  • Zero-token items MUST be pre-filtered and always included in the result. They do not participate in the DP.
  • Items with negative token counts are not expected in the input (they are excluded during classification), but if present, they SHOULD be excluded from the DP candidates.
  • The DP uses reverse iteration on the weight dimension (w <- capacity downto dw) to implement the standard 1D space optimization for 0/1 knapsack.
  • Solution reconstruction iterates candidates in reverse order (i <- n-1 downto 0), which is the standard backtracking direction for 0/1 knapsack with a keep table.
  • The output list order is: zero-token items first, then DP-selected items in reconstruction order. The Place stage handles final ordering.

QuotaSlice

QuotaSlice partitions items by ContextKind, distributes the token budget across kinds using configurable quotas, and delegates per-kind selection to an inner slicer.

Overview

QuotaSlice is a decorator slicer — it wraps another slicer (e.g., GreedySlice or KnapsackSlice) and adds budget fairness across context kinds. This ensures that no single kind dominates the context window, and that important kinds receive a guaranteed minimum allocation.

QuotaSlice operates in four phases:

  1. Partition items by ContextKind
  2. Compute candidate token mass per kind
  3. Distribute the budget across kinds (require/cap constraints)
  4. Delegate per-kind selection to the inner slicer

Configuration

QuotaSlice is configured with a QuotaSet that defines two constraints per kind:

  • Require(kind, minPercent) — Guarantees that at least minPercent% of the total budget is reserved for this kind. Range: [0.0, 100.0].
  • Cap(kind, maxPercent) — Limits this kind to at most maxPercent% of the total budget. Range: [0.0, 100.0].

Validation Rules

  1. For each kind: require <= cap.
  2. The sum of all require percentages MUST NOT exceed 100%.
  3. Kinds without explicit configuration default to require = 0% and cap = 100%.

Algorithm

QUOTA-SLICE(scoredItems, budget, quotas, innerSlicer):
    if length(scoredItems) = 0 or budget.targetTokens <= 0:
        return []

    targetTokens <- budget.targetTokens

    // Phase 1: Partition by ContextKind
    partitions <- empty map of ContextKind -> list of ScoredItem
    for i <- 0 to length(scoredItems) - 1:
        kind <- scoredItems[i].item.kind    // case-insensitive
        if kind is not in partitions:
            partitions[kind] <- empty list
        APPEND(partitions[kind], scoredItems[i])

    // Phase 2: Candidate token mass per kind
    candidateTokenMass <- empty map of ContextKind -> integer
    for each (kind, items) in partitions:
        mass <- 0
        for i <- 0 to length(items) - 1:
            mass <- mass + items[i].item.tokens
        candidateTokenMass[kind] <- mass

    // Phase 3: Budget distribution
    kindBudgets <- DISTRIBUTE-BUDGET(partitions, candidateTokenMass,
                                      targetTokens, quotas)

    // Phase 4: Per-kind slicing
    allSelected <- empty list
    for each (kind, items) in partitions:
        kindBudget <- kindBudgets[kind]     // 0 if not present
        if kindBudget <= 0:
            continue
        cap <- floor(quotas.getCap(kind) / 100.0 * targetTokens)
        subBudget <- ContextBudget(maxTokens: cap, targetTokens: kindBudget)
        selected <- innerSlicer.Slice(items, subBudget)
        for i <- 0 to length(selected) - 1:
            APPEND(allSelected, selected[i])

    return allSelected

Budget Distribution

DISTRIBUTE-BUDGET(partitions, candidateTokenMass, targetTokens, quotas):
    // Step 1: Compute require and cap token amounts
    requireTokens <- empty map of ContextKind -> integer
    capTokens     <- empty map of ContextKind -> integer

    for each kind in quotas.configuredKinds:
        requireTokens[kind] <- floor(quotas.getRequire(kind) / 100.0 * targetTokens)
        capTokens[kind]     <- floor(quotas.getCap(kind) / 100.0 * targetTokens)

    // Step 2: Sum required tokens
    totalRequired <- 0
    for each (kind, req) in requireTokens:
        totalRequired <- totalRequired + req

    unassignedBudget <- max(0, targetTokens - totalRequired)

    // Step 3: Compute distribution mass (kinds that can receive beyond their require)
    totalMassForDistribution <- 0
    for each (kind, items) in partitions:
        cap <- capTokens[kind] if kind in capTokens, else targetTokens
        require <- requireTokens[kind] if kind in requireTokens, else 0
        if cap > require:
            totalMassForDistribution <- totalMassForDistribution + candidateTokenMass[kind]

    // Step 4: Distribute per kind
    kindBudgets <- empty map of ContextKind -> integer
    for each (kind, items) in partitions:
        require <- requireTokens[kind] if kind in requireTokens, else 0
        cap <- capTokens[kind] if kind in capTokens, else targetTokens

        proportional <- 0
        if totalMassForDistribution > 0 and cap > require:
            proportional <- floor(unassignedBudget * candidateTokenMass[kind]
                                   / totalMassForDistribution)

        kindBudget <- require + proportional
        if kindBudget > cap:
            kindBudget <- cap

        kindBudgets[kind] <- kindBudget

    return kindBudgets

Budget Distribution Rounding (P6)

All percentage-to-token conversions use floor truncation (integer truncation toward zero):

requireTokens = floor(requirePercent / 100.0 * targetTokens)
capTokens     = floor(capPercent / 100.0 * targetTokens)
proportional  = floor(unassigned * mass / totalMass)

Because of this floor truncation, the sum of all kind budgets may be less than targetTokens. This is by design — it ensures that no kind receives more tokens than its allocation warrants, at the cost of potentially leaving a small number of tokens unallocated.

Example

Given targetTokens = 1000 and two kinds each with require = 33%:

requireTokens["A"] = floor(0.33 * 1000) = 330
requireTokens["B"] = floor(0.33 * 1000) = 330
totalRequired = 660
// 340 tokens remain for proportional distribution
// Even after distribution, sum of kind budgets may be < 1000

ContextKind Comparison (P4)

All ContextKind comparisons in QuotaSlice — partition key grouping, require/cap lookups, and kind budget assignment — MUST use case-insensitive ASCII case folding, consistent with the ContextKind comparison semantics.

Edge Cases

ConditionResult
Empty scoredItemsEmpty list
budget.targetTokens <= 0Empty list
No quotas configuredAll kinds get proportional budget (no require/cap constraints)
Kind has require = 100%That kind gets the entire budget; all other kinds get 0
Kind has cap = 0%That kind is excluded (budget = 0)
Kind has candidates but zero token massGets its require allocation (if any), no proportional share
Kind has no candidatesNot present in partitions; its require allocation is unused

Complexity

  • Time: O(N) for partitioning + O(K) for budget distribution + sum of inner slicer costs per partition, where K is the number of distinct kinds.
  • Space: O(N) for partitioned item lists + O(K) for budget maps.

Conformance Notes

  • The inner slicer receives a ContextBudget with maxTokens set to the kind’s cap (in tokens) and targetTokens set to the kind’s computed budget. Other budget fields are not forwarded.
  • ContextKind comparison MUST be case-insensitive throughout (partitioning, quota lookups, budget assignment).
  • Kinds not present in the quota configuration default to require = 0%, cap = 100%. They participate in proportional distribution with no floor guarantee and no ceiling (other than the full budget).
  • The proportional distribution uses candidateTokenMass (sum of token counts for actual candidates in each kind), not the theoretical maximum. This means kinds with fewer/smaller candidates naturally receive less budget.
  • The output list order is: items from each kind partition concatenated in partition iteration order. The Place stage handles final ordering.

CountQuotaSlice

CountQuotaSlice enforces absolute minimum and maximum item counts per ContextKind using a two-phase COUNT-DISTRIBUTE-BUDGET algorithm, then delegates residual selection to an inner slicer.

Overview

CountQuotaSlice is a decorator slicer — it wraps another slicer (e.g., GreedySlice) and adds count-based fairness across context kinds. Where QuotaSlice distributes a token budget as percentages across kinds, CountQuotaSlice enforces absolute item counts: “at least 2 tool items, at most 5 tool items.”

CountQuotaSlice operates in three phases:

  1. Count-Satisfy (Phase 1): For each kind with a requireCount > 0, commit the top-N candidates (by score descending) to the selection. Their token cost is accumulated as preAllocatedTokens and the committed items are removed from the residual candidate pool.
  2. Budget-Distribute (Phase 2): The inner slicer receives the residual candidate pool and a reduced budget (targetTokens reduced by preAllocatedTokens).
  3. Cap Enforcement (Phase 3): Items returned by the inner slicer are filtered against the per-kind capCount. Items that would exceed the cap are excluded.

Configuration

CountQuotaSlice is configured with a list of CountQuotaEntry values and a ScarcityBehavior:

CountQuotaEntry

Each entry specifies constraints for one ContextKind:

FieldTypeDescription
kindContextKindThe kind this entry constrains
requireCountinteger ≥ 0Minimum number of items of this kind to commit in Phase 1
capCountinteger > 0 (or 0 when requireCount is also 0)Maximum number of items of this kind in the final result

Validation Rules

  1. requireCount MUST be ≤ capCount.
  2. capCount = 0 with requireCount > 0 is rejected — a zero cap with a positive requirement can never be satisfied.
  3. Kinds without an entry have no require or cap constraint — they participate freely in Phase 2 delegation.

ScarcityBehavior

Controls what happens when the candidate pool has fewer items of a kind than the configured requireCount:

ValueBehavior
Degrade (default)Include all available candidates and record a shortfall. Pipeline execution continues.
ThrowReturn an error / throw an exception immediately. Use when count requirements are hard guarantees (e.g., required disclaimer text).

KnapsackSlice Guard

CountQuotaSlice does not support KnapsackSlice as the inner slicer. Construction MUST fail with an error if the inner slicer is a KnapsackSlice instance. Use GreedySlice as the inner slicer instead. A CountConstrainedKnapsackSlice may be provided in a future release.

Algorithm

COUNT-QUOTA-SLICE(scoredItems, budget, entries, innerSlicer, scarcity):
    if length(scoredItems) = 0 or budget.targetTokens <= 0:
        return []

    // Build policy lookup: kind -> (requireCount, capCount)
    entryByKind <- map from entries

    // --- Phase 1: Count-Satisfy ---

    // Partition candidates by ContextKind
    partitions <- group scoredItems by item.kind (case-insensitive)

    // Sort each partition by score descending
    for each (kind, candidates) in partitions:
        SORT(candidates, by score descending)

    committed     <- empty list
    committedSet  <- empty set (reference equality)
    selectedCount <- empty map of ContextKind -> integer
    preAllocatedTokens <- 0
    shortfalls    <- empty list

    for each entry in entries where entry.requireCount > 0:
        candidates <- partitions[entry.kind] (empty if absent)
        satisfied  <- 0

        for each candidate in candidates while satisfied < entry.requireCount:
            APPEND(committed, candidate.item)
            ADD(committedSet, candidate.item)
            preAllocatedTokens <- preAllocatedTokens + candidate.item.tokens
            satisfied <- satisfied + 1

        selectedCount[entry.kind] <- satisfied

        if satisfied < entry.requireCount:
            if scarcity = Throw:
                ERROR("kind has fewer candidates than requireCount")
            else:
                RECORD-SHORTFALL(entry.kind, entry.requireCount, satisfied)

    // --- Phase 2: Budget-Distribute ---

    residual <- [si in scoredItems where si.item not in committedSet]
    residualTarget <- max(0, budget.targetTokens - preAllocatedTokens)
    residualBudget <- ContextBudget(maxTokens: budget.maxTokens,
                                     targetTokens: min(residualTarget, budget.maxTokens))

    innerSelected <- innerSlicer.Slice(residual, residualBudget)

    // --- Phase 3: Cap Enforcement ---

    result <- committed

    for each item in innerSelected:
        kind <- item.kind
        count <- selectedCount[kind] (default 0)
        cap   <- entryByKind[kind].capCount (uncapped if no entry)

        if cap is defined and count >= cap:
            EXCLUDE(item)    // cap exceeded
        else:
            APPEND(result, item)
            selectedCount[kind] <- count + 1

    return result

Scarcity Reporting

When ScarcityBehavior.Degrade is active and a kind’s candidate pool cannot satisfy its requireCount, a CountRequirementShortfall is recorded:

FieldTypeDescription
kindContextKind (or string)The kind that was short
requiredCountintegerThe configured requireCount
satisfiedCountintegerHow many candidates were actually committed

In .NET, shortfalls are available via CountQuotaSlice.LastShortfalls (populated after each Slice call). In Rust, shortfalls are recorded on SelectionReport.count_requirement_shortfalls (populated by the pipeline after slicing).

Monotonicity

CountQuotaSlice does not guarantee monotonic item inclusion as the budget changes. Phase 1 commits a fixed set of items regardless of budget, but Phase 2 delegation produces different residual selections at different budgets. Combined with the cap enforcement in Phase 3, the final set can exhibit non-monotonic inclusion.

This means CountQuotaSlice is incompatible with budget simulation methods that assume monotonic inclusion:

  • GetMarginalItems — does not guard against CountQuotaSlice (only guards against QuotaSlice), because Phase 1 pre-allocation is budget-independent and the inner slicer controls the residual. However, the cap enforcement can cause non-monotonic behavior.
  • FindMinBudgetFor — MUST throw if the pipeline’s slicer is CountQuotaSlice (see FindMinBudgetFor QuotaSlice + CountQuotaSlice Guard).

Edge Cases

ConditionResult
Empty scoredItemsEmpty list
budget.targetTokens <= 0Empty list
No entries configuredPhase 1 commits nothing; Phase 2 runs unconstrained
requireCount = capCountPhase 1 commits exactly requireCount items; Phase 2 cannot add more of that kind
requireCount = 0, capCount > 0Phase 1 commits nothing for that kind; Phase 2 output is capped
Kind has fewer candidates than requireCountScarcity behavior applies (Degrade or Throw)
Phase 1 exhausts the token budgetPhase 2 receives residualTarget = 0 and selects nothing
Inner slicer is KnapsackSliceConstruction fails with error

Complexity

  • Time: O(N) for partitioning + O(K · M log M) for per-kind sorting in Phase 1 (where M is the largest partition size) + inner slicer cost for Phase 2 + O(I) for Phase 3 cap enforcement (where I is the inner slicer output size).
  • Space: O(N) for partitioned item lists + O(K) for policy maps and count tracking.

Conformance Notes

  • ContextKind comparison MUST be case-insensitive throughout (partitioning and policy lookups), consistent with the ContextKind comparison semantics.
  • Phase 1 commits items in score-descending order within each kind. The input is pre-sorted by the pipeline, but implementations MUST sort per-kind partitions explicitly to ensure correctness when multiple kinds interleave.
  • The inner slicer in Phase 2 receives a ContextBudget with targetTokens reduced by Phase 1 pre-allocation. maxTokens is passed through unchanged from the original budget.
  • Phase 3 cap enforcement applies only to items returned by the inner slicer in Phase 2. Phase 1 committed items are always included (they count toward the cap but are never excluded by it).
  • Kinds not present in the entry list have no require or cap constraint. They are not partitioned in Phase 1 and pass through Phase 2 and 3 without cap filtering.
  • The KnapsackSlice guard MUST be enforced at construction time, not at slice time.

CountConstrainedKnapsackSlice

CountConstrainedKnapsackSlice enforces absolute minimum and maximum item counts per ContextKind while using knapsack-optimal selection for the residual budget, implementing a 3-phase COUNT-KNAPSACK-CAP algorithm.

Overview

CountConstrainedKnapsackSlice is a decorator slicer — it wraps a KnapsackSlice directly and adds count-based fairness across context kinds. Unlike CountQuotaSlice, which delegates to a configurable inner slicer and guards against KnapsackSlice at construction time, CountConstrainedKnapsackSlice hardwires KnapsackSlice as the Phase 2 engine — it is the knapsack wrapper itself.

CountConstrainedKnapsackSlice operates in three phases:

  1. Count-Satisfy (Phase 1): Identical to CountQuotaSlice Phase 1. For each kind with a requireCount > 0, commit the top-N candidates (by score descending) to the selection. Their token cost is accumulated as preAllocatedTokens and the committed items are removed from the residual candidate pool.
  2. Knapsack-Distribute (Phase 2): The residual candidate pool is passed to the stored KnapsackSlice with a reduced budget (targetTokens reduced by preAllocatedTokens). The Phase 2 output is then re-sorted by score descending before Phase 3 proceeds (D180 — required because KnapsackSlice output order is not guaranteed to be score-descending).
  3. Cap Enforcement (Phase 3): Items returned by Phase 2 (after re-sort) are filtered against the per-kind capCount. Items that would exceed the cap are excluded. The selectedCount map is seeded from Phase 1 committed counts (D181 — not initialized to zero), ensuring cap tracking is correct across both phases.

Configuration

CountConstrainedKnapsackSlice is configured with a list of CountQuotaEntry values and a ScarcityBehavior:

CountQuotaEntry

Each entry specifies constraints for one ContextKind:

FieldTypeDescription
kindContextKindThe kind this entry constrains
requireCountinteger ≥ 0Minimum number of items of this kind to commit in Phase 1
capCountinteger > 0 (or 0 when requireCount is also 0)Maximum number of items of this kind in the final result

ScarcityBehavior

Controls what happens when the candidate pool has fewer items of a kind than the configured requireCount:

ValueBehavior
Degrade (default)Include all available candidates and record a shortfall. Pipeline execution continues.
ThrowReturn an error / throw an exception immediately. Use when count requirements are hard guarantees (e.g., required disclaimer text).

Construction Parameters

Rust: CountConstrainedKnapsackSlice::new(Vec<CountQuotaEntry>, KnapsackSlice, ScarcityBehavior)

.NET: CountConstrainedKnapsackSlice(IReadOnlyList<CountQuotaEntry>, KnapsackSlice, ScarcityBehavior = ScarcityBehavior.Degrade)

Validation Rules

  1. requireCount MUST be ≤ capCount.
  2. capCount = 0 with requireCount > 0 is rejected — a zero cap with a positive requirement can never be satisfied.
  3. Kinds without an entry have no require or cap constraint — they participate freely in Phase 2 delegation.

Algorithm

COUNT-KNAPSACK-CAP(scoredItems, budget, entries, knapsack, scarcity):
    if length(scoredItems) = 0 or budget.targetTokens <= 0:
        return []

    // Build policy lookup: kind -> (requireCount, capCount)
    entryByKind <- map from entries

    // --- Phase 1: Count-Satisfy (identical to CountQuotaSlice Phase 1) ---

    // Partition candidates by ContextKind
    partitions <- group scoredItems by item.kind (case-insensitive)

    // Sort each partition by score descending
    for each (kind, candidates) in partitions:
        SORT(candidates, by score descending)

    committed     <- empty list
    committedSet  <- empty set (reference equality)
    selectedCount <- empty map of ContextKind -> integer
    preAllocatedTokens <- 0
    shortfalls    <- empty list

    for each entry in entries where entry.requireCount > 0:
        candidates <- partitions[entry.kind] (empty if absent)
        satisfied  <- 0

        for each candidate in candidates while satisfied < entry.requireCount:
            APPEND(committed, candidate.item)
            ADD(committedSet, candidate.item)
            preAllocatedTokens <- preAllocatedTokens + candidate.item.tokens
            satisfied <- satisfied + 1

        selectedCount[entry.kind] <- satisfied    // D181: seed Phase 3 starting counts

        if satisfied < entry.requireCount:
            if scarcity = Throw:
                ERROR("CountConstrainedKnapsackSlice: candidate pool for kind '<kind>' has <satisfied> items but RequireCount is <requireCount>.")
            else:
                RECORD-SHORTFALL(entry.kind, entry.requireCount, satisfied)

    // --- Phase 2: Knapsack-Distribute ---

    residual <- [si in scoredItems where si.item not in committedSet]
    residualTarget <- max(0, budget.targetTokens - preAllocatedTokens)
    residualBudget <- ContextBudget(maxTokens: budget.maxTokens,
                                     targetTokens: min(residualTarget, budget.maxTokens))

    innerSelected <- KnapsackSlice(residual, residualBudget)

    // D180: Re-sort Phase 2 output by score descending before Phase 3 cap loop.
    // KnapsackSlice output order is not guaranteed; cap enforcement must see
    // highest-scoring items first so cap budget is spent on the best items.
    innerSelected <- SORT(innerSelected, by score descending)

    // --- Phase 3: Cap Enforcement ---

    result <- committed

    // selectedCount is already seeded from Phase 1 (D181 — not reset to zero)
    for each item in innerSelected:
        kind  <- item.kind
        count <- selectedCount[kind] (default 0)
        cap   <- entryByKind[kind].capCount (uncapped if no entry)

        if cap is defined and count >= cap:
            EXCLUDE(item)    // cap exceeded
        else:
            APPEND(result, item)
            selectedCount[kind] <- count + 1

    return result

Scarcity Reporting

When ScarcityBehavior.Degrade is active and a kind’s candidate pool cannot satisfy its requireCount, a CountRequirementShortfall is recorded:

FieldTypeDescription
kindContextKind (or string)The kind that was short
requiredCountintegerThe configured requireCount
satisfiedCountintegerHow many candidates were actually committed

The exact ScarcityBehavior.Throw error message is:

"CountConstrainedKnapsackSlice: candidate pool for kind '<kind>' has <satisfied> items but RequireCount is <requireCount>."

In .NET, shortfalls are available via CountConstrainedKnapsackSlice.LastShortfalls (populated after each Slice call). In Rust, shortfalls are recorded on SelectionReport.count_requirement_shortfalls (populated by the pipeline after slicing).

Monotonicity

CountConstrainedKnapsackSlice returns is_count_quota() → true. This signals that the slicer is incompatible with budget simulation methods that require monotonic item inclusion.

Note: Unlike CountQuotaSlice, there is no KnapsackSlice guard at construction time. CountConstrainedKnapsackSlice is the knapsack wrapper itself — it hardwires KnapsackSlice internally and does not accept it as a configurable inner slicer.

Trade-offs

Pre-processing Sub-optimality (D174)

Phase 1 commits required items before the knapsack runs, consuming budget. If required items are token-heavy, the residual budget available to Phase 2 may be significantly reduced — potentially to zero. In the worst case, globally optimal selection is impossible because the knapsack cannot see the full candidate pool with the full budget. This is the inherent trade-off of the pre-processing approach (Path 5A): it avoids a full constrained knapsack DP (which would be NP-hard to extend with per-kind minimums) at the cost of potential sub-optimality when Phase 1 pre-allocation is large relative to the total budget.

Cap Waste

KnapsackSlice may select items in Phase 2 that Phase 3 then excludes due to cap enforcement. Token budget was “spent” on those items in the knapsack DP computation — they counted toward capacity — but they do not appear in the final result. In scenarios with a low cap and many candidates of a capped kind, Phase 2 may select several items that are subsequently excluded by Phase 3, reducing effective throughput.

Edge Cases

ConditionResult
Empty scoredItemsEmpty list
budget.targetTokens <= 0Empty list
No entries configuredPhase 1 commits nothing; Phase 2 runs unconstrained knapsack
requireCount = capCountPhase 1 commits exactly requireCount items; Phase 3 cannot add more of that kind
requireCount = 0, capCount > 0Phase 1 commits nothing for that kind; Phase 2 output is capped
Kind has fewer candidates than requireCountScarcity behavior applies (Degrade or Throw)
Phase 1 exhausts budgetPhase 2 receives residualTarget = 0 and selects nothing
No KnapsackSlice guardN/A — CountConstrainedKnapsackSlice hardwires KnapsackSlice internally; no guard is needed at construction
KnapsackSlice OOM guardCupelError::TableTooLarge propagates from KnapsackSlice when the DP table exceeds 50 M cells

Complexity

  • Time: O(N) for partitioning + O(K · M log M) for per-kind sorting in Phase 1 (where M is the largest partition size) + O(N × C) for Phase 2 KnapsackSlice DP (where C = residual capacity / bucket_size) + O(I log I) for Phase 2 re-sort + O(I) for Phase 3 cap enforcement (where I is the KnapsackSlice output size).
  • Space: O(N) for partitioned item lists + O(K) for policy maps and count tracking + O(N × C) for the knapsack DP table.

Conformance Notes

  • ContextKind comparison MUST be case-insensitive throughout (partitioning and policy lookups), consistent with the ContextKind comparison semantics.
  • Phase 2 re-sort is normative (D180): Implementations MUST sort Phase 2 output by score descending before entering the Phase 3 cap loop. KnapsackSlice output order is undefined; without re-sorting, cap enforcement may exclude higher-scoring items and include lower-scoring ones.
  • Phase 3 MUST initialize selectedCount from Phase 1 committed counts (D181): Implementations MUST NOT reset selectedCount to zero before Phase 3. The Phase 1 committed counts are the correct starting state for cap tracking across both phases.
  • Phase 3 cap enforcement applies only to items returned by Phase 2. Phase 1 committed items are always included (they count toward the cap but are never excluded by it).
  • Kinds not present in the entry list have no require or cap constraint. They are not partitioned in Phase 1 and pass through Phase 2 and Phase 3 without cap filtering.

Conformance Vector Outlines

1. Baseline — Require 2 of kind, knapsack selects residual

File: count-constrained-knapsack-baseline.toml

  • Budget: 1000 tokens. 3 items, each 100 tokens.
  • Policy: require_count=2, cap_count=4 for kind "tool". bucket_size=100.
  • Phase 1: commits tool-a (0.9) and tool-b (0.7). preAllocatedTokens=200. selectedCount["tool"]=2.
  • Phase 2: KnapsackSlice receives residual [msg-x (0.5, 100t)] with budget 800. Selects msg-x.
  • Phase 3: selectedCount["tool"]=2, cap=4. All Phase 2 items pass (kind "msg" is uncapped).
  • Expected: all 3 items selected. 0 shortfalls. 0 cap exclusions.

2. Cap Exclusion — cap=2, 4 candidates, 2 excluded by cap after Phase 2 re-sort

File: count-constrained-knapsack-cap-exclusion.toml

  • Budget: 600 tokens. 4 tool items, each 100 tokens.
  • Policy: require_count=1, cap_count=2 for kind "tool". bucket_size=100.
  • Phase 1: commits tool-a (0.9). preAllocatedTokens=100. selectedCount["tool"]=1.
  • Phase 2: KnapsackSlice receives [tool-b (0.8), tool-c (0.7), tool-d (0.6)] with budget 500. All fit — selects all 3. Re-sort: [tool-b, tool-c, tool-d] (already score-descending).
  • Phase 3: selectedCount["tool"]=1, cap=2. tool-b → count=2, pass. tool-c → count=2 ≥ cap=2, excluded. tool-d → excluded.
  • Expected: [tool-a, tool-b]. 0 shortfalls. 2 cap exclusions.

3. Scarcity Degrade — require_count=3, only 1 candidate, shortfall recorded

File: count-constrained-knapsack-scarcity-degrade.toml

  • Budget: 500 tokens. 1 item: tool-a (0.9, 100t).
  • Policy: require_count=3, cap_count=5 for kind "tool". ScarcityBehavior=Degrade.
  • Phase 1: commits tool-a. satisfied=1 < requireCount=3 → shortfall recorded {kind:"tool", requiredCount:3, satisfiedCount:1}. selectedCount["tool"]=1.
  • Phase 2: residual empty. KnapsackSlice selects nothing.
  • Phase 3: no Phase 2 items.
  • Expected: [tool-a]. 1 shortfall. 0 cap exclusions.

4. Tag Non-exclusivity — two kinds with independent require constraints

File: count-constrained-knapsack-tag-nonexclusive.toml

  • Budget: 1000 tokens. 3 items: item-tool (tool, 0.9, 100t), item-memory (memory, 0.8, 100t), item-extra (tool, 0.5, 100t).
  • Policy: require_count=1, cap_count=4 for "tool" and "memory" independently. bucket_size=100.
  • Phase 1: commits item-tool (satisfies tool), item-memory (satisfies memory). preAllocatedTokens=200. selectedCount={"tool":1, "memory":1}.
  • Phase 2: KnapsackSlice receives [item-extra (0.5, 100t)] with budget 800. Selects item-extra.
  • Phase 3: item-extra is kind "tool", count=1 < cap=4 → pass. selectedCount["tool"]=2.
  • Expected: all 3 items selected. 0 shortfalls. 0 cap exclusions.

5. Require-and-Cap — require=2, cap=2, knapsack picks best msg items from residual

File: count-constrained-knapsack-require-and-cap.toml

  • Budget: 1000 tokens. bucket_size=1 (exact knapsack). 5 items: tool-a (tool, 0.9, 100t), tool-b (tool, 0.7, 100t), msg-s (msg, 0.8, 50t), msg-m (msg, 0.6, 150t), msg-l (msg, 0.4, 200t).
  • Policy: require_count=2, cap_count=2 for "tool". No constraint on "msg".
  • Phase 1: commits tool-a and tool-b. preAllocatedTokens=200. selectedCount["tool"]=2.
  • Phase 2: KnapsackSlice receives [msg-s, msg-m, msg-l] with budget 800. All three fit (50+150+200=400 < 800). Selects all 3.
  • Phase 3: selectedCount["tool"]=2, cap=2. Phase 2 items are kind "msg" (uncapped) → all pass.
  • Expected: all 5 items selected. 0 shortfalls. 0 cap exclusions.

Placers

Placers determine the final ordering of selected items in the context window. They are invoked during Stage 6: Place.

Placer Interface

A placer implements a single function:

Place(items: list of ScoredItem) -> list of ContextItem
  • items is the merged list of ScoredItem pairs — pinned items (with score 1.0) followed by sliced items (with their computed scores). See Stage 6: Place for the merge process.
  • The return value is the final ordered list of ContextItem instances.

Contract

  1. Reorder only. The placer MUST return the same items it receives, in a potentially different order. It MUST NOT add, remove, modify, or duplicate items.
  2. Deterministic. Given the same input, the placer MUST produce the same output order.
  3. Score-aware. The placer receives scores (via ScoredItem) and may use them to determine ordering. Pinned items have score 1.0.

Placer Summary

PlacerAlgorithmUse Case
ChronologicalPlacerStable sort ascending by timestampNatural conversation flow; preserves temporal ordering
UShapedPlacerHighest scores at edges, lowest in middleExploits LLM primacy + recency attention bias

ChronologicalPlacer

The ChronologicalPlacer orders items by ascending timestamp, preserving natural temporal flow.

Overview

ChronologicalPlacer is the simplest placer strategy. It sorts items by their timestamp field in ascending order (oldest first), producing a context window that reads in chronological order. This is the natural ordering for conversation-style contexts where temporal sequence matters.

Fields Used

FieldSourcePurpose
timestampContextItemSort key for temporal ordering
scoreScoredItemNot used by this placer

Algorithm

CHRONOLOGICAL-PLACE(items):
    if length(items) <= 1:
        if length(items) = 0:
            return []
        return [items[0].item]

    // Build sortable array with original indices
    sortable <- new array of (timestamp, index) of length(items)
    for i <- 0 to length(items) - 1:
        sortable[i] <- (items[i].item.timestamp, i)

    // Stable sort: timestamped items ascending, then null-timestamp items, then by index
    STABLE-SORT(sortable, by:
        1. Items with timestamps sort before items without timestamps
        2. Among timestamped items: ascending by timestamp
        3. Tiebreak: ascending by original index
    )

    result <- new array of ContextItem[length(items)]
    for i <- 0 to length(sortable) - 1:
        result[i] <- items[sortable[i].index].item

    return result

Sort Order Detail

The sort comparison between two items a and b:

CHRONOLOGICAL-COMPARE(a, b):
    aHas <- a.timestamp is not null
    bHas <- b.timestamp is not null

    if aHas and bHas:
        if a.timestamp != b.timestamp:
            return a.timestamp < b.timestamp    // ascending
        return a.index < b.index                // tiebreak

    if aHas:
        return true     // a (with timestamp) sorts before b (without)
    if bHas:
        return false    // b (with timestamp) sorts before a (without)

    return a.index < b.index    // both null: preserve original order

Timestamp Placement Rules

  1. Timestamped items sort first, in ascending temporal order.
  2. Null-timestamp items sort last, in original input order.
  3. Tied timestamps preserve original input order (stable sort).

This means the context window reads oldest-to-newest for timestamped items, with non-timestamped items appended at the end.

Edge Cases

ConditionResult
Empty inputEmpty list
Single itemThat item returned as-is
All items have timestampsSorted ascending by timestamp
No items have timestampsOriginal input order preserved
Mixed timestamped and non-timestampedTimestamped items first (ascending), then non-timestamped (original order)
All items have the same timestampOriginal input order preserved (stable sort)

Complexity

  • Time: O(N log N) — dominated by the sort.
  • Space: O(N) for the sortable array.

Conformance Notes

  • Timestamp comparison is by temporal ordering (the underlying UTC instant), not by string representation or timezone.
  • Items without timestamps MUST sort after all timestamped items, not before.
  • The sort MUST be stable. When timestamps are equal (or both null), the original input order (index) MUST be preserved.
  • Scores are available (items are ScoredItem pairs) but MUST NOT influence the ordering. ChronologicalPlacer is purely timestamp-based.

UShapedPlacer

The UShapedPlacer positions the highest-scored items at both edges of the context window (start and end), with the lowest-scored items in the middle.

Overview

UShapedPlacer exploits the primacy and recency bias observed in large language models — LLMs tend to pay more attention to content at the beginning and end of the context window, with less attention to content in the middle. By placing the most relevant items at both edges, UShapedPlacer maximizes the chance that important context is attended to.

The placement pattern forms a “U” shape when score is plotted against position: high at the edges, low in the center.

Fields Used

FieldSourcePurpose
scoreScoredItemDetermines placement priority

Algorithm

U-SHAPED-PLACE(items):
    if length(items) <= 1:
        if length(items) = 0:
            return []
        return [items[0].item]

    // Step 1: Sort by score descending, tiebreak by index ascending
    scored <- new array of (score, index) of length(items)
    for i <- 0 to length(items) - 1:
        scored[i] <- (items[i].score, i)

    STABLE-SORT(scored, by score descending, then by index ascending)

    // Step 2: Alternate placement — even ranks to left edge, odd ranks to right edge
    result <- new array of ContextItem[length(items)]
    left  <- 0                      // next available left position
    right <- length(items) - 1      // next available right position

    for i <- 0 to length(scored) - 1:
        originalIndex <- scored[i].index
        item <- items[originalIndex].item

        if i mod 2 = 0:             // even rank (0, 2, 4, ...)
            result[left] <- item
            left <- left + 1
        else:                       // odd rank (1, 3, 5, ...)
            result[right] <- item
            right <- right - 1

    return result

Placement Pattern

Items are sorted by score descending and assigned a rank (0, 1, 2, …). The rank determines placement:

RankPositionEdge
0 (highest score)Left edge (index 0)Start
1 (2nd highest)Right edge (index N-1)End
2 (3rd highest)Left edge (index 1)Start
3 (4th highest)Right edge (index N-2)End
4 (5th highest)Left edge (index 2)Start

Visual Example

Given 7 items sorted by score [A=0.9, B=0.8, C=0.7, D=0.6, E=0.5, F=0.4, G=0.3]:

Position:  [  0  ] [  1  ] [  2  ] [  3  ] [  4  ] [  5  ] [  6  ]
Item:      [  A  ] [  C  ] [  E  ] [  G  ] [  F  ] [  D  ] [  B  ]
Score:     [ 0.9 ] [ 0.7 ] [ 0.5 ] [ 0.3 ] [ 0.4 ] [ 0.6 ] [ 0.8 ]

Score plot:  ___                                               ___
            |   \                                           /  |
            |    \___                                 ___/     |
            |        \___                       ___/           |
            |            \___             ___/                 |
            |                \___   ___/                       |
            |                    \_/                           |
           start                middle                       end

The highest scores (A, B) are at the edges. The lowest score (G) is in the center.

Sort Stability

The sort MUST be stable or use a composite sort key. When two items have the same score, the item with the lower original index is ranked first. This ensures deterministic placement when scores are tied.

Edge Cases

ConditionResult
Empty inputEmpty list
Single itemThat item returned as-is
Two itemsHigher-scored at index 0, lower-scored at index 1
All items have the same scoreOriginal input order preserved (stable sort); alternating placement still applies
Pinned items (score 1.0)Not special-cased by UShapedPlacer; pinned items arrive with score 1.0 from the pipeline and naturally rank highly, sorting to edges.

Complexity

  • Time: O(N log N) — dominated by the sort.
  • Space: O(N) for the scored array and result array.

Conformance Notes

  • The sort is descending by score, with ascending index as the tiebreaker.
  • Even-ranked items (0, 2, 4, …) fill from the left (start of the array).
  • Odd-ranked items (1, 3, 5, …) fill from the right (end of the array).
  • The left pointer advances forward; the right pointer advances backward. They meet in the middle.
  • Pinned items arrive with score 1.0 (see Stage 6: Place). They will be ranked highly and placed at the edges, which is the intended behavior — pinned items should receive maximum attention.

Diagnostics

The diagnostics system exposes pipeline selection decisions as structured, inspectable data without affecting output or performance when disabled.

Overview

Diagnostics answer the question: “why was this item included or excluded?” The pipeline instruments each stage with trace calls that record durations, item counts, and per-item decisions. When diagnostics are disabled, these calls are zero-cost. When enabled, the accumulated events form a SelectionReport that callers can inspect after the pipeline run completes.

Ownership Model

The trace collector is passed at call time, not stored on the pipeline. One collector instance is used per pipeline execution.

This makes diagnostics per-invocation: each pipeline run receives a clean diagnostic context, and different concurrent runs can use different diagnostic configurations.

Rationale: Per-invocation ownership avoids thread-safety concerns and ensures each pipeline run gets an isolated diagnostic context.

Rejected alternative: Storing the collector on the pipeline object — this couples the collector lifecycle to the pipeline instance, prevents concurrent runs with different diagnostic configurations, and turns a per-call concern into a per-instance configuration.

Null-Path Guarantee

When diagnostics are disabled (via NullTraceCollector), no event objects are constructed and no performance overhead is incurred. Implementations achieve this by checking is_enabled before constructing event payloads.

Rationale: Diagnostic instrumentation must never regress pipeline performance for callers who do not need it. The is_enabled check is the single gate that eliminates all allocation and computation on the disabled path.

Data Flow

flowchart TD
    Input["Candidate Items\n(list of ContextItem)"] --> Pipeline

    Pipeline["Pipeline\n(trace calls at each stage)"] --> Output["Selected Items\n(list of ContextItem)"]
    Pipeline --> Collector["TraceCollector\n(buffers events)"]

    Collector --> Report["SelectionReport\n(complete diagnostic output)"]

Summary

TypeRoleSpec page
TraceCollectorObserver contract for pipeline instrumentationTraceCollector
TraceEventStructured record of a single pipeline observationEvents
PipelineStageEnumeration of observable pipeline stagesEvents
OverflowEventRecord produced when budget overflow occurs under Proceed strategyEvents
ExclusionReasonWhy an item was excluded from the context windowExclusion Reasons
InclusionReasonWhy an item was included in the context windowExclusion Reasons
SelectionReportComplete diagnostic output from a pipeline runSelectionReport

TraceCollector

Overview

The TraceCollector is an observer interface that pipeline implementations call at instrumentation points throughout each stage. It has two built-in implementations: NullTraceCollector (zero-cost disabled path) and DiagnosticTraceCollector (buffered recording with configurable detail level).

The pipeline calls is_enabled before constructing event payloads. When is_enabled returns false, no events are created and no work is done.

Contract

MemberSignature (pseudocode)Description
is_enabled-> booleanWhether this collector is actively recording. Callers check before constructing event payloads to avoid unnecessary allocations.
record_stage_event(event: TraceEvent) -> voidRecord a stage-level event. Called once per pipeline stage, after the stage completes.
record_item_event(event: TraceEvent) -> voidRecord an item-level event. May be filtered by the collector’s configured detail level.

NullTraceCollector

NullTraceCollector is the disabled implementation. is_enabled returns false. Both record_stage_event and record_item_event are no-ops.

A singleton instance is recommended but not required — callers that need a disabled collector can use any NullTraceCollector instance.

See Null-Path Guarantee for the rationale and implementation contract.

DiagnosticTraceCollector

DiagnosticTraceCollector is the enabled implementation. is_enabled returns true. Events are buffered in insertion order.

Construction Parameters

ParameterTypeRequiredDefaultDescription
detail_levelTraceDetailLevelNoStageControls which events are recorded.

Behavior

  • record_stage_event: always records the event, regardless of detail_level.
  • record_item_event: records the event only when detail_level is Item or higher. When detail_level is Stage, item events are silently discarded.

After the pipeline run completes, the caller extracts a SelectionReport from the collector’s buffered events. See SelectionReport.

TraceDetailLevel

TraceDetailLevel controls the granularity of events recorded by DiagnosticTraceCollector. Values are ordered: a higher ordinal includes all events from lower ordinals.

ValueOrdinalDescription
Stage0Stage-level events only (durations, item counts per stage)
Item1Stage-level events plus per-item events (individual scores, exclusion reasons)

Rationale: Two levels provide a meaningful performance/detail trade-off. Stage-level is sufficient for performance profiling and pipeline health checks. Item-level is needed for “why was this item excluded?” debugging. A coarser distinction avoids the ambiguity of intermediate values.

Rejected alternative: A continuous verbosity integer — discrete levels are more meaningful to callers and prevent ambiguous intermediate values where behavior would be undefined.

Conformance Notes

  • Implementations MUST provide both a null (disabled) and a diagnostic (enabled) collector.
  • The null collector MUST return false from is_enabled and MUST NOT allocate or buffer when record_stage_event or record_item_event are called.
  • record_item_event MUST be gated by the configured detail_level. When detail_level is Stage, item events MUST be discarded without recording.
  • Implementations MUST preserve event insertion order in the accumulated event list.

Observer Callback (Optional — MAY)

Implementations may support an optional observer callback that fires synchronously on each recorded event. The spec defines this as a capability, not a mechanism — implementations choose how to expose the callback (function pointer, closure, delegate, trait object, etc.).

Rationale: Synchronous firing ensures events are observable in real time, enabling streaming diagnostic use cases. The callback mechanism is left to implementations because callback abstractions differ fundamentally across languages and concurrency models. The spec prescribes the observable behavior (synchronous, per-event) without constraining the API surface.

Events

Overview

Two structural event types carry diagnostic information out of the pipeline: TraceEvent (produced at every instrumentation point) and OverflowEvent (produced only when budget overflow occurs under the Proceed strategy). Each pipeline stage fires exactly one stage-level TraceEvent after it completes.

TraceEvent

TraceEvent is the universal record produced at each pipeline instrumentation point. Stage-level events capture timing and throughput for a complete stage; item-level events capture decisions for individual items.

Fields

FieldTypeRequiredDefaultDescription
stagePipelineStageYesThe pipeline stage that produced this event.
duration_msfloat64YesWall-clock duration of the stage in milliseconds. For item-level events, this is 0.0.
item_countintegerYesNumber of items processed by this stage. For item-level events, this is 1. (For item-level events, the value is 1, which is a sentinel indicating a single item was processed — it carries no aggregate meaning.)
messagestringNo(absent)Optional diagnostic message. Absent when not provided.

Rationale for duration_ms as float64: Milliseconds provide human-readable precision for diagnostic output while remaining portable across all languages. Float64 accommodates sub-millisecond precision without integer overflow concerns and maps cleanly to IEEE 754 double in all target languages.

Rejected alternative — integer nanoseconds: Less readable in diagnostic output; requires unit conversion in most display contexts.

Rejected alternative — ISO 8601 duration strings: Harder to aggregate and compare programmatically; parsing overhead at query time.

Stage-Level Event Example

{
  "stage": "Score",
  "duration_ms": 12.4,
  "item_count": 47
}

The message field is absent because it was not provided. Absent fields are omitted from the JSON representation.

Item-Level Event Example

{
  "stage": "Slice",
  "duration_ms": 0.0,
  "item_count": 1,
  "message": "BudgetExceeded: 2048 tokens requested, 512 available"
}

Item-level events set duration_ms to 0.0 because item processing time is not independently measurable from the stage as a whole.

PipelineStage

PipelineStage enumerates the observable stages of the pipeline. Each value corresponds to a stage that emits diagnostic events.

ValueDescription
"Classify"Classification and partitioning stage
"Score"Scoring stage
"Deduplicate"Deduplication stage
"Slice"Budget-fitting selection stage
"Place"Final ordering and merge stage

Sort is omitted. Sort is an internal ordering step with no user-visible diagnostic boundary and no meaningful duration to report independently. Including it would add a stage that built-in pipelines emit no meaningful diagnostics for, increasing conformance burden without diagnostic value.

Implementations may emit events for internal stages not listed here. Callers should handle unknown PipelineStage values gracefully when processing diagnostic data (see Conformance Notes below).

OverflowEvent

OverflowEvent is produced when the merged result (pinned + sliced items) exceeds target_tokens and OverflowStrategy is Proceed. See OverflowStrategy.

OverflowEvent is a data type only. The spec defines its structure but not its delivery mechanism — implementations choose how to surface the event (callback, return value, collector method, etc.).

Rationale: Delivery mechanisms differ across languages and concurrency models. The spec prescribes what information is captured, not how it reaches the caller.

Fields

FieldTypeRequiredDefaultDescription
tokens_over_budgetintegerYesNumber of tokens exceeding the target budget.
overflowing_itemslist of ContextItemYesAll items present at the time of overflow (pinned + sliced combined).
budgetContextBudgetYesThe budget that was exceeded.

Example

{
  "tokens_over_budget": 384,
  "overflowing_items": [
    { "content": "...", "tokens": 512, "kind": "Message" },
    { "content": "...", "tokens": 1024, "kind": "ToolOutput" }
  ],
  "budget": {
    "maxTokens": 4096,
    "targetTokens": 3072
  }
}

Conformance Notes

  • Stage-level TraceEvent records MUST have duration_ms set to the wall-clock duration of that stage. Item-level events MUST have duration_ms set to 0.0.
  • Each named pipeline stage (Classify, Score, Deduplicate, Slice, Place) MUST produce exactly one stage-level TraceEvent per pipeline run, emitted after the stage completes. This applies even when a stage processes zero items — the event is emitted with item_count of 0.
  • item-level events for a stage MUST precede the corresponding stage-level event. Item-level events are recorded as each item is processed; the stage-level event is emitted only after all items in the stage have been processed. Conformance vectors that test the events list must be authored to reflect this ordering.
  • OverflowEvent MUST only be produced when OverflowStrategy is Proceed. It MUST NOT be produced for Throw, Truncate, or any other strategy.
  • Implementations MUST handle unknown PipelineStage values gracefully when deserializing diagnostic data from other implementations.

Exclusion Reasons

Overview

Every item considered by the pipeline receives either an inclusion reason or an exclusion reason. These reasons form the explainability core of the diagnostics system — they answer “why was this item included or excluded?” Reasons are attached to items in the SelectionReport, where each IncludedItem carries an InclusionReason and each ExcludedItem carries an ExclusionReason.

ExclusionReason

ExclusionReason describes why the pipeline did not select an item for the context window. Each variant is data-carrying: it includes associated fields that provide context for the exclusion decision. This is a deliberate design choice — data-carrying variants enable callers to programmatically inspect exclusion details without parsing message strings.

Rejected alternative: fieldless enum variants — simpler wire format, but forces callers to reconstruct context from other diagnostic data, which is fragile and error-prone.

VariantDescriptionFieldsEmitted byStatus
BudgetExceededItem did not fit within the token budgetitem_tokens: integer, available_tokens: integerSlice stage, Place stage (Truncate)Active
ScoredTooLowItem scored below the selection thresholdscore: float64, threshold: float64Reserved
DeduplicatedByte-exact content duplicate removeddeduplicated_against: string (content identifier)Deduplicate stageActive
QuotaCapExceededKind exceeded its quota capkind: string, cap: integer, actual: integerReserved
QuotaRequireDisplacedDisplaced to satisfy another kind’s quota requirementdisplaced_by_kind: stringReserved
NegativeTokensItem has a negative token counttokens: integerClassify stageActive
PinnedOverrideDisplaced by a pinned item during truncation overflow handlingdisplaced_by: string (content identifier)Place stage (Truncate)Active
FilteredExcluded by a user-defined filter predicatefilter_name: stringReserved

Reserved variants are defined but not emitted by any built-in pipeline stage. They are reserved for future specification versions and custom stage implementations. Implementations must include these variants in their type definitions to ensure forward compatibility.

Rationale: defining reserved variants now ensures that implementations allocate space in their type system for future extensions, avoiding breaking changes when these variants become active.

JSON example — BudgetExceeded:

{
  "reason": "BudgetExceeded",
  "item_tokens": 2048,
  "available_tokens": 512
}

JSON example — Deduplicated:

{
  "reason": "Deduplicated",
  "deduplicated_against": "tool_output_abc123"
}

JSON example — NegativeTokens:

{
  "reason": "NegativeTokens",
  "tokens": -1
}

Reserved variant examples:

JSON example — ScoredTooLow:

{
  "reason": "ScoredTooLow",
  "score": 0.12,
  "threshold": 0.25
}

JSON example — QuotaCapExceeded:

{
  "reason": "QuotaCapExceeded",
  "kind": "ToolOutput",
  "cap": 3,
  "actual": 4
}

JSON example — QuotaRequireDisplaced:

{
  "reason": "QuotaRequireDisplaced",
  "displaced_by_kind": "SystemPrompt"
}

JSON example — Filtered:

{
  "reason": "Filtered",
  "filter_name": "max_age_filter"
}

Wire format: the reason field is a string discriminator. Variant-specific fields appear as sibling fields alongside reason. Fields that belong to other variants are omitted — absent fields are never represented as nulls. Both ExclusionReason and InclusionReason use this same envelope format: { "reason": "<VariantName>", ...fields }. For InclusionReason, which carries no fields, the envelope contains only the reason discriminator.

InclusionReason

InclusionReason describes why the pipeline selected an item for the context window. Inclusion reasons are not data-carrying — they carry no additional fields beyond the variant name. The score itself (carried on IncludedItem) provides the quantitative detail.

Rationale: inclusion reasons are simple status indicators. Data-carrying variants would add complexity without diagnostic value.

VariantDescriptionEmitted by
ScoredIncluded based on computed score within budgetPlace stage
PinnedBypassed scoring and slicing due to pinned statusPlace stage
ZeroTokenIncluded at no budget cost (zero-token item)Place stage

JSON example:

{
  "reason": "Scored"
}

Conformance Notes

  • Implementations MUST define all 8 ExclusionReason variants, including reserved variants, to ensure forward compatibility.
  • Implementations MUST handle unknown ExclusionReason variants gracefully when deserializing diagnostic data from other implementations or future specification versions.
  • The reason field in the JSON wire format MUST be the string name of the variant (e.g., "BudgetExceeded", not an integer code).
  • Data-carrying fields MUST be present when the variant is emitted. For example, BudgetExceeded MUST include both item_tokens and available_tokens.
  • Reserved variants MUST NOT be emitted by built-in pipeline stages. Custom stage implementations may emit reserved variants.

SelectionReport

Overview

The SelectionReport is the complete diagnostic output from a pipeline run. It contains the event log, lists of included and excluded items with their reasons, and aggregate statistics. It is the primary artifact callers use to understand pipeline decisions — answering questions such as “why was this item excluded?” and “which items competed for the same budget?”

SelectionReport Fields

FieldTypeRequiredDefaultDescription
eventslist of TraceEventYesAll recorded events in insertion (stage) order. See Events.
includedlist of IncludedItemYesItems selected for the context window, in final placed order.
excludedlist of ExcludedItemYesItems not selected, sorted by score descending (stable by insertion order on ties).
total_candidatesintegerYesTotal number of items considered by the pipeline. Equals len(included) + len(excluded).
total_tokens_consideredintegerYesSum of tokens across all items in both included and excluded lists.

Items excluded at any pipeline stage — including pre-scoring exclusions such as NegativeTokens at the Classify stage and Deduplicated at the Deduplicate stage — appear in the excluded list. The excluded list is the complete set of items not selected, regardless of which stage excluded them. Items excluded before scoring carry a score of 0.0.

How to Obtain

OBTAIN-REPORT:
    collector <- CREATE DiagnosticTraceCollector(detail_level)
    result    <- pipeline.RUN(candidates, budget, collector)
    report    <- collector.BUILD-REPORT()
    // result contains the selected items for use in the LLM call
    // report contains the diagnostic data for inspection

The report is extracted from the collector after the pipeline run completes. This keeps the pipeline’s primary return type unchanged — the report is a side-channel, not part of the main output.

Rationale: post-call extraction separates diagnostic concerns from the pipeline’s functional return value. Callers who do not need diagnostics never interact with report types.

Rejected alternative: returning the report as part of the pipeline result — couples diagnostics to every call site, even those that do not need it.

Complete JSON example:

{
  "events": [
    { "stage": "Classify", "duration_ms": 0.3, "item_count": 5 },
    { "stage": "Score", "duration_ms": 8.2, "item_count": 3 },
    { "stage": "Deduplicate", "duration_ms": 0.1, "item_count": 3 },
    { "stage": "Slice", "duration_ms": 1.4, "item_count": 3 },
    { "stage": "Place", "duration_ms": 0.2, "item_count": 2 }
  ],
  "included": [
    {
      "item": { "content": "Recent conversation turn", "tokens": 256, "kind": "Message" },
      "score": 0.92,
      "reason": { "reason": "Scored" }
    },
    {
      "item": { "content": "System prompt", "tokens": 128, "kind": "SystemPrompt" },
      "score": 0.0,
      "reason": { "reason": "Pinned" }
    }
  ],
  "excluded": [
    {
      "item": { "content": "Old tool output", "tokens": 2048, "kind": "ToolOutput" },
      "score": 0.45,
      "reason": { "reason": "BudgetExceeded", "item_tokens": 2048, "available_tokens": 512 }
    },
    {
      "item": { "content": "Duplicate message", "tokens": 256, "kind": "Message" },
      "score": 0.30,
      "reason": { "reason": "Deduplicated", "deduplicated_against": "Recent conversation turn" }
    }
  ],
  "total_candidates": 5,
  "total_tokens_considered": 4736
}

IncludedItem

An IncludedItem pairs a context item with the score and reason that led to its inclusion.

FieldTypeRequiredDefaultDescription
itemContextItemYesThe included context item. See ContextItem.
scorefloat64YesThe computed relevance score at time of inclusion. 0.0 for pinned and zero-token items.
reasonInclusionReasonYesWhy this item was included. See Exclusion Reasons.

JSON example:

{
  "item": { "content": "User's latest message", "tokens": 64, "kind": "Message" },
  "score": 0.95,
  "reason": { "reason": "Scored" }
}

ExcludedItem

An ExcludedItem pairs a context item with the score and reason that led to its exclusion.

FieldTypeRequiredDefaultDescription
itemContextItemYesThe excluded context item. See ContextItem.
scorefloat64YesThe computed relevance score at time of exclusion.
reasonExclusionReasonYesWhy this item was excluded. See Exclusion Reasons.

JSON example — BudgetExceeded:

{
  "item": { "content": "Large tool output", "tokens": 4096, "kind": "ToolOutput" },
  "score": 0.60,
  "reason": { "reason": "BudgetExceeded", "item_tokens": 4096, "available_tokens": 1024 }
}

JSON example — Deduplicated:

{
  "item": { "content": "Duplicate content", "tokens": 256, "kind": "Message" },
  "score": 0.40,
  "reason": { "reason": "Deduplicated", "deduplicated_against": "Original message" }
}

deduplicated_against appears only when reason is Deduplicated. For all other reasons, the variant’s own fields appear instead. Absent fields are omitted — no nulls.

Rationale: deduplicated_against is modelled as a field on the Deduplicated reason variant (not as a nullable top-level field on ExcludedItem) because the data-carrying variant design mandates that each variant carries its own context fields. Deduplicated carries deduplicated_against; BudgetExceeded carries item_tokens and available_tokens; and so on. This keeps ExcludedItem clean (always 3 fields: item, score, reason) and avoids nullable fields that apply to only one reason.

See Exclusion Reasons for the full rationale on data-carrying variants.

Conformance Notes

  • The excluded list MUST be sorted by score descending. When two items have equal scores, the item excluded earlier in the pipeline run appears first (stable sort by insertion order on ties). This ordering surfaces the highest-value rejected items first, which is the most useful presentation for debugging “why wasn’t this included?” questions.

Rejected alternative: insertion order — less useful for diagnosis; the highest-scored excluded item is rarely the first item processed.

  • The included list MUST be in final placed order (the order determined by the Placer), not score order or insertion order.
  • total_candidates MUST equal len(included) + len(excluded).
  • total_tokens_considered MUST equal the sum of tokens across all items in both included and excluded lists.
  • The score field on IncludedItem and ExcludedItem MUST reflect the score at the time the inclusion/exclusion decision was made, not a recalculated value. Items excluded before the Score stage (e.g., NegativeTokens at Classify) MUST have a score of 0.0.

OpenTelemetry Integration

The Wollax.Cupel.Diagnostics.OpenTelemetry companion package bridges the core pipeline’s ITraceCollector/TraceCollector abstraction to .NET ActivitySource, emitting structured telemetry that traces context-window construction.

Overview

Wollax.Cupel.Diagnostics.OpenTelemetry is a separate NuGet assembly. The core Wollax.Cupel assembly has zero dependency on OpenTelemetry (R032/D039). Callers who do not need telemetry incur no transitive dependency on the OpenTelemetry SDK.

The companion registers a single ActivitySource with the canonical name "Wollax.Cupel". To receive traces, callers add this name in their OpenTelemetry configuration:

services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("Wollax.Cupel")
        // ... exporters ...
    );

Pre-Stability Disclaimer

Warning: All cupel.* attribute names documented in this chapter are pre-stable. The OpenTelemetry LLM semantic conventions are under active development, and cupel.* names may be renamed, merged, or removed as the conventions stabilize.

Do not build alert rules, dashboards, or automation that hard-code specific cupel.* attribute names. Use the attribute names only for ad-hoc debugging and observability during development and staging.

Activity Hierarchy

A single pipeline run produces a tree of Activities:

cupel.pipeline                            ← root; spans the full pipeline run
├── cupel.stage.classify
├── cupel.stage.score
├── cupel.stage.deduplicate
├── cupel.stage.slice
└── cupel.stage.place

There are exactly five stage Activities, one per diagnostic stage. Stage names in cupel.stage.{name} attribute values use lowercase only, matching the PipelineStage enum value lowercasing.

Sort is omitted. Sort is an internal ordering step with no diagnostic boundary and no meaningful duration to report independently. It does not receive its own Activity, consistent with the PipelineStage enum omission documented in Events.

Verbosity Tiers

The CupelVerbosity enum controls how much diagnostic data is emitted per pipeline run. Choose the tier appropriate for the target environment.

StageOnly — Production-safe

Emits the root Activity and five stage Activities with item-count boundaries only. No per-item events are recorded.

Root Activity (cupel.pipeline) attributes:

AttributeTypeDescription
cupel.budget.max_tokensintThe pipeline’s maximum token budget
cupel.verbositystringThe verbosity tier name ("StageOnly")

Each stage Activity (cupel.stage.{name}) attributes:

AttributeTypeDescription
cupel.stage.namestringStage name in lowercase (e.g., "classify", "score")
cupel.stage.item_count_inintNumber of items entering this stage
cupel.stage.item_count_outintNumber of items leaving this stage

Note: No duration_ms attribute is present. Stage duration is provided by the Activity’s own start and end timestamps; recording it as an attribute would duplicate information already available in the trace span.

StageAndExclusions — Staging environments

Emits all StageOnly attributes plus per-exclusion Events on each stage Activity where an exclusion occurred.

Additions over StageOnly:

On each stage Activity, for every item excluded during that stage, one cupel.exclusion Event is recorded:

Event AttributeTypeDescription
cupel.exclusion.reasonstringExclusionReason variant name (e.g., "BudgetExceeded")
cupel.exclusion.item_kindstringThe kind of the excluded item
cupel.exclusion.item_tokensintToken count of the excluded item

Additionally, a summary attribute is added to the stage Activity:

AttributeTypeDescription
cupel.exclusion.countintTotal number of exclusions in this stage

Note: cupel.exclusion.reason values are open-ended. New ExclusionReason variants may appear in future spec versions without a schema change. Trace backends and dashboards MUST NOT hard-code the set of expected reason values.

Full — Development only

Emits all StageAndExclusions attributes plus per-included-item Events recorded after the Place stage completes.

Additions over StageAndExclusions:

After the Place stage, for every item included in the final output, one cupel.item.included Event is recorded:

Event AttributeTypeDescription
cupel.item.kindstringThe kind of the included item
cupel.item.tokensintToken count of the included item
cupel.item.scorefloat64The item’s final composite score

Note: No placement-position attribute is recorded. Position is a derived property of ordering and not a stable attribute of the item itself.

Cardinality

VerbosityEvents / Pipeline RunRecommended Environment
StageOnly~10 (5 stage Activities + root)Production
StageAndExclusions~10 + 0–300 (depends on exclusion count)Staging
Full~10 + 0–1000 (depends on item count)Development only

Warning: Full verbosity can produce high-cardinality traces with large item sets. Do not enable Full in production without a sampling strategy that caps trace volume.

Attribute Reference

Complete flat table of all cupel.* attributes and events for quick lookup.

NameTypeTier(s)CarrierDescription
cupel.budget.max_tokensintAllRoot ActivityPipeline maximum token budget
cupel.verbositystringAllRoot ActivityVerbosity tier name
cupel.stage.namestringAllStage ActivityStage name, lowercase
cupel.stage.item_count_inintAllStage ActivityItems entering the stage
cupel.stage.item_count_outintAllStage ActivityItems leaving the stage
cupel.exclusion.countintStageAndExclusions, FullStage ActivityTotal exclusions in the stage
cupel.exclusion.reasonstringStageAndExclusions, Fullcupel.exclusion EventExclusionReason variant name
cupel.exclusion.item_kindstringStageAndExclusions, Fullcupel.exclusion EventKind of the excluded item
cupel.exclusion.item_tokensintStageAndExclusions, Fullcupel.exclusion EventToken count of the excluded item
cupel.item.kindstringFullcupel.item.included EventKind of the included item
cupel.item.tokensintFullcupel.item.included EventToken count of the included item
cupel.item.scorefloat64Fullcupel.item.included EventFinal composite score of the included item

Conformance Notes

  • cupel.exclusion.reason mapping: Values MUST be the canonical ExclusionReason variant name string (e.g., "BudgetExceeded", "Deduplicated"). Implementations MUST NOT use numeric codes, display strings, or locale-specific representations. New ExclusionReason variants in future spec versions will appear as new attribute values — trace backends must not hard-code the set.
  • ActivitySource name: Implementations MUST register the ActivitySource with the name "Wollax.Cupel" exactly. Callers configure their OpenTelemetry pipeline using this string; any deviation breaks trace collection silently.
  • Zero-dependency core: The Wollax.Cupel assembly MUST have no compile-time or runtime dependency on any OpenTelemetry SDK assembly. The companion package provides the bridge; the core pipeline operates through ITraceCollector only.
  • Stage Activity names: Stage Activity names MUST follow the pattern cupel.stage.{name} where {name} is the lowercase stage name. The five canonical values are classify, score, deduplicate, slice, place.

Rust (cupel-otel)

The cupel-otel crate is the Rust equivalent of Wollax.Cupel.Diagnostics.OpenTelemetry. It implements the TraceCollector trait from the core cupel crate and bridges pipeline execution data to the opentelemetry API.

Source name: The Rust implementation registers with the source name "cupel" (distinct from the .NET source name "Wollax.Cupel" — D163). Callers must configure "cupel" in their OpenTelemetry SDK tracer provider:

#![allow(unused)]
fn main() {
// Register the tracer provider with the cupel source name
let tracer_provider = SdkTracerProvider::builder()
    .with_batch_exporter(exporter)
    .build();
opentelemetry::global::set_tracer_provider(tracer_provider);
}

Adding to a project

Add cupel-otel as a dependency in Cargo.toml:

[dependencies]
cupel-otel = "0.1"

Usage example

#![allow(unused)]
fn main() {
use cupel_otel::{CupelOtelTraceCollector, CupelVerbosity};

let collector = CupelOtelTraceCollector::new(CupelVerbosity::StageOnly);
let result = pipeline.run_traced(&items, &collector).await;
collector.on_pipeline_completed(&result.stage_traces);
// Flush the tracer provider before process exit
}

Implementation notes

  • Explicit .end() is mandatory. The opentelemetry 0.27 Span type does not auto-end on drop (D169). The CupelOtelTraceCollector calls .end() explicitly on each span after recording its attributes and events. Callers who implement a custom TraceCollector using the opentelemetry API must also call .end() explicitly.
  • SpanData import path. The correct import is opentelemetry_sdk::export::trace::SpanData, not opentelemetry_sdk::trace::SpanData. The latter does not exist in opentelemetry_sdk 0.27.
  • Unknown ExclusionReason variants. The cupel.exclusion.reason attribute uses a _ => "Unknown" match arm for any unrecognised ExclusionReason variant. This provides forward compatibility with future spec variants — unknown reasons are emitted as "Unknown" rather than causing a compilation failure or panic.

Budget Simulation

Budget simulation methods are extension methods on CupelPipeline in the .NET implementation. They orchestrate internal DryRun calls to answer questions about what the pipeline would select at different token budgets — for example, which items are marginal at a given budget, or what is the minimum budget required to include a specific item.

Language Parity Note: The budget simulation API (GetMarginalItems, FindMinBudgetFor) is scoped to the .NET implementation in v1.3. Rust parity is deferred — the Rust crate does not expose these methods. The deferral rationale: the Rust Pipeline does not yet expose a public DryRun equivalent, and budget simulation was not a priority for Rust consumers in this release cycle. A future milestone will assess Rust parity when the Rust pipeline gains a dry_run surface.

SweepBudget Out-of-Scope Note: SweepBudget (exhaustive budget sweep) has been assigned to the Smelt project and will not be added to Cupel.


DryRun Determinism Invariant

DryRun MUST produce identical output for identical inputs. The slicer’s tie-breaking rule MUST be deterministic and stable across calls. Implementations that depend on non-deterministic ordering (e.g., hash-map iteration order) are non-conformant.

For GreedySlice, determinism is guaranteed by the deterministic tie-break contract: equal-density items are ordered by original index ascending. This means repeated DryRun calls with the same items and budget always produce the same included set in the same order — the foundation on which GetMarginalItems and FindMinBudgetFor rely to produce meaningful diff and binary-search results.


GetMarginalItems

Purpose

Identify which items are included in a full-budget run but excluded when the budget is reduced by slackTokens. These are the items that become available as the budget grows from (budget.MaxTokens - slackTokens) up to budget.MaxTokens.

Signature

IReadOnlyList<ContextItem> GetMarginalItems(
    IReadOnlyList<ContextItem> items,
    ContextBudget budget,
    int slackTokens)

Budget Parameter

The budget parameter overrides the pipeline’s stored budget for both internal DryRun calls. This is required because DryRun uses the pipeline’s fixed budget by construction; the extension method supplies a temporary budget for its internal calls.

The reduced-budget run uses budget.MaxTokens - slackTokens as the max token count, with the same outputReserve and reservedSlots as the full budget. Formally:

reducedBudget = ContextBudget(
    maxTokens:    budget.MaxTokens - slackTokens,
    targetTokens: budget.TargetTokens - slackTokens,
    outputReserve: budget.OutputReserve)

Diff Direction

primary \ margin — items present in the full-budget result (primary) that are absent from the reduced-budget result (margin). Item identity is determined by object reference equality.

Monotonicity Assumption

GetMarginalItems assumes that a lower budget never causes a new item to appear that was not present at the higher budget (monotonic inclusion). This assumption holds for GreedySlice and KnapsackSlice but does not hold for QuotaSlice, where percentage-based allocations shift as the budget changes and can cause different kinds to appear at lower budgets.

QuotaSlice Guard

If the pipeline’s slicer is QuotaSlice, the method MUST throw InvalidOperationException with the message:

"GetMarginalItems requires monotonic item inclusion. QuotaSlice produces non-monotonic inclusion as budget changes shift percentage allocations."

Pseudocode

GET-MARGINAL-ITEMS(pipeline, items, budget, slackTokens):
    reducedBudget <- ContextBudget(maxTokens: budget.maxTokens - slackTokens,
                                   targetTokens: budget.targetTokens - slackTokens,
                                   outputReserve: budget.outputReserve)
    primary <- pipeline.DRY-RUN(items, budget)
    margin  <- pipeline.DRY-RUN(items, reducedBudget)
    return [item in primary.included where item not in margin.included]

FindMinBudgetFor

Purpose

Find the minimum token budget (within a search ceiling) at which targetItem would be included in the selection result. Returns null if targetItem is not selectable within [targetItem.Tokens, searchCeiling].

Signature

int? FindMinBudgetFor(
    IReadOnlyList<ContextItem> items,
    ContextItem targetItem,
    int searchCeiling)

Preconditions

Both conditions are checked before the search begins. If either is violated, the method MUST throw ArgumentException:

  • targetItem must be an element of items.
  • searchCeiling >= targetItem.Tokens.

The search range is [targetItem.Tokens, searchCeiling]. The lower bound is targetItem.Tokens as an optimization: the target item cannot be included in a budget smaller than its own token count.

The stop condition is high - low <= 1. At termination, high is the candidate minimum budget. The search performs approximately log₂(searchCeiling) DryRun invocations — typically 10–15 for realistic budget ceilings.

After the loop exits, the method performs one final DryRun at high to confirm inclusion. If targetItem is present in report.included, high is returned. Otherwise null is returned.

Return Value

int?null means targetItem is not selectable within [targetItem.Tokens, searchCeiling] at any tested budget.

QuotaSlice + CountQuotaSlice Guard

If the pipeline’s slicer is QuotaSlice or CountQuotaSlice, the method MUST throw InvalidOperationException with the message:

"FindMinBudgetFor requires monotonic item inclusion. QuotaSlice and CountQuotaSlice produce non-monotonic inclusion as budget changes shift allocations. Use a GreedySlice or KnapsackSlice inner slicer for budget simulation."

The general precondition is: any slicer whose item inclusion is sensitive to absolute budget value in a non-monotonic way is incompatible with FindMinBudgetFor.

Pseudocode

FIND-MIN-BUDGET-FOR(pipeline, items, targetItem, searchCeiling):
    low  <- targetItem.tokens      // inclusive lower bound
    high <- searchCeiling          // inclusive upper bound

    if targetItem not in items:
        throw ArgumentException("targetItem must be an element of items")
    if searchCeiling < targetItem.tokens:
        throw ArgumentException("searchCeiling must be >= targetItem.Tokens")

    while high - low > 1:
        mid <- low + (high - low) / 2
        midBudget <- ContextBudget(maxTokens: mid, ...)
        report <- pipeline.DRY-RUN(items, midBudget)
        if targetItem in report.included:
            high <- mid
        else:
            low <- mid

    // Verify at high (the candidate minimum)
    finalBudget <- ContextBudget(maxTokens: high, ...)
    finalReport <- pipeline.DRY-RUN(items, finalBudget)
    if targetItem in finalReport.included:
        return high
    return null

Conformance Notes

  • DryRun is the primitive; budget simulation builds entirely on top of it. Implementations MUST satisfy the DryRun Determinism Invariant before exposing budget simulation methods.
  • Custom slicers that implement non-monotonic inclusion MUST NOT be used with FindMinBudgetFor or GetMarginalItems without explicit documentation of the monotonicity property.
  • The included field of the DryRun result (see SelectionReport) is the source of truth for item identity comparisons in both methods.

Policy Sensitivity

Policy sensitivity runs multiple policy configurations over the same item set and computes a structured diff showing which items changed inclusion status across those configurations. It is the primary tool for comparing how different scorer, slicer, or placer choices affect selection — without constructing N full pipeline objects.


API

.NET

CupelPipeline.DryRunWithPolicy — run a single policy configuration without constructing a new pipeline:

ContextResult DryRunWithPolicy(
    IReadOnlyList<ContextItem> items,
    ContextBudget budget,
    CupelPolicy policy)

Defined on CupelPipeline. Internally constructs a temporary pipeline using CreateBuilder().WithPolicy(policy).WithBudget(budget) and calls DryRunWithBudget. The returned ContextResult.Report is a SelectionReport covering all included and excluded items.

PolicySensitivityExtensions.PolicySensitivity (policy overload) — run multiple policy configurations and diff them:

// Policy-accepting overload
PolicySensitivityReport PolicySensitivity(
    IReadOnlyList<ContextItem> items,
    ContextBudget budget,
    params (string Label, CupelPolicy Policy)[] variants)

Extension method on CupelPipeline in Wollax.Cupel.Diagnostics. The pipeline-based overload (accepting (string Label, CupelPipeline Pipeline)[]) remains available for callers needing full pipeline control.

Throws ArgumentException if fewer than 2 variants are supplied.

Rust

cupel::analytics::policy_sensitivity — run multiple Policy configurations:

#![allow(unused)]
fn main() {
pub fn policy_sensitivity(
    items: &[ContextItem],
    budget: &ContextBudget,
    variants: &[(impl AsRef<str>, &Policy)],
) -> Result<PolicySensitivityReport, CupelError>
}

In cupel::analytics. Returns Err(CupelError::PipelineConfig(...)) if fewer than 2 variants are provided.

cupel::analytics::policy_sensitivity_from_pipelines — pipeline-based variant:

#![allow(unused)]
fn main() {
pub fn policy_sensitivity_from_pipelines(
    items: &[ContextItem],
    budget: &ContextBudget,
    variants: &[(impl AsRef<str>, &Pipeline)],
) -> Result<PolicySensitivityReport, CupelError>
}

Use when Policy cannot express the required slicer (e.g., CountQuotaSlice — see Language Notes below).


Types

PolicySensitivityReport

#![allow(unused)]
fn main() {
// Rust
pub struct PolicySensitivityReport {
    /// Per-variant selection results, in input order.
    pub variants: Vec<(String, SelectionReport)>,
    /// Items whose inclusion status differed across at least two variants.
    pub diffs: Vec<PolicySensitivityDiffEntry>,
}
}
// .NET
public sealed class PolicySensitivityReport
{
    public IReadOnlyList<(string Label, SelectionReport Report)> Variants { get; init; }
    public IReadOnlyList<PolicySensitivityDiffEntry> Diffs { get; init; }
}

PolicySensitivityDiffEntry

#![allow(unused)]
fn main() {
// Rust
pub struct PolicySensitivityDiffEntry {
    /// The content string identifying the item.
    pub content: String,
    /// One entry per variant, in input order.
    pub statuses: Vec<(String, ItemStatus)>,
}
}
// .NET
public sealed class PolicySensitivityDiffEntry
{
    public string Content { get; init; }
    public IReadOnlyList<(string Label, ItemStatus Status)> Statuses { get; init; }
}

ItemStatus

#![allow(unused)]
fn main() {
// Rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ItemStatus {
    Included,
    Excluded,
}
}
// .NET
public enum ItemStatus { Included, Excluded }

Diff Semantics

Content-keyed matching. Items are identified by their content string. Two items from different variant runs are considered the same item if their content fields are equal (ordinal comparison). ContextItem object identity is not used.

Swing-only filter. Only items where at least two variants disagree on inclusion appear in diffs. An item included by all variants, or excluded by all variants, does not appear in diffs.

variants preserves input order. PolicySensitivityReport.variants reflects the input variants slice/array in the same order it was supplied.

diffs order is unspecified. The order of entries in PolicySensitivityReport.diffs is implementation-defined. Callers must not rely on a particular ordering of diff entries.


Minimum Variants

Implementations MUST require at least 2 variants. Fewer variants MUST return an error:

  • Rust: Err(CupelError::PipelineConfig("policy_sensitivity requires at least 2 variants"))
  • .NET: ArgumentException with message "At least two variants are required for a sensitivity comparison."

A single-variant call cannot produce a meaningful diff and is always a caller error.


Explicit Budget Parameter

Both DryRunWithPolicy and policy_sensitivity take an explicit budget parameter. There is no option to inherit a budget from the pipeline or policy.

Rationale: A CupelPolicy does not carry a budget. An implicit budget would silently apply the pipeline’s own stored budget to a different policy configuration — producing surprising results when the pipeline budget was set for a different context size. Making the budget explicit forces the caller to confirm the comparison baseline and eliminates the ambiguity.


Language Notes

CupelPolicy cannot express CountQuotaSlice. The SlicerType enum in the .NET CupelPolicy has no CountQuota variant. Callers requiring count-quota fork diagnostics MUST use the pipeline-based overload:

  • .NET: PolicySensitivity(items, budget, params (string Label, CupelPipeline Pipeline)[] variants)
  • Rust: policy_sensitivity_from_pipelines(items, budget, variants: &[(impl AsRef<str>, &Pipeline)])

This is a known CupelPolicy coverage gap, not a limitation of the underlying diff engine.


Examples

.NET — 2-variant comparison

using Wollax.Cupel;
using Wollax.Cupel.Diagnostics;

var items = new List<ContextItem>
{
    new ContextItemBuilder("item-a", 40).WithPriority(10).Build(),
    new ContextItemBuilder("item-b", 40).WithPriority(1).Build(),
};
var budget = new ContextBudget(maxTokens: 50, targetTokens: 40);

var policyA = new CupelPolicy { Intent = "code-review" };
var policyB = new CupelPolicy { Intent = "rag" };

var pipeline = CupelPipeline.CreateBuilder().WithBudget(budget).Build();
var report = pipeline.PolicySensitivity(items, budget,
    ("code-review", policyA),
    ("rag", policyB));

Console.WriteLine($"Swinging items: {report.Diffs.Count}");
foreach (var diff in report.Diffs)
{
    Console.WriteLine($"  {diff.Content}:");
    foreach (var (label, status) in diff.Statuses)
        Console.WriteLine($"    {label}: {status}");
}

Rust — 2-variant comparison

#![allow(unused)]
fn main() {
use cupel::{
    ChronologicalPlacer, ContextBudget, ContextItemBuilder, GreedySlice,
    OverflowStrategy, Policy, PolicyBuilder, PriorityScorer, ReflexiveScorer,
    policy_sensitivity,
};
use std::sync::Arc;
use std::collections::HashMap;

let items = vec![
    ContextItemBuilder::new("item-a", 40)
        .priority(10)
        .future_relevance_hint(0.1)
        .build()
        .unwrap(),
    ContextItemBuilder::new("item-b", 40)
        .priority(1)
        .future_relevance_hint(0.9)
        .build()
        .unwrap(),
];
let budget = ContextBudget::new(50, 40, 0, HashMap::new(), 0.0).unwrap();

let policy_priority = PolicyBuilder::new()
    .scorer(Arc::new(PriorityScorer))
    .slicer(Arc::new(GreedySlice))
    .placer(Arc::new(ChronologicalPlacer))
    .overflow_strategy(OverflowStrategy::Throw)
    .deduplication(true)
    .build()
    .unwrap();

let policy_reflexive = PolicyBuilder::new()
    .scorer(Arc::new(ReflexiveScorer))
    .slicer(Arc::new(GreedySlice))
    .placer(Arc::new(ChronologicalPlacer))
    .overflow_strategy(OverflowStrategy::Throw)
    .deduplication(true)
    .build()
    .unwrap();

let variants: Vec<(&str, &Policy)> = vec![
    ("priority", &policy_priority),
    ("reflexive", &policy_reflexive),
];
let report = policy_sensitivity(&items, &budget, &variants).unwrap();

println!("Swinging items: {}", report.diffs.len());
for diff in &report.diffs {
    println!("  {}:", diff.content);
    for (label, status) in &diff.statuses {
        println!("    {}: {:?}", label, status);
    }
}
}

Conformance Levels

The Cupel conformance suite defines two tiers of test vectors: Required and Optional. Together, they provide a comprehensive validation of an implementation’s correctness.

Required

An implementation MUST pass all Required test vectors to claim Cupel conformance. These vectors cover the core algorithm behavior that every conforming implementation must exhibit.

Covered Algorithms

CategoryAlgorithms
ScorersRecencyScorer, PriorityScorer, KindScorer, TagScorer, FrequencyScorer, ReflexiveScorer, CompositeScorer, ScaledScorer
SlicersGreedySlice, KnapsackSlice, QuotaSlice
PlacersChronologicalPlacer, UShapedPlacer
PipelineEnd-to-end scenarios combining scoring, slicing, and placing

Required vectors test the fundamental contracts of each algorithm:

  • Correct score computation for all 8 scorer types
  • Correct item selection for GreedySlice, KnapsackSlice, and QuotaSlice
  • Correct ordering for both placers
  • End-to-end pipeline behavior with pinned items, multiple scorer/slicer/placer combinations

Optional

An implementation MAY pass Optional test vectors for full conformance. These vectors cover edge cases and advanced features. Passing all Optional vectors demonstrates robustness in boundary conditions.

Covered Scenarios

CategoryScenarios
Scoring edge casesSingle-item recency, all-null timestamps, degenerate ScaledScorer, nested CompositeScorer
Slicing edge casesEmpty input
Pipeline edge casesEmpty input, all-pinned items, content deduplication, overflow with Truncate strategy

Claiming Conformance

An implementation may claim one of two conformance levels:

  • Cupel Conformant: All Required test vectors pass.
  • Cupel Fully Conformant: All Required and all Optional test vectors pass.

Implementations SHOULD document which conformance level they have achieved and the version of the test vector suite used for validation.

Test Vector Format

All conformance test vectors are TOML files. Each file contains a single test case with inputs, configuration, and expected outputs. This chapter documents the schema for each stage type.

Common Structure

Every test vector contains a [test] table that identifies the test:

[test]
name = "Human-readable test name"
stage = "scoring"    # one of: scoring, slicing, placing, pipeline

The stage field determines which additional tables are expected.

Scoring Vectors

Scoring vectors test individual scorer algorithms in isolation.

Schema

TableFieldTypeDescription
[test]namestringTest name
[test]stagestring"scoring"
[test]scorerstringScorer type: "recency", "priority", "kind", "tag", "frequency", "reflexive", "composite", "scaled"
[[items]]contentstringItem content (used as identifier in assertions)
[[items]]tokensintegerToken count
[[items]]timestampdatetime (optional)UTC timestamp
[[items]]priorityinteger (optional)Numeric priority value
[[items]]kindstring (optional)ContextKind value
[[items]]tagsarray of string (optional)Item tags
[[items]]futureRelevanceHintfloat (optional)Caller-provided relevance hint
[[expected]]contentstringItem content to match
[[expected]]score_approxfloatExpected score (compared with epsilon tolerance)
[tolerance]score_epsilonfloatMaximum allowed absolute difference between actual and expected score (default: 1e-9)

Scorer-Specific Configuration

Some scorers require additional configuration:

TableFieldTypeApplies To
[config]use_default_weightsbooleanKindScorer — when true, use default weight map
[[config.weights]]kindstringKindScorer — custom weight kind
[[config.weights]]weightfloatKindScorer — custom weight value
[[config.tag_weights]]tagstringTagScorer — configured tag name
[[config.tag_weights]]weightfloatTagScorer — configured tag weight
[[config.scorers]]typestringCompositeScorer — child scorer type
[[config.scorers]]weightfloatCompositeScorer — child weight
[config]inner_scorerstringScaledScorer — inner scorer type

Score Comparison

Score assertions use epsilon tolerance comparison:

abs(actual_score - expected_score) < score_epsilon

This addresses floating-point representation differences across languages and platforms. The default epsilon of 1e-9 is sufficient for all algorithms in this specification. Test vectors that require different tolerances specify a custom [tolerance] table.

Slicing Vectors

Slicing vectors test slicer algorithms with pre-scored input.

Schema

TableFieldTypeDescription
[test]namestringTest name
[test]stagestring"slicing"
[test]slicerstringSlicer type: "greedy", "knapsack", "quota"
[budget]target_tokensintegerToken budget for selection
[[scored_items]]contentstringItem content
[[scored_items]]tokensintegerToken count
[[scored_items]]scorefloatPre-computed relevance score
[[scored_items]]kindstring (optional)ContextKind (used by QuotaSlice)
[expected]selected_contentsarray of stringContent values of selected items (set comparison — order does not matter)

Slicer-Specific Configuration

TableFieldTypeApplies To
[config]bucket_sizeintegerKnapsackSlice — discretization bucket size (default: 100)
[config]inner_slicerstringQuotaSlice — inner slicer type
[[config.quotas]]kindstringQuotaSlice — ContextKind
[[config.quotas]]requirefloatQuotaSlice — minimum percentage
[[config.quotas]]capfloatQuotaSlice — maximum percentage

Set Comparison

Slicer output is compared as a set — the order of items in selected_contents does not matter. An implementation passes if the set of selected item contents exactly matches the expected set. This is because slicers select items but do not determine presentation order (that is the placer’s responsibility). This applies to all slicers including QuotaSlice — ordering is always the placer’s responsibility, not the slicer’s.

Placing Vectors

Placing vectors test placer algorithms with pre-scored input.

Schema

TableFieldTypeDescription
[test]namestringTest name
[test]stagestring"placing"
[test]placerstringPlacer type: "chronological", "u-shaped"
[[items]]contentstringItem content
[[items]]tokensintegerToken count
[[items]]scorefloatPre-computed relevance score
[[items]]timestampdatetime (optional)UTC timestamp (used by ChronologicalPlacer)
[expected]ordered_contentsarray of stringContent values in expected output order (ordered comparison)

Ordered Comparison

Placer output is compared as an ordered list — the position of each item matters. An implementation passes if the output items, in order, match the expected ordered_contents exactly.

Pipeline Vectors

Pipeline vectors test the full 6-stage pipeline end-to-end.

Schema

TableFieldTypeDescription
[test]namestringTest name
[test]stagestring"pipeline"
[budget]max_tokensintegerMaximum token capacity
[budget]target_tokensintegerTarget token budget for selection
[budget]output_reserveintegerTokens reserved for output (default: 0)
[config]slicerstringSlicer type
[config]placerstringPlacer type
[config]deduplicationbooleanWhether deduplication is enabled
[config]overflow_strategystring (optional)"throw", "truncate", or "proceed" (default: "throw")
[[config.scorers]]typestringScorer type
[[config.scorers]]weightfloatScorer weight (used for CompositeScorer weighting)
[[items]]contentstringItem content
[[items]]tokensintegerToken count
[[items]]kindstring (optional)ContextKind
[[items]]timestampdatetime (optional)UTC timestamp
[[items]]priorityinteger (optional)Numeric priority
[[items]]tagsarray of string (optional)Item tags
[[items]]futureRelevanceHintfloat (optional)Relevance hint
[[items]]pinnedboolean (optional)Whether item is pinned (default: false)
[[expected_output]]contentstringContent values in expected output order (ordered comparison)

Ordered Comparison

Pipeline output is compared as an ordered list — both the selected items and their presentation order must match. An implementation passes if the output items, in order, match the expected_output entries exactly.

Diagnostics Vectors

Diagnostics vectors extend pipeline vectors with an [expected.diagnostics] sub-table. They assert on which items were included or excluded, why, and aggregate counts. Diagnostics vectors are pipeline-level only (stage = "pipeline"). The [expected.diagnostics] table composes with [[expected_output]] — a single vector file can assert on both output order and diagnostic details simultaneously.

Schema

TableFieldTypeRequiredDescription
[[expected.diagnostics.included]]contentstringyesItem content (matches placed order)
[[expected.diagnostics.included]]score_approxfloatyesExpected score (epsilon tolerance)
[[expected.diagnostics.included]]inclusion_reasonstringyesReason: "Scored", "Pinned", "ZeroToken"
[[expected.diagnostics.excluded]]contentstringyesItem content (sorted by score desc)
[[expected.diagnostics.excluded]]score_approxfloatyesExpected score (epsilon tolerance)
[[expected.diagnostics.excluded]]exclusion_reasonstringyesReason discriminator: "BudgetExceeded", "Deduplicated", "QuotaCapExceeded"
[[expected.diagnostics.excluded]]item_tokensintegerconditionalToken count of excluded item (required for BudgetExceeded)
[[expected.diagnostics.excluded]]available_tokensintegerconditionalRemaining budget at exclusion (required for BudgetExceeded)
[[expected.diagnostics.excluded]]deduplicated_againststringconditionalContent of duplicate kept (required for Deduplicated)
[expected.diagnostics.summary]total_candidatesintegernoTotal items considered
[expected.diagnostics.summary]total_tokens_consideredintegernoSum of all candidate token counts

Ordering

included entries appear in placed order, matching the order of [[expected_output]] entries. excluded entries appear sorted by score descending.

Optionality

All three sub-tables (included, excluded, summary) are independently optional. A vector can assert on any combination: included items only, excluded items only, summary counts only, or any mix.

Compatibility

[expected.diagnostics] is a dotted-key sub-table under [expected] and does not conflict with [[expected_output]] (different key names). This is valid TOML 1.0. A single pipeline vector file may contain both [[expected_output]] (ordered output assertion) and [expected.diagnostics] (diagnostic assertion).

Example

[test]
name = "Pipeline diagnostics: BudgetExceeded exclusion reason"
stage = "pipeline"

# Expected values below are final. A live integration test against run_traced
# will be added in Phase 29.
#
# Budget: target=200, max=1000, reserve=0.
#
# Items:
#   "fits":    tokens=150, kind=Message, timestamp=Jun
#   "too-big": tokens=400, kind=Message, timestamp=Jan
#
# Score (RecencyScorer, 2 timestamped, denominator=1):
#   "too-big" (Jan): rank 0 → 0.0
#   "fits"    (Jun): rank 1 → 1.0
#
# Slice (Greedy, target=200):
#   Density sort: fits(1.0/150≈0.00667), too-big(0.0/400=0.0)
#   fits: 150 ≤ 200 → selected (remaining=50)
#   too-big: 400 > 50 → excluded (BudgetExceeded)
#
# Expected diagnostics:
#   included: fits (score=1.0, reason=Scored)
#   excluded: too-big (score=0.0, reason=BudgetExceeded, item_tokens=400, available=50)
#   summary: total_candidates=2, total_tokens_considered=550

[budget]
max_tokens = 1000
target_tokens = 200
output_reserve = 0

[config]
slicer = "greedy"
placer = "chronological"
deduplication = false

[[config.scorers]]
type = "recency"
weight = 1.0

[[items]]
content = "fits"
tokens = 150
kind = "Message"
timestamp = 2024-06-01T00:00:00Z

[[items]]
content = "too-big"
tokens = 400
kind = "Message"
timestamp = 2024-01-01T00:00:00Z

[[expected_output]]
content = "fits"

[expected.diagnostics.summary]
total_candidates = 2
total_tokens_considered = 550

[[expected.diagnostics.included]]
content = "fits"
score_approx = 1.0
inclusion_reason = "Scored"

[[expected.diagnostics.excluded]]
content = "too-big"
score_approx = 0.0
exclusion_reason = "BudgetExceeded"
item_tokens = 400
available_tokens = 50

Field Types

TypeTOML RepresentationNotes
string"text"UTF-8 string
integer42Signed 64-bit integer
float0.5IEEE 754 double-precision
booleantrue / false
datetime2024-06-15T12:00:00ZRFC 3339, always UTC
array of string["a", "b"]

Extensibility

Future versions of the conformance suite may add new fields to existing tables. Implementations SHOULD ignore unknown fields in test vector files rather than raising errors. This enables forward compatibility as the specification evolves.

Running the Suite

This chapter describes how to parse and execute conformance test vectors in any programming language.

General Approach

  1. Discover test vector files by scanning the conformance/required/ and conformance/optional/ directories.
  2. Parse each .toml file into a structured representation.
  3. Construct the appropriate inputs (items, budget, scorer/slicer/placer configuration) from the parsed data.
  4. Execute the algorithm under test.
  5. Compare the actual output against the expected output using the appropriate comparison mode (set or ordered).

Pseudocode: Conformance Runner

RUN-CONFORMANCE-SUITE(vectorDir):
    passed <- 0
    failed <- 0
    errors <- empty list

    files <- GLOB(vectorDir, "**/*.toml")

    for each file in files:
        vector <- PARSE-TOML(file)
        testName <- vector.test.name
        stage <- vector.test.stage

        try:
            if stage = "scoring":
                RUN-SCORING-TEST(vector)
            else if stage = "slicing":
                RUN-SLICING-TEST(vector)
            else if stage = "placing":
                RUN-PLACING-TEST(vector)
            else if stage = "pipeline":
                RUN-PIPELINE-TEST(vector)
            else:
                APPEND(errors, "Unknown stage: " + stage)
                continue

            passed <- passed + 1
        catch error:
            failed <- failed + 1
            APPEND(errors, testName + ": " + error.message)

    REPORT(passed, failed, errors)

Running a Scoring Test

RUN-SCORING-TEST(vector):
    items <- BUILD-ITEMS(vector.items)
    scorer <- BUILD-SCORER(vector.test.scorer, vector.config)
    epsilon <- vector.tolerance.score_epsilon or 1e-9

    for i <- 0 to length(vector.expected) - 1:
        expectedContent <- vector.expected[i].content
        expectedScore <- vector.expected[i].score_approx

        // Find the corresponding input item by content
        item <- FIND-BY-CONTENT(items, expectedContent)
        actualScore <- scorer.Score(item, items)

        if abs(actualScore - expectedScore) >= epsilon:
            ERROR("Score mismatch for '" + expectedContent + "': " +
                  "expected " + expectedScore + ", got " + actualScore +
                  " (epsilon=" + epsilon + ")")

Running a Slicing Test

RUN-SLICING-TEST(vector):
    scoredItems <- BUILD-SCORED-ITEMS(vector.scored_items)
    budget <- BUILD-BUDGET(vector.budget)
    slicer <- BUILD-SLICER(vector.test.slicer, vector.config)

    selected <- slicer.Slice(scoredItems, budget)
    actualContents <- SET(item.content for item in selected)
    expectedContents <- SET(vector.expected.selected_contents)

    if actualContents != expectedContents:
        ERROR("Selection mismatch: expected " + expectedContents +
              ", got " + actualContents)

Running a Placing Test

RUN-PLACING-TEST(vector):
    items <- BUILD-SCORED-ITEMS-FOR-PLACER(vector.items)
    placer <- BUILD-PLACER(vector.test.placer)

    placed <- placer.Place(items)
    actualContents <- [item.content for item in placed]
    expectedContents <- vector.expected.ordered_contents

    if actualContents != expectedContents:
        ERROR("Placement order mismatch: expected " + expectedContents +
              ", got " + actualContents)

Running a Pipeline Test

RUN-PIPELINE-TEST(vector):
    items <- BUILD-ITEMS(vector.items)
    budget <- BUILD-BUDGET(vector.budget)
    config <- BUILD-PIPELINE-CONFIG(vector.config)

    output <- PIPELINE-RUN(items, budget, config)
    actualContents <- [item.content for item in output]
    expectedContents <- [e.content for e in vector.expected_output]

    if actualContents != expectedContents:
        ERROR("Pipeline output mismatch: expected " + expectedContents +
              ", got " + actualContents)

Score Comparison

Score comparisons MUST use epsilon tolerance, never exact floating-point equality:

SCORES-MATCH(actual, expected, epsilon):
    return abs(actual - expected) < epsilon

The default epsilon is 1e-9. This tolerance accounts for:

  • IEEE 754 representation differences across platforms
  • Intermediate rounding differences in division operations
  • Language-specific floating-point arithmetic behavior

Conformance test vectors are designed so that correct implementations produce scores well within this tolerance. If an implementation fails a score comparison by a small margin (e.g., 1e-10), it is likely correct. If the difference is large (e.g., > 0.01), the algorithm implementation is likely incorrect.

Set vs. Ordered Comparison

StageComparison ModeRationale
ScoringPer-item epsilonScores are floating-point values
SlicingSet (unordered)Slicers select items but do not determine order
PlacingOrdered listPlacers determine the exact presentation order
PipelineOrdered listThe pipeline produces a fully ordered output

TOML Libraries

The following TOML parsing libraries are recommended for common implementation languages:

LanguageLibraryNotes
C#TomlynNuGet package
RusttomlCargo crate (serde-based)
Gogithub.com/BurntSushi/tomlStandard Go TOML library
Pythontomllib (stdlib)Built-in since Python 3.11
TypeScript@iarna/toml or smol-tomlnpm packages
Javacom.moandjiezana.toml:toml4jMaven artifact
SwiftTOMLDecoderSwift Package Manager
Kotlincc.ekblad.toml4kGradle/Maven

Tips for Implementors

  1. Start with scoring vectors. They test individual algorithms in isolation and are the easiest to debug.
  2. Verify your TOML parser handles datetimes. Some TOML libraries parse RFC 3339 timestamps into local time; ensure you compare temporal instants (UTC).
  3. Use reference identity for self-exclusion. FrequencyScorer and ScaledScorer use reference identity (is) to identify the current item in allItems. Ensure your implementation does not use structural equality for this check.
  4. Slicer output order does not matter. Only the set of selected items is tested. Do not fail a slicer test because items are in a different order than the expected list.
  5. Pipeline vectors are the final gate. If all scoring, slicing, and placing vectors pass but a pipeline vector fails, the issue is likely in stage integration (classify, deduplicate, sort, merge, overflow handling).

Cupel.Testing Vocabulary

Overview

The Cupel.Testing vocabulary is a language-agnostic set of named assertion patterns over SelectionReport. Each pattern has a precise semantic specification: what it asserts, which fields of SelectionReport it reads, the exact comparison operator used, and the exact structure of the error message emitted on failure. The vocabulary is a contract document — it defines the observable surface that conforming implementations must provide, without prescribing internal structure.

What Cupel.Testing is

  • A named vocabulary of 13 assertion patterns, each fully specified with no undefined qualifiers
  • A chain-based API: SelectionReport.Should() returns a SelectionReportAssertionChain; each assertion method returns this for fluent chaining; failures throw SelectionReportAssertionException with a structured message
  • The prerequisite for the Wollax.Cupel.Testing NuGet package implemented in M003 (requirement R021)
  • Language-agnostic: pattern semantics are defined at the field-access level; language bindings follow the field-access semantics of SelectionReport, IncludedItem, and ExcludedItem as specified in SelectionReport

What Cupel.Testing is not

  • Not a FluentAssertions dependency (D041): the chain plumbing is in-house (~100 lines); no dependency on FluentAssertions, ApprovalTests, or any third-party assertion framework
  • Not snapshot-based (D041): snapshot assertions are explicitly deferred. SelectionReport.excluded is sorted score-descending with an insertion-order tiebreak that is not observable from the report itself. Snapshot outputs are therefore non-deterministic across test runs unless the caller controls the complete input set and all scores are distinct. This precondition is not currently guaranteed by the spec.
  • Not an implementation: this chapter specifies the vocabulary; the implementation lives in M003
  • Not a runtime analytics module: BudgetUtilization(budget) and KindDiversity() are extension methods on SelectionReport that belong in Wollax.Cupel core (D045), not in Wollax.Cupel.Testing. The testing vocabulary may call these methods internally but does not define them.

Pre-decisions

The following four design choices are locked and apply uniformly across all 13 patterns. They are stated here once to prevent inconsistency in per-pattern specs.

PD-1: Predicate type — IncludedItem / ExcludedItem, not raw ContextItem

All predicate-bearing methods accept IncludedItem (for Inclusion-group methods) or ExcludedItem (for Exclusion-group methods) as the predicate parameter type. The predicate receives the full pair — item, score, and reason — not just the raw ContextItem.

Rationale: predicates over IncludedItem/ExcludedItem are strictly more powerful than predicates over ContextItem; callers can filter on score or reason as well as item fields. Implementations MAY provide convenience overloads that accept a ContextItem-based predicate for callers who only need to inspect content or kind, but such overloads are not part of this vocabulary spec.

PD-2: BudgetUtilization denominator — budget.MaxTokens

BudgetUtilization is defined as sum(included[i].item.tokens) / budget.MaxTokens.

The denominator is budget.MaxTokens (the hard token ceiling), not budget.TargetTokens.

Rationale: MaxTokens is the single authoritative ceiling callers control at construction time and the ceiling that matters for context-window safety. TargetTokens is a Slice-stage-internal soft target — it is not a public capacity metric exposed by ContextBudget and is not appropriate as a utilization denominator (D045, DI-3). Using TargetTokens as denominator would produce utilization values > 1.0 for reports where the pipeline filled the full max-token budget, which is misleading.

PD-3: Floating-point threshold comparisons — exact operator, no epsilon

Floating-point threshold comparisons use the exact operator: >= for lower-bound thresholds (e.g. HaveBudgetUtilizationAbove), <= for upper-bound. No default epsilon is baked into any method (D064).

Rationale: a baked-in epsilon hides real threshold violations and couples the spec to a particular floating-point implementation. Test authoring is the caller’s responsibility: callers must choose threshold values that avoid floating-point boundary cases (e.g. 0.799 instead of 0.8 when the computed ratio may drift by one ULP). The comparison operator is stated explicitly in each pattern spec.

PD-4: Chain plumbing — SelectionReport.Should(), SelectionReportAssertionChain, SelectionReportAssertionException

  • Entry point: SelectionReport.Should() — a method (or extension method) on SelectionReport that returns a SelectionReportAssertionChain wrapping the report
  • Chaining: each assertion method on SelectionReportAssertionChain returns this so calls can be chained fluently
  • Failure: on failure, each assertion method throws SelectionReportAssertionException (not InvalidOperationException, not AssertionException) with a structured message containing: assertion name, what was expected, and what was actually found
  • No side effects on success: assertion methods that pass do not modify any state and return this unchanged

Chain Plumbing

SelectionReportAssertionChain is the type returned by SelectionReport.Should(). It wraps a SelectionReport and exposes the 13 assertion methods listed in the Vocabulary section below.

Type shape

SelectionReportAssertionChain:
    report: SelectionReport          // the wrapped report; read-only after construction

    // Entry point
    SelectionReport.Should() → SelectionReportAssertionChain

    // Each assertion method returns `this` for chaining
    // On failure: throws SelectionReportAssertionException

    // Inclusion group (3 methods)
    IncludeItemWithKind(kind: ContextKind) → SelectionReportAssertionChain
    IncludeItemMatching(predicate: Func<IncludedItem, bool>) → SelectionReportAssertionChain
    IncludeExactlyNItemsWithKind(kind: ContextKind, n: int) → SelectionReportAssertionChain

    // Exclusion group (4 methods)
    ExcludeItemWithReason(reason: ExclusionReason) → SelectionReportAssertionChain
    ExcludeItemMatchingWithReason(predicate: Func<ContextItem, bool>, reason: ExclusionReason) → SelectionReportAssertionChain
    ExcludeItemWithBudgetDetails(predicate: Func<ContextItem, bool>, expectedItemTokens: int, expectedAvailableTokens: int) → SelectionReportAssertionChain
    HaveNoExclusionsForKind(kind: ContextKind) → SelectionReportAssertionChain

    // Aggregate group (2 methods)
    HaveAtLeastNExclusions(n: int) → SelectionReportAssertionChain
    ExcludedItemsAreSortedByScoreDescending() → SelectionReportAssertionChain

    // Budget group (1 method)
    HaveBudgetUtilizationAbove(threshold: double, budget: ContextBudget) → SelectionReportAssertionChain

    // Coverage group (1 method)
    HaveKindCoverageCount(n: int) → SelectionReportAssertionChain

    // Ordering group (2 methods)
    PlaceItemAtEdge(predicate: Func<IncludedItem, bool>) → SelectionReportAssertionChain
    PlaceTopNScoredAtEdges(n: int) → SelectionReportAssertionChain

Error message contract

SelectionReportAssertionException carries a structured message. Each pattern’s spec (written in T02 and T03) defines the exact message template. All messages must include:

  1. Assertion name — the method name as a label (e.g. "IncludeItemWithKind")
  2. What was expected — the caller’s assertion parameter(s) stated explicitly
  3. What was actually found — the actual field values from the report that caused the failure, with enough context (item counts, score values, kind lists) for the test author to diagnose without re-running under a debugger

No message should reference internal type names or implementation details (D032).

Implementation cost

The chain plumbing is approximately 100 lines of code. The vocabulary spec does not prescribe the internal structure — only the entry point, method signatures, return type, and failure mechanism.


Vocabulary

The Cupel.Testing vocabulary defines 13 named assertion patterns organized into five groups. Each pattern is fully specified in the sections below (T02 covers patterns 1–7; T03 covers patterns 8–13).

Pattern summary table

#GroupMethodOne-line description
1InclusionIncludeItemWithKind(ContextKind)At least one included item has the given Kind
2InclusionIncludeItemMatching(Func<IncludedItem, bool>)At least one included item satisfies the predicate
3InclusionIncludeExactlyNItemsWithKind(ContextKind, int n)Exactly N included items have the given Kind (N=0 is valid)
4ExclusionExcludeItemWithReason(ExclusionReason)At least one excluded item carries the given ExclusionReason
5ExclusionExcludeItemMatchingWithReason(Func<ContextItem, bool>, ExclusionReason)At least one excluded item satisfies the predicate and carries the given reason
6ExclusionExcludeItemWithBudgetDetails(Func<ContextItem, bool>, int, int)An excluded item matching the predicate has BudgetExceeded with the exact token counts
7ExclusionHaveNoExclusionsForKind(ContextKind)No excluded item has the given Kind
8AggregateHaveAtLeastNExclusions(int n)At least N items in the Excluded list
9AggregateExcludedItemsAreSortedByScoreDescending()Excluded list is sorted score-descending (conformance assertion)
10BudgetHaveBudgetUtilizationAbove(double threshold, ContextBudget)sum(included tokens) / budget.MaxTokens >= threshold
11CoverageHaveKindCoverageCount(int n)At least N distinct ContextKind values appear in the Included list
12OrderingPlaceItemAtEdge(Func<IncludedItem, bool>)An included item matching the predicate is at position 0 or position count−1
13OrderingPlaceTopNScoredAtEdges(int n)The N highest-scored included items occupy the N outermost positions

Per-pattern specifications are written in T02 (patterns 1–7) and T03 (patterns 8–13). Each specification includes: assertion semantics, predicate type, edge cases and tolerance, tie-breaking behavior, and error message format.


Inclusion Group

The Inclusion group contains 3 patterns that assert over SelectionReport.included. All predicates in this group operate over IncludedItem (per PD-1), giving callers access to item, score, and reason.

IncludeItemWithKind(ContextKind kind)

Assertion semantics: Asserts that Included.Any(i => i.Item.Kind == kind) — at least one item in the included list has Kind equal to kind.

Predicate type: No predicate parameter. The kind value is matched via direct field equality: i.item.kind == kind.

ContextKind.Any sentinel: ContextKind.Any is not a valid argument to this assertion. Pass a specific kind (e.g., Message, SystemPrompt, ToolOutput). Passing ContextKind.Any is implementation-defined behavior; a conforming implementation MAY throw ArgumentException or treat it as always-matching, but the vocabulary does not assign semantics to this case.

Edge cases:

  • Included is empty → assertion fails with count = 0.
  • Included contains items but none match kind → assertion fails with count = 0 for the given kind.

Tie-breaking: Not applicable. This is a pure existence check with no ordering dependency.

Error message format:

IncludeItemWithKind({kind}) failed: Included contained 0 items with Kind={kind}. Included had {count} items with kinds: [{actualKinds}].

Where {actualKinds} is a comma-separated list of distinct Kind values present in included (e.g., Message, ToolOutput). If included is empty, {actualKinds} is an empty string and {count} is 0.


IncludeItemMatching(Func<IncludedItem, bool> predicate)

Assertion semantics: Asserts that Included.Any(predicate) — at least one item in the included list satisfies the caller-supplied predicate.

Predicate type: IncludedItem (per PD-1). The predicate receives the full IncludedItem pair (fields: item, score, reason), not a raw ContextItem. This allows callers to filter on score, inclusion reason, or item content/kind from a single predicate.

Convenience overloads: Implementations MAY provide additional overloads that accept a Func<ContextItem, bool> predicate for callers who only need to inspect item content or kind. Such overloads are implementation-defined and not part of this vocabulary spec.

Edge cases:

  • Included is empty → assertion fails with count = 0, no items to show.
  • Included is non-empty but the predicate returns false for all items → assertion fails and includes a summary of included items (up to 5) in the error message to assist diagnosis.

Tie-breaking: Not applicable. Existence check only.

Error message format:

IncludeItemMatching failed: no item in Included matched the predicate. Included had {count} items.

Implementations SHOULD append a summary of up to 5 IncludedItem entries from included (showing kind, score, and reason for each) to aid diagnosis. The exact format of this optional appendix is implementation-defined, but no additional required fields are added to the base message above.


IncludeExactlyNItemsWithKind(ContextKind kind, int n)

Assertion semantics: Asserts that Included.Count(i => i.Item.Kind == kind) == n — the included list contains exactly n items with Kind equal to kind.

Predicate type: No predicate parameter. Count is computed by direct field equality: i.item.kind == kind.

N=0 semantics: n = 0 is a valid, well-defined argument. It asserts that no included item has the given kind. This is distinct from HaveNoExclusionsForKind (pattern 7): IncludeExactlyNItemsWithKind(kind, 0) asserts the kind is absent from included; HaveNoExclusionsForKind(kind) asserts the kind is absent from excluded. An item of a given kind could be absent from both lists (never a candidate).

Edge cases:

  • Included is empty and n = 0 → assertion passes.
  • Included is empty and n > 0 → assertion fails with actual = 0.
  • Included has matching items but count does not equal n → assertion fails showing both expected and actual counts.

Tie-breaking: Not applicable. Count comparison only.

Error message format:

IncludeExactlyNItemsWithKind({kind}, {n}) failed: expected {n} items with Kind={kind} in Included, but found {actual}. Included had {count} items total.

Where {actual} is the count of items in included with Kind == kind, and {count} is the total length of included.


Exclusion Group

The Exclusion group contains 4 patterns that assert over SelectionReport.excluded. Predicates in this group operate over ContextItem or ExcludedItem depending on the method — see each pattern spec for the exact predicate type. Reason matching uses variant discriminant comparison, not string equality.

ExcludeItemWithReason(ExclusionReason reason)

Assertion semantics: Asserts that Excluded.Any(e => e.Reason is <reason variant>) — at least one item in the excluded list carries an exclusion reason of the given variant.

Reason matching: Variant discriminant match. In Rust this is a pattern match on the enum variant (e.g., matches!(e.reason, ExclusionReason::BudgetExceeded { .. })). In .NET this is an equality comparison on the flat enum value (e.g., e.Reason == ExclusionReason.BudgetExceeded). This is not a string equality check on the reason name.

Reserved variants: All 8 ExclusionReason variants are valid arguments, including reserved variants (ScoredTooLow, QuotaCapExceeded, QuotaRequireDisplaced, Filtered). Reserved variants are valid arguments even if no built-in pipeline stage currently emits them — custom stage implementations may emit reserved variants, and test authors may assert against them.

Edge cases:

  • Excluded is empty → assertion fails with count = 0.
  • Excluded is non-empty but no item carries the given reason variant → assertion fails showing the actual reason variants present.

Tie-breaking: Not applicable. Existence check only.

Error message format:

ExcludeItemWithReason({reason}) failed: no excluded item had reason {reason}. Excluded had {count} items with reasons: [{reasonList}].

Where {reasonList} is a comma-separated list of distinct reason variant names present in excluded (e.g., BudgetExceeded, Deduplicated). If excluded is empty, {reasonList} is an empty string and {count} is 0.


ExcludeItemMatchingWithReason(Func<ContextItem, bool> predicate, ExclusionReason reason)

Assertion semantics: Asserts that Excluded.Any(e => predicate(e.Item) && e.Reason is <reason variant>) — at least one item in the excluded list satisfies both the content predicate and the reason discriminant match.

Predicate type: ContextItem (the predicate is over the raw item, not the full ExcludedItem). Callers use the predicate to filter by content or kind; the reason parameter provides the second filter. This split is deliberate: the predicate is the “which item?” filter, and the reason is the “why?” filter.

Reason matching: Variant discriminant match. Same rules as ExcludeItemWithReason (pattern 4).

Partial-match count: The error message distinguishes two failure modes: (a) the predicate matched no excluded items at all, and (b) the predicate matched one or more excluded items but none had the expected reason. This distinction is exposed via {predicateMatchCount} in the error message.

Edge cases:

  • Excluded is empty → assertion fails; predicateMatchCount = 0.
  • Predicate matches zero items in excluded → assertion fails; predicateMatchCount = 0.
  • Predicate matches one or more items but none have the given reason → assertion fails; error shows partial-match count and the actual reasons of the matched items.

Tie-breaking: Not applicable. Existence check only.

Error message format:

ExcludeItemMatchingWithReason(reason={reason}) failed: predicate matched {predicateMatchCount} excluded item(s) but none had reason {reason}. Matched items had reasons: [{actualReasons}].

Where {predicateMatchCount} is the count of excluded items for which predicate(e.item) returned true, and {actualReasons} is a comma-separated list of the distinct reason variant names of those matched items. If predicateMatchCount = 0, {actualReasons} is an empty string.


ExcludeItemWithBudgetDetails(Func<ContextItem, bool> predicate, int expectedItemTokens, int expectedAvailableTokens)

Assertion semantics: Asserts that there exists an excluded item e such that:

  1. predicate(e.item) is true
  2. e.reason is BudgetExceeded
  3. e.reason.item_tokens == expectedItemTokens (exact integer equality)
  4. e.reason.available_tokens == expectedAvailableTokens (exact integer equality)

Predicate type: ContextItem. The predicate identifies which item is being asserted about; the reason and token field checks are fixed by the assertion signature.

Token equality: Exact integer equality for both item_tokens and available_tokens. No tolerance. available_tokens is defined as effective_target − sum(sliced_item.tokens) at the moment of exclusion (D025).

Reserved variant note: This assertion is specific to the BudgetExceeded variant. It does not apply to other reason variants.

Edge cases:

  • No excluded item matches the predicate → assertion fails; report which predicate-matching items exist (if any) with their actual item_tokens and available_tokens.
  • Predicate matches an item but its reason is not BudgetExceeded → assertion fails; error shows actual reason.
  • Predicate matches a BudgetExceeded item but token values differ → assertion fails; error shows both expected and actual token values.

Tie-breaking: Not applicable. The first predicate-matching BudgetExceeded item is used for error reporting if multiple candidates exist.

Error message format:

ExcludeItemWithBudgetDetails failed: expected BudgetExceeded with item_tokens={eIT}, available_tokens={eAT}, but found item_tokens={aIT}, available_tokens={aAT}.

Where {eIT} = expectedItemTokens, {eAT} = expectedAvailableTokens, {aIT} = actual item_tokens on the matched item, {aAT} = actual available_tokens on the matched item. If no predicate-matching BudgetExceeded item exists, replace the but found clause with: but no matching item had reason BudgetExceeded (implementation-defined extended form permitted).

Language note: In .NET, ExclusionReason is a flat enum with no associated data. The item_tokens and available_tokens fields are not available on ExcludedItem.Reason — the .NET ExclusionReason enum carries only the variant discriminant (BudgetExceeded, Deduplicated, etc.) and no per-variant fields. .NET implementations of Wollax.Cupel.Testing MAY omit this assertion entirely or surface it differently — for example, as HaveExcludedItemWithBudgetExceeded(Func<ContextItem, bool> predicate) without the expectedItemTokens and expectedAvailableTokens parameters. If a .NET implementation omits the two token-detail parameters, the assertion degenerates to ExcludeItemMatchingWithReason(predicate, ExclusionReason.BudgetExceeded) (pattern 5). This difference is a consequence of the language-level representation choice and not a spec deviation.


HaveNoExclusionsForKind(ContextKind kind)

Assertion semantics: Asserts that Excluded.All(e => e.Item.Kind != kind) — no item in the excluded list has Kind equal to kind.

Predicate type: No predicate parameter. The kind value is matched via direct field equality: e.item.kind == kind.

Semantic distinction from IncludeExactlyNItemsWithKind(kind, 0): These two patterns test different lists:

  • HaveNoExclusionsForKind(kind) asserts the kind is absent from excluded.
  • IncludeExactlyNItemsWithKind(kind, 0) asserts the kind is absent from included. An item of a given kind can be absent from both lists (it was never a candidate in the pipeline run). The two assertions are not equivalent and are not interchangeable.

Edge cases:

  • Excluded is empty → assertion passes (vacuous truth: All over an empty set is true).
  • Excluded contains items of the given kind → assertion fails; error shows count and details of the first matching item.

Tie-breaking: Not applicable. The All predicate is evaluated over the full excluded list; the first matching item is used in the error message for diagnosis.

Error message format:

HaveNoExclusionsForKind({kind}) failed: found {count} excluded item(s) with Kind={kind}. First: score={score}, reason={reason}.

Where {count} is the total count of excluded items with Kind == kind, and {score} and {reason} are the score and reason variant name of the first such item in the excluded list (i.e., the highest-scored excluded item of that kind, since excluded is sorted score-descending).


Aggregate Counts Group

The Aggregate Counts group contains 2 patterns that assert over aggregate properties of SelectionReport.excluded — total count and ordering invariants.

HaveAtLeastNExclusions(int n)

Assertion semantics: Asserts that Excluded.Count >= n — the excluded list contains at least n items.

Predicate type: No predicate parameter. The comparison is over the total count of the excluded list.

N=0 semantics: n = 0 is a valid, well-defined argument. HaveAtLeastNExclusions(0) always passes unless the pipeline threw an exception (i.e., unless SelectionReport was never produced). There is no separate HaveNoExclusionsRequired() pattern — use HaveAtLeastNExclusions(0) instead. This is intentionally a no-op assertion useful for establishing “the pipeline ran without error” in minimal smoke tests.

Edge cases:

  • Excluded is empty and n = 0 → assertion passes.
  • Excluded is empty and n > 0 → assertion fails with actual = 0.
  • Excluded.Count >= n → assertion passes.

Tie-breaking: Not applicable. Count comparison only.

Error message format:

HaveAtLeastNExclusions({n}) failed: expected at least {n} excluded items, but Excluded had {actual}.

Where {actual} is Excluded.Count.


ExcludedItemsAreSortedByScoreDescending()

Assertion semantics: Asserts that for all 0 <= i < Excluded.Count - 1: Excluded[i].Score >= Excluded[i+1].Score. The excluded list must be sorted in non-increasing score order.

Conformance assertion: This assertion tests a behavioral invariant of a correct Cupel pipeline implementation. A pipeline that satisfies the specification (see SelectionReport, D019) will always produce an excluded list in score-descending order. This assertion exists to detect non-conforming implementations and to validate hand-constructed test fixtures.

Score comparison: >= (non-increasing). Equal scores are permitted between adjacent items.

Tiebreak caveat: The specification (D019) guarantees that among items with equal scores, insertion order is preserved (the item added earlier appears earlier in excluded). However, this insertion-order tiebreak is not assertable from the report alone — the report does not expose an insertion index or candidate sequence number. This assertion therefore checks score-descending order only. Callers who need to verify tiebreak behavior must control the complete input set and ensure all scores are distinct.

Edge cases:

  • Excluded is empty → assertion passes (vacuous truth: no adjacent pair to check).
  • Excluded has exactly one item → assertion passes.
  • Any adjacent pair violates non-increasing order → assertion fails at the first violating index.

Error message format:

ExcludedItemsAreSortedByScoreDescending failed: item at index {i+1} (score={si1}) is higher than item at index {i} (score={si}). Expected non-increasing scores.

Where {i} is the 0-based index of the first violating position (the item whose successor is higher-scored), {si} is Excluded[i].Score, and {si1} is Excluded[i+1].Score.


Budget Group

The Budget group contains 1 pattern that asserts the token utilization of the included set relative to the caller-supplied budget.

HaveBudgetUtilizationAbove(double threshold, ContextBudget budget)

Assertion semantics: Asserts that sum(Included[i].Item.Tokens) / budget.MaxTokens >= threshold.

Denominator: budget.MaxTokens — the hard token ceiling set at ContextBudget construction time. This is the single authoritative capacity ceiling for context-window safety.

budget.TargetTokens is explicitly NOT the denominator. TargetTokens is a Slice-stage-internal soft target; it is not a public capacity metric exposed by ContextBudget. Using TargetTokens as denominator would produce utilization values > 1.0 for reports where the pipeline filled the full max-token budget, which is misleading. (PD-2, D045, DI-3.)

Comparison operator: Exact >= with no epsilon (PD-3, D064). Floating-point edge cases are test authoring responsibility — callers must choose threshold values that avoid boundary drift.

budget.MaxTokens == 0: This is a pipeline error. ContextBudget validates MaxTokens > 0 at construction time; a ContextBudget with MaxTokens == 0 cannot be produced by a conforming implementation. This assertion does not need to handle division-by-zero.

Edge cases:

  • Included is empty → sum(Included[i].Item.Tokens) = 0; utilization = 0.0. Assertion fails unless threshold <= 0.0.
  • threshold = 0.0 → assertion passes whenever Included is non-empty (utilization > 0.0) and also when Included is empty (0.0 >= 0.0 is true). Valid usage.
  • threshold > 1.0 → valid argument; the assertion will fail unless the sum of included tokens exceeds budget.MaxTokens (which is an over-budget condition indicating a pipeline defect).

Tie-breaking: Not applicable. Arithmetic comparison only.

Error message format:

HaveBudgetUtilizationAbove({threshold}) failed: computed utilization was {actual:.6f} (includedTokens={includedTokens}, budget.MaxTokens={maxTokens}).

Where {actual} is the computed utilization value formatted to 6 decimal places, {includedTokens} is sum(Included[i].Item.Tokens), and {maxTokens} is budget.MaxTokens.


Kind Coverage Group

The Kind Coverage group contains 1 pattern that asserts over the diversity of ContextKind values in the included set.

HaveKindCoverageCount(int n)

Assertion semantics: Asserts that Included.Select(i => i.Item.Kind).Distinct().Count() >= n — the included list contains at least n distinct ContextKind values.

Predicate type: No predicate parameter. The kind set is computed as { i.item.kind | i ∈ included }.

No ordering dependency: This assertion is purely over the set of distinct kinds in included; it does not depend on list order or score values.

N=0 semantics: n = 0 is a valid argument; the assertion always passes (an empty set has 0 distinct kinds, which satisfies >= 0). There is no distinct HaveAtLeastOneKind() pattern — use HaveKindCoverageCount(1).

Edge cases:

  • Included is empty → distinct kind count = 0. Assertion fails unless n = 0.
  • All included items have the same kind → distinct count = 1.

Tie-breaking: Not applicable. Set cardinality comparison only.

Error message format:

HaveKindCoverageCount({n}) failed: expected at least {n} distinct ContextKind values in Included, but found {actual}: [{actualKinds}].

Where {actual} is the count of distinct kind values, and {actualKinds} is a comma-separated list of those distinct values (e.g., Message, SystemPrompt). If Included is empty, {actual} is 0 and {actualKinds} is an empty string.


Conformance Assertions Group

Conformance assertions verify behavioral invariants that a correct Cupel pipeline implementation must always satisfy. They are distinct from functional assertions (which verify caller intent) — a conformance assertion failing indicates either a non-conforming pipeline or a hand-constructed test fixture with invalid ordering.

The ExcludedItemsAreSortedByScoreDescending() pattern (pattern 9 above) is the primary conformance assertion in this vocabulary. It is documented in the Aggregate Counts Group because its method name groups naturally with the other aggregate exclusion-count assertions.


Ordering Group

The Ordering group contains 2 patterns that assert over the positions of items in SelectionReport.included. Both patterns carry a Placer dependency caveat — the meaning of positions in included is determined by the Placer used to produce the report.

Placer dependency caveat: These assertions are only meaningful when the caller knows the Placer’s ordering contract. For UShapedPlacer, position 0 holds the highest-scored item and position count−1 holds the second-highest. For other Placers, consult the Placer spec. Do not use these assertions against output from an unknown or unspecified Placer.

PlaceItemAtEdge(Func<IncludedItem, bool> predicate)

Assertion semantics: Asserts that predicate(Included[0]) || predicate(Included[Included.Count - 1]) — at least one of the two edge positions (position 0 and position count−1) contains an item that satisfies the predicate. “Edge” means exactly position 0 (first) or position Included.Count − 1 (last). No other positions are “edge” positions.

Predicate type: IncludedItem (per PD-1). The predicate receives the full IncludedItem pair (item, score, reason).

“Edge” definition: Position 0 (first) OR position Included.Count − 1 (last). Nothing more. In particular: if Included has only 1 item, position 0 and position count−1 are the same item; a predicate matching that single item satisfies the assertion.

Tie-score clarification: If multiple items share the same score as an edge item, the assertion passes only if the item satisfying the predicate physically occupies position 0 or position count−1. The assertion does not pass merely because the item has the same score as an edge item while being at a non-edge position. Position is the criterion, not score.

Empty Included: Included being empty is a degenerate case. An empty list has no edge positions; the assertion fails because no item can satisfy the predicate. The “item not found” error message variant applies.

Edge cases:

  • Included is empty → assertion fails; “no item in Included matched the predicate” error variant.
  • Predicate matches no item in Included → assertion fails; “no item in Included matched the predicate” error variant.
  • Predicate matches one or more items, but none are at position 0 or count−1 → assertion fails; “item matching predicate was at index {actual}” error variant, reporting the index of the first matching item.

Placer dependency caveat: This assertion is only meaningful when the caller knows the Placer’s ordering contract. For UShapedPlacer, position 0 holds the highest-scored item and position count−1 holds the second-highest. For other Placers, consult the Placer spec. Do not use this assertion against output from an unknown or unspecified Placer.

Error message format (item found at wrong index):

PlaceItemAtEdge failed: item matching predicate was at index {actual} (not at edge). Edge positions: 0 and {last}. Included had {count} items.

Where {actual} is the 0-based index of the first item in Included satisfying the predicate, {last} is Included.Count − 1, and {count} is Included.Count.

Error message format (item not in Included):

PlaceItemAtEdge failed: no item in Included matched the predicate.

PlaceTopNScoredAtEdges(int n)

Assertion semantics: Asserts that the n items with the highest score values in Included occupy the n outermost positions (edges) of the list. The edge position mapping for n items is: position 0, position count−1, position 1, position count−2, … (alternating from the two ends inward).

Step-by-step evaluation:

  1. Sort Included by score descending to identify the top-n items.
  2. Enumerate the first n edge positions: 0, count−1, 1, count−2, …
  3. Verify that each of the top-n items (by score) occupies one of those edge positions.

Tie-score handling: If multiple items share the score of the N-th top item (i.e., there is a tie at the boundary between the top-N set and the remainder), then any item with that tied score is a valid occupant of the N-th edge position. The assertion does not require a specific tied item to be at that position; it only requires that the N-th edge position is occupied by an item with a score ≥ the N-th top score.

Predicate type: No predicate parameter. Score comparison is over IncludedItem.Score values.

n = 0: Valid argument; assertion always passes (zero items to check, zero edge positions to verify).

n > Included.Count: Assertion fails — there are not enough items in Included to fill n edge positions. The error message should note the mismatch between n and Included.Count.

Edge cases:

  • Included is empty and n = 0 → assertion passes.
  • Included is empty and n > 0 → assertion fails; no items to place at edges.
  • n = Included.Count → all items must be at “edge” positions, meaning the assertion checks that the full included list is sorted by edge-in order, not necessarily that it is sorted by score directly.

Placer dependency caveat: This assertion is only meaningful when the caller knows the Placer’s ordering contract. For UShapedPlacer, position 0 holds the highest-scored item and position count−1 holds the second-highest, with subsequent positions filling inward. For other Placers, consult the Placer spec. Do not use this assertion against output from an unknown or unspecified Placer.

Error message format:

PlaceTopNScoredAtEdges({n}) failed: {failCount} of the top-{n} scored items were not at expected edge positions. Top-{n} items (by score): [{topItems}]. Expected edge positions: [{edgePositions}].

Where {failCount} is the count of top-n items not occupying an expected edge position, {topItems} is a list of the top-n items formatted as (kind={kind}, score={score}, idx={actualIndex}), and {edgePositions} is the ordered list of expected edge position indices (e.g., 0, 5, 1, 4 for n=4 with count=6).


Notes

D041: Snapshot assertions deferred

Snapshot-based assertions over SelectionReport are explicitly deferred. The excluded list is sorted score-descending with an insertion-order tiebreak (D019): among items with equal scores, the item added earlier to the candidate set appears earlier in excluded. This tiebreak is an implementation detail — it is not observable from the SelectionReport alone because the report does not expose a candidate insertion index or sequence number.

Snapshot outputs are therefore non-deterministic across test runs unless the test author controls the complete input set AND all scores are distinct. This precondition is not currently guaranteed by the spec. Until SelectionReport ordering stability for ties is formally guaranteed and surfaced, snapshot assertions are not part of the Cupel.Testing vocabulary.

This deferral applies to any assertion of the form “the report serializes to exactly this JSON/string”. Per-field assertions (the 13 named patterns) are unaffected.

TotalTokensConsidered is not a utilization metric

TotalTokensConsidered (if surfaced as a SelectionReport field) is a candidate-set volume metric:

TotalTokensConsidered = sum(included[i].item.tokens) + sum(excluded[i].item.tokens)

It measures the total token volume of all items that entered the pipeline’s selection decision — it is not a utilization metric. Using TotalTokensConsidered as a denominator for utilization assertions would produce values far below 1.0 in typical runs (where many candidates are excluded) and would misrepresent budget efficiency.

For utilization assertions, use HaveBudgetUtilizationAbove(threshold, budget) (pattern 10), which divides by budget.MaxTokens — the authoritative capacity ceiling (PD-2, D045).

SelectionReportAssertionException should be a dedicated type

SelectionReportAssertionException must be a dedicated exception type — not InvalidOperationException, not AssertionException from any third-party framework, and not a raw Exception. (PD-4.)

Rationale: test frameworks (xUnit, NUnit, MSTest) recognize well-known assertion exception types and format them distinctly in test output. Using a dedicated type allows test runners to display Cupel assertion failures with the structured message they carry, rather than reporting them as unexpected system exceptions. It also allows callers to catch (SelectionReportAssertionException) in test utilities without accidentally catching unrelated InvalidOperationException instances.

The type must carry at minimum: the structured failure message (as .Message). Implementations MAY add structured properties (e.g., AssertionName, Expected, Actual) for programmatic inspection, but these are implementation-defined and not part of the vocabulary spec.

Changelog

All notable changes to the Cupel Specification are documented in this file.

The format is based on Keep a Changelog.

[1.3.0] — 2026-03-23

Added

  • Scorers: DecayScorer with Exponential, Step, and Window curve factories; MetadataTrustScorer for caller-provided cupel:trust metadata passthrough
  • Slicers: CountQuotaSlice decorator slicer enforcing absolute item-count require/cap per ContextKind, with ScarcityBehavior (Degrade/Throw) and CountRequirementShortfall reporting
  • Analytics: Budget simulation extension methods on CupelPipeline (.NET only): GetMarginalItems(items, budget, slackTokens) and FindMinBudgetFor(items, targetItem, searchCeiling) with reference-equality diff semantics and binary search
  • Analytics: BudgetUtilization(budget), KindDiversity(), and TimestampCoverage() extension methods on SelectionReport (both .NET and Rust)
  • Testing: Wollax.Cupel.Testing NuGet package with SelectionReport.Should() assertion chains (13 assertion patterns from the testing vocabulary spec)
  • Diagnostics: Wollax.Cupel.Diagnostics.OpenTelemetry companion package bridging Cupel pipelines to OpenTelemetry ActivitySource with three verbosity tiers (StageOnly, StageAndExclusions, Full)

Changed

  • GreedySlice: Deterministic tie-break contract explicitly specified — equal-density items are ordered by original index ascending. This was implicit behavior in 1.0.0 but is now a spec-committed contract that budget simulation depends on.

Specification Decisions

  • DecayScorer requires caller-injected TimeProvider with no default — makes time dependency visible and enables deterministic testing (D042)
  • DecayScorer is not Clone in Rust; stores Box<dyn TimeProvider + Send + Sync> (D047)
  • MetadataTrustScorer in .NET accepts both string and double for cupel:trust metadata value (D059)
  • CountQuotaSlice rejects KnapsackSlice as inner slicer at construction time; defers CountConstrainedKnapsackSlice to a future release
  • Budget simulation API scoped to .NET in v1.3; Rust parity deferred to a future milestone
  • SweepBudget (exhaustive budget sweep) assigned to Smelt project, not Cupel

[1.0.0] — 2026-03-14

Initial specification release.

Added

  • Data Model: ContextItem, ContextBudget, and all enumerations (ContextKind, OverflowStrategy, ExclusionReason)
  • Pipeline: 6-stage fixed pipeline (Classify, Score, Deduplicate, Sort, Slice, Place)
  • Scorers: RecencyScorer, PriorityScorer, KindScorer, TagScorer, FrequencyScorer, ReflexiveScorer, CompositeScorer, ScaledScorer
  • Slicers: GreedySlice, KnapsackSlice, QuotaSlice
  • Placers: ChronologicalPlacer, UShapedPlacer
  • Conformance suite: 37 TOML test vectors (28 required, 9 optional) covering all algorithms and edge cases
    • 13 required scoring vectors (all 8 scorer types)
    • 6 required slicing vectors (GreedySlice, KnapsackSlice, QuotaSlice)
    • 4 required placing vectors (ChronologicalPlacer, UShapedPlacer)
    • 5 required pipeline vectors (multiple slicer/placer combinations, pinned items)
    • 4 optional scoring vectors (edge cases)
    • 1 optional slicing vector (empty input)
    • 4 optional pipeline vectors (empty input, all-pinned, deduplication, overflow)

Specification Decisions

  • Epsilon tolerance (1e-9) for all floating-point score comparisons (P2)
  • Byte-exact ordinal comparison for content deduplication (P3)
  • Case-insensitive ASCII case folding for ContextKind comparison (P4)
  • KnapsackSlice discretization parameters are implementation-defined; conformance tests use score differences >= 0.01 (P5)
  • QuotaSlice budget distribution uses floor truncation for all percentage-to-token conversions (P6)
  • Stable sort mandated for all sorting operations with index-based tiebreaking (P1)