Introduction
Cupel Specification Version 1.0.0
This document is the language-agnostic specification for the Cupel context selection algorithm. Cupel selects, scores, and orders context items for inclusion in a language model’s context window, subject to a token budget.
Purpose
Large language model applications must decide which pieces of context (messages, documents, tool outputs, memories) to include in a finite context window. Cupel defines a deterministic pipeline that takes a set of candidate context items and a token budget, and produces an ordered list of selected items that fits within the budget.
This specification enables interoperable implementations across programming languages. An implementation that conforms to this specification will produce the same selected items in the same order as any other conforming implementation, given identical inputs.
Scope
This specification defines:
- The data model: ContextItem, ContextBudget, ScoredItem, and supporting enumerations
- The pipeline: a fixed 6-stage transformation (Classify, Score, Deduplicate, Sort, Slice, Place)
- All scorer algorithms: RecencyScorer, PriorityScorer, KindScorer, TagScorer, FrequencyScorer, ReflexiveScorer, CompositeScorer, ScaledScorer
- All slicer strategies: GreedySlice, KnapsackSlice, QuotaSlice
- All placer strategies: ChronologicalPlacer, UShapedPlacer
- Overflow handling: Throw, Truncate, Proceed strategies
This specification does not define:
- Streaming or asynchronous pipeline execution (implementation-specific)
- Diagnostics, tracing, or observability infrastructure
- Dependency injection or builder APIs
- Serialization formats (JSON, etc.)
- Named policies or preset configurations
- Tokenizer implementations (the caller provides token counts)
Conformance Model
A conforming implementation must exhibit behavioral equivalence: given the same input items, budget, scorer, slicer, and placer configuration, a conforming implementation must select the same items in the same order as the reference behavior defined in this specification.
Behavioral equivalence does not require bit-exact floating-point scores. Intermediate score values may differ due to floating-point evaluation order, provided the final selection and ordering are identical. Conformance tests compare selected item sets and their ordering, not exact score values. Where individual scores are verified (e.g., per-stage scorer tests), an epsilon tolerance of 1e-9 is used.
See Conformance Levels for the full conformance requirements.
Numeric Precision
All scoring computations MUST use IEEE 754 64-bit double-precision floating-point arithmetic. This includes:
- Individual scorer outputs
- Composite score aggregation (weighted averages)
- Score-based comparisons in sorting, deduplication, and slicing
- Density calculations in slicer algorithms
Implementations MUST NOT use 32-bit floats, fixed-point, or arbitrary-precision arithmetic for scoring. Integer arithmetic is permitted for token counts and budget calculations.
Notation Conventions
Algorithms in this specification use CLRS-style pseudocode with the following conventions:
| Convention | Meaning |
|---|---|
| bold lowercase | Keywords: for, if, else, return, while |
monospace | Variables, field names, function names |
<- | Assignment (not =, to avoid confusion with equality) |
= | Equality comparison |
[i] | 0-based array/list indexing |
// | Comments |
length(x) | Number of elements in array or list x |
Pipeline Overview
The Cupel pipeline is a fixed sequence of six stages. Every execution follows this order; stages cannot be reordered or skipped.
flowchart LR
A[Candidate Items] --> B[Classify]
B --> C[Score]
C --> D[Deduplicate]
D --> E[Sort]
E --> F[Slice]
F --> G[Place]
G --> H[Ordered Context Window]
| Stage | Input | Output | Purpose |
|---|---|---|---|
| Classify | Candidate items | Pinned list + Scoreable list | Partition items; exclude invalid |
| Score | Scoreable items | Scored items | Compute relevance scores |
| Deduplicate | Scored items | Unique scored items | Remove duplicate content |
| Sort | Unique scored items | Sorted scored items | Order by score descending |
| Slice | Sorted scored items | Budget-fitting items | Select items within token budget |
| Place | Sliced + pinned items | Ordered items | Determine final presentation order |
The pipeline operates on the principle of ordinal-only scoring: scorers assign relevance scores (rank), slicers select items within budget (drop), and placers determine presentation order (position). Each concern is strictly separated.
Data Model
The Cupel data model consists of three primary types and three supporting enumerations. These types flow through every stage of the pipeline.
Primary Types
-
ContextItem — An immutable record representing a single piece of context (a message, document, tool output, etc.). Every scorer, slicer, and placer operates on ContextItem instances.
-
ContextBudget — The budget constraint model that controls how many tokens the pipeline can select. Validated at construction time so that no invalid budget can exist at runtime.
-
ScoredItem — A pair of (ContextItem, Score) produced by the scoring stage and consumed by subsequent pipeline stages.
Enumerations
-
ContextKind — An extensible string enumeration classifying the type of a context item (Message, Document, ToolOutput, Memory, SystemPrompt).
-
ContextSource — An extensible string enumeration identifying the origin of a context item (Chat, Tool, Rag).
-
OverflowStrategy — A closed enumeration controlling behavior when selected items exceed the token budget (Throw, Truncate, Proceed).
Immutability
All data model types are immutable. Once constructed, a ContextItem or ContextBudget cannot be modified. Pipeline stages produce new collections rather than mutating inputs. This invariant simplifies reasoning about pipeline behavior and enables safe concurrent access.
ContextItem
A ContextItem is an immutable record representing a single piece of context in the pipeline. Every scorer, slicer, and placer operates on ContextItem instances.
Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
content | string | Yes | — | The textual content of the item. Non-null and non-empty. |
tokens | integer | Yes | — | The token count for this context item, as provided by the caller. Items with negative token counts are excluded during classification. |
kind | ContextKind | No | "Message" | The kind of context item. Used by KindScorer and QuotaSlice. |
source | ContextSource | No | "Chat" | The origin of this context item. |
priority | integer or null | No | null | Optional priority override. Higher values indicate greater importance. Used by PriorityScorer. |
tags | list of strings | No | empty list | Descriptive tags for filtering and scoring. May be empty. Tag comparison is case-insensitive (ASCII fold). Used by TagScorer and FrequencyScorer. |
metadata | map of string to any | No | empty map | Arbitrary key-value metadata. Opaque to the pipeline — not read or modified by any pipeline stage. Preserved on output for caller use. |
timestamp | datetime (UTC) or null | No | null | When this context item was created or observed. Used by RecencyScorer and ChronologicalPlacer. |
futureRelevanceHint | float64 or null | No | null | Hint for future relevance scoring, conventionally in the range [0.0, 1.0]. Used by ReflexiveScorer. |
pinned | boolean | No | false | Whether this item is pinned. Pinned items bypass scoring and slicing — they are always included in the output regardless of score or budget. |
originalTokens | integer or null | No | null | The original token count before any external summarization or truncation. Not used by the pipeline; preserved for caller diagnostics. |
Constraints
-
contentMUST be a non-null, non-empty string. Implementations SHOULD reject construction of a ContextItem with null or empty content. -
tokensis caller-provided. The pipeline does not tokenize content; it trusts the caller’s token count. The pipeline excludes items withtokens < 0during classification (see Stage 1: Classify). -
tagsuses case-insensitive comparison. When comparing tags (e.g., in TagScorer or FrequencyScorer), implementations MUST use case-insensitive ASCII comparison."Important"and"important"are considered equal. -
metadatais opaque. The pipeline MUST NOT read, interpret, or modify metadata values. Implementations MUST preserve metadata through the pipeline so that selected items retain their original metadata on output. -
timestamprepresents a point in time with UTC semantics. Implementations may use any datetime representation that preserves UTC instant precision (e.g., ISO 8601 with timezone, Unix epoch milliseconds). Comparison is by temporal ordering. -
Immutability. Once constructed, a ContextItem MUST NOT be modified. Pipeline stages MUST NOT mutate any field of any ContextItem they receive.
ContextBudget
A ContextBudget defines the token budget constraints that control how much context the pipeline can select. All fields are validated at construction time — no invalid budget can exist at runtime.
Fields
| Field | Type | Required | Default | Constraints |
|---|---|---|---|---|
maxTokens | integer | Yes | — | >= 0. Hard ceiling: the model’s context window size. |
targetTokens | integer | Yes | — | >= 0, <= maxTokens. Soft goal: the slicer aims for this token count. |
outputReserve | integer | No | 0 | >= 0, <= maxTokens. Tokens reserved for model output generation, subtracted from available budget. |
reservedSlots | map of ContextKind to integer | No | empty map | Minimum guaranteed items per kind. Each value >= 0. Used by QuotaSlice. |
estimationSafetyMarginPercent | float64 | No | 0.0 | >= 0.0, <= 100.0. Percentage buffer for token estimation error. |
Validation Rules
A conforming implementation MUST enforce these validation rules at construction time and reject invalid budgets:
maxTokens >= 0— Negative maximum tokens are invalid.targetTokens >= 0— Negative target tokens are invalid.targetTokens <= maxTokens— The soft target cannot exceed the hard ceiling.outputReserve >= 0— Negative output reserve is invalid.outputReserve <= maxTokens— The output reserve cannot exceed the context window.estimationSafetyMarginPercent >= 0.0 AND <= 100.0— Must be a valid percentage.- Each value in
reservedSlots>= 0 — Negative slot reservations are invalid.
Effective Budget
The pipeline computes an effective budget for the slicing stage by subtracting tokens already committed and applying any configured safety margin:
reservedTokens = sum of all values in reservedSlots
effectiveMax = max(0, maxTokens - outputReserve - pinnedTokens - reservedTokens)
effectiveTarget = max(0, targetTokens - pinnedTokens - reservedTokens)
effectiveTarget = min(effectiveTarget, effectiveMax)
if estimationSafetyMarginPercent > 0:
multiplier = 1.0 - estimationSafetyMarginPercent / 100.0
effectiveMax = floor(effectiveMax * multiplier)
effectiveTarget = floor(effectiveTarget * multiplier)
effectiveTarget = min(effectiveTarget, effectiveMax)
Where:
pinnedTokensis the sum oftokensfor all pinned items.reservedTokensis the sum of all values inreservedSlots, subtracted alongsideoutputReserveandpinnedTokensto reserve capacity for per-kind guarantees.- The safety margin is applied after all subtractions as a multiplicative reduction. Both
effectiveMaxandeffectiveTargetusefloor(integer truncation toward zero) when converting from the floating-point product.
See Stage 5: Slice for the full pseudocode.
Semantics
-
maxTokensis the absolute ceiling. The total tokens of all selected items (pinned + sliced) MUST NOT exceedmaxTokens - outputReserve, except when pinned items alone exceed this value (which is an error reported during classification). -
targetTokensis the soft goal. The slicer aims to select items whose total tokens are at mosttargetTokens. The pipeline checks for overflow againsttargetTokensafter merging pinned and sliced items. -
outputReservecarves out tokens for the model’s response. It reduces the effective budget available for context items. -
reservedSlotsguarantees minimum representation per ContextKind. The sum of reserved slot values is subtracted from the effective budget during pipeline budget computation, reducing the token ceiling available for non-reserved items. Additionally, QuotaSlice uses reserved slots to guarantee minimum representation per kind. -
estimationSafetyMarginPercentprovides a buffer for callers whose token counts are estimates rather than exact. It is applied as a multiplicative reduction to the effective budget after reserved slots and other subtractions. A value of 10.0 reduces the effective budget to 90% of its post-subtraction value.
Enumerations
ContextKind
ContextKind is an extensible string enumeration that classifies the type of a context item. Implementations MUST support arbitrary string values beyond the well-known set.
Well-Known Values
| Value | Description |
|---|---|
"Message" | A conversational message (default) |
"Document" | A document or file content |
"ToolOutput" | Output from a tool invocation |
"Memory" | A stored memory or fact |
"SystemPrompt" | A system-level instruction |
Comparison Semantics
ContextKind comparison MUST be case-insensitive using ASCII case folding. The following are all considered equal:
"Message","message","MESSAGE","mEsSaGe"
This case-insensitivity applies everywhere ContextKind is compared:
- KindScorer weight lookups
- QuotaSlice kind partitioning
- ContextBudget
reservedSlotskey lookups
Construction
A ContextKind value MUST be a non-null, non-whitespace-only string. Implementations SHOULD reject empty or whitespace-only values at construction time.
ContextSource
ContextSource is an extensible string enumeration that identifies the origin of a context item. Implementations MUST support arbitrary string values beyond the well-known set.
Well-Known Values
| Value | Description |
|---|---|
"Chat" | From a user chat interaction (default) |
"Tool" | From a tool or function call |
"Rag" | From a retrieval-augmented generation source |
Comparison Semantics
ContextSource comparison MUST be case-insensitive using ASCII case folding. The following are all considered equal:
"Chat","chat","CHAT"
Construction
A ContextSource value MUST be a non-null, non-whitespace-only string.
OverflowStrategy
OverflowStrategy is a closed enumeration (not extensible) that controls pipeline behavior when selected items exceed the token budget after merging pinned and sliced items.
Values
| Value | Behavior |
|---|---|
Throw | Raise an error (exception) when selected items exceed targetTokens. This is the default strategy. |
Truncate | Remove lowest-priority items from the selection until the total fits within targetTokens. Pinned items are never removed by truncation. Items are removed from the tail of the merged list (lowest-scored non-pinned items first). |
Proceed | Accept the over-budget selection and report the overflow to an observer. No items are removed. |
Overflow Detection
Overflow is detected after the Place stage merges pinned items (assigned score 1.0) with sliced items. The total tokens of the merged set are compared against targetTokens:
mergedTokens = sum of tokens for all items in merged set
if mergedTokens > targetTokens:
apply OverflowStrategy
See Stage 6: Place for the full overflow handling algorithm.
ScoredItem
ScoredItem is a value type (pair/tuple) that associates a ContextItem with its computed relevance score.
Fields
| Field | Type | Description |
|---|---|---|
item | ContextItem | The context item |
score | float64 | The computed relevance score, conventionally in the range [0.0, 1.0] |
Semantics
- The
scorefield is an IEEE 754 64-bit double. - Scores are conventionally in [0.0, 1.0] but this is not enforced by the type. Individual scorers may produce values outside this range (e.g., KindScorer with custom weights).
- ScoredItem is produced by the Score stage and consumed by Deduplicate, Sort, Slice, and Place.
- Pinned items are assigned a score of 1.0 when merged into the scored item list during the Place stage.
Pipeline
The Cupel pipeline is a fixed 6-stage transformation that takes a set of candidate context items and a token budget, and produces an ordered list of selected items that fits within the budget.
Invariants
-
Fixed stage order. The pipeline always executes stages in this order: Classify, Score, Deduplicate, Sort, Slice, Place. Stages cannot be reordered, skipped, or inserted between.
-
Ordinal-only scoring. Scorers assign relevance scores (rank). Slicers select items within budget (drop). Placers determine presentation order (position). Each concern is strictly separated — a scorer never drops items, a slicer never reorders, a placer never scores.
-
Immutable items. No stage modifies the ContextItem instances it receives. Stages produce new collections; they never mutate inputs.
Data Flow
flowchart TD
Input["Candidate Items\n(list of ContextItem)"] --> Classify
Classify --> |"pinned items\n(list of ContextItem)"| Place
Classify --> |"scoreable items\n(list of ContextItem)"| Score
Score --> |"scored items\n(list of ScoredItem)"| Deduplicate
Deduplicate --> |"unique scored items\n(list of ScoredItem)"| Sort
Sort --> |"sorted scored items\n(list of ScoredItem)"| Slice
Slice --> |"budget-fitting items\n(list of ContextItem)"| Place
Place --> Output["Ordered Context Window\n(list of ContextItem)"]
Stage Summary
| # | Stage | Input | Output | Key Behavior |
|---|---|---|---|---|
| 1 | Classify | Candidate items | Pinned + Scoreable lists | Partition; exclude negative-token items |
| 2 | Score | Scoreable items | Scored items | Invoke scorer; produce (item, score) pairs |
| 3 | Deduplicate | Scored items | Unique scored items | Remove duplicate content (byte-exact) |
| 4 | Sort | Unique scored items | Sorted scored items | Stable sort by score descending |
| 5 | Slice | Sorted scored items | Budget-fitting items | Select items within effective token budget |
| 6 | Place | Sliced + Pinned items | Ordered items | Merge, handle overflow, determine final order |
The pipeline operates on the principle of ordinal-only scoring: scorers assign relevance scores (rank), slicers select items within budget (drop), and placers determine presentation order (position). Each concern is strictly separated.
Error Conditions
The pipeline may raise errors in two situations:
-
Pinned items exceed available budget. If the total tokens of pinned items exceeds
maxTokens - outputReserve, the pipeline MUST raise an error during the Classify stage. This is not recoverable — the caller must reduce pinned items or increase the budget. -
Overflow after merge. If the total tokens of merged items (pinned + sliced) exceeds
targetTokens, the pipeline applies the configured OverflowStrategy. With theThrowstrategy (default), this raises an error.
Stage 1: Classify
The Classify stage partitions input items into two groups — pinned items that bypass scoring and slicing, and scoreable items that enter the scoring pipeline — while excluding invalid items.
Overview
Classification is the entry point of the pipeline. It performs three tasks:
- Exclude invalid items. Items with
tokens < 0are excluded with reasonNegativeTokens. These items do not appear in any subsequent stage or in the output. - Partition by pinned status. Items with
pinned = trueare placed in the pinned list; all others are placed in the scoreable list. - Validate pinned budget. The total tokens of all pinned items must fit within
maxTokens - outputReserve. If they do not, the pipeline raises an error.
Pinned items flow directly to the Place stage, skipping Score, Deduplicate, Sort, and Slice entirely. Scoreable items continue to the Score stage.
Algorithm
CLASSIFY(items, budget):
pinned <- empty list
scoreable <- empty list
for i <- 0 to length(items) - 1:
item <- items[i]
if item.tokens < 0:
// Exclude: NegativeTokens
continue
if item.pinned:
APPEND(pinned, item)
else:
APPEND(scoreable, item)
// Validate pinned budget
pinnedTokens <- 0
for i <- 0 to length(pinned) - 1:
pinnedTokens <- pinnedTokens + pinned[i].tokens
availableForPinned <- budget.maxTokens - budget.outputReserve
if pinnedTokens > availableForPinned:
ERROR("Pinned items require {pinnedTokens} tokens, but only
{availableForPinned} are available")
return (pinned, scoreable)
Edge Cases
- Empty input. If
itemsis empty, bothpinnedandscoreableare empty lists. This is not an error. - All items pinned. If every item is pinned,
scoreableis empty. The Score, Deduplicate, Sort, and Slice stages operate on an empty list (producing empty output). The Place stage receives only pinned items. - All items excluded. If every item has
tokens < 0, both lists are empty. - Zero-token items. Items with
tokens = 0are valid and are classified normally (into pinned or scoreable). They are not excluded. - Pinned item with tokens < 0. The negative-token check runs before the pinned check. A pinned item with
tokens < 0is excluded, not pinned.
Conformance Notes
- The exclusion check (
tokens < 0) MUST be evaluated before the pinned partition. A pinned item with negative tokens is excluded. - The pinned budget validation MUST compare against
maxTokens - outputReserve, nottargetTokens. - Input order within each partition MUST be preserved. If items A, B, C are scoreable (in that input order), they appear as A, B, C in the scoreable list.
Stage 2: Score
The Score stage computes a relevance score for each scoreable item by invoking the configured scorer, producing a list of ScoredItem pairs.
Overview
Scoring is the primary ranking mechanism in the pipeline. For each scoreable item, the scorer receives both the individual item and the full list of all scoreable items. This dual-argument design enables both absolute scorers (that examine only the item) and relative scorers (that compare the item against its peers).
The output is a list of ScoredItem pairs, each containing the original ContextItem and its computed score. The order of the output list matches the order of the input list (input position is preserved).
Algorithm
SCORE(scoreable, scorer):
scored <- new array of length(scoreable)
for i <- 0 to length(scoreable) - 1:
score <- scorer.Score(scoreable[i], scoreable)
scored[i] <- ScoredItem(scoreable[i], score)
return scored
Scorer Interface
A scorer implements a single function:
Score(item: ContextItem, allItems: list of ContextItem) -> float64
itemis the individual item being scored.allItemsis the complete list of scoreable items (includingitemitself).- The return value is an IEEE 754 64-bit double.
See the Scorers chapter for all scorer algorithm specifications.
Score Semantics
- Scores are conventionally in the range [0.0, 1.0], where 0.0 indicates lowest relevance and 1.0 indicates highest relevance.
- The [0.0, 1.0] range is a convention, not an enforced constraint. Individual scorers (e.g., KindScorer with custom weights) may produce values outside this range.
- All scoring computations MUST use IEEE 754 64-bit double-precision floating-point arithmetic (see Introduction).
Edge Cases
- Empty scoreable list. If
scoreableis empty, the output is an empty array. The scorer is never invoked. - Single item. The scorer receives a one-element
allItemslist. Relative scorers (e.g., RecencyScorer, PriorityScorer) handle this by returning a defined value (typically 1.0 for the sole item with a non-null field). - NaN or infinite scores. This specification does not mandate rejection of NaN or infinite scores at the Score stage. However, such values may produce implementation-defined behavior in subsequent stages (Sort, Slice). Well-behaved scorers produce finite values.
Conformance Notes
- The scorer MUST receive the full
scoreablelist as the second argument, not a subset or the original input list. - Output order MUST match input order:
scored[i]corresponds toscoreable[i]. - The scorer is invoked exactly once per item per pipeline execution.
Stage 3: Deduplicate
The Deduplicate stage removes items with duplicate content, keeping only the highest-scoring instance of each unique content string. Deduplication is enabled by default but can be disabled.
Overview
Deduplication prevents the same content from appearing multiple times in the context window. This is important because retrieval-augmented generation (RAG) sources and tool outputs may produce identical content from different sources or at different times.
Identity semantics: Two items are considered duplicates if and only if their content fields are byte-exact equal. This specification mandates ordinal (byte-exact) comparison with no Unicode normalization, no case folding, and no whitespace normalization. Items that differ by a single byte — including trailing whitespace, BOM markers, or normalization form — are considered distinct.
Pitfall (P3): Deduplication Identity. Implementations MUST use byte-exact ordinal comparison for content equality. Using Unicode-normalized or case-insensitive comparison would produce different deduplication results and fail conformance tests.
Algorithm
DEDUPLICATE(scored, enabled):
if not enabled or length(scored) = 0:
return scored
// Map: content string -> index of best item in scored array
bestByContent <- empty map (ordinal string keys)
for i <- 0 to length(scored) - 1:
content <- scored[i].item.content
if content IN bestByContent:
existingIndex <- bestByContent[content]
if scored[i].score > scored[existingIndex].score:
bestByContent[content] <- i
// Tiebreak: when scores are equal, keep the lower index
// (the existing entry already has the lower index, so no action needed)
else:
bestByContent[content] <- i
// Collect survivors in original order
deduped <- empty list
for i <- 0 to length(scored) - 1:
if bestByContent[scored[i].item.content] = i:
APPEND(deduped, scored[i])
return deduped
Tiebreaking
When two duplicate items have the same score, the item with the lower original index (earlier in the input) is kept. This is achieved naturally by the algorithm: the first occurrence is stored in the map, and subsequent items with equal scores do not replace it.
Edge Cases
- Deduplication disabled. When disabled, the input is returned unchanged. No content comparison occurs.
- Empty input. Returns an empty list.
- No duplicates. All items are unique; the output is identical to the input.
- All items identical. All items have the same content. Only one survives: the one with the highest score (or lowest index if scores are equal).
- Duplicate content, different metadata. Only
contentis compared. Items with the same content but differentkind,source,tags, or other fields are considered duplicates. The surviving item retains all its original fields.
Conformance Notes
- Content comparison MUST be byte-exact ordinal. No Unicode normalization (NFC, NFD, NFKC, NFKD), no case folding, no whitespace normalization.
- The survivor for each content group is the item with the highest score. On score ties, the item with the lowest original index (position in the
scoredinput array) wins. - The output MUST preserve the relative order of surviving items from the input. If items at indices 0, 3, and 7 survive, they appear in that order in the output.
Stage 4: Sort
The Sort stage orders scored items by descending score using a stable sort with a prescribed tiebreaking rule.
Overview
After deduplication, items must be sorted so that the highest-scored items appear first. This ordering is consumed by the Slice stage, which greedily or optimally selects items from the front of the sorted list.
Sort stability is critical: when two items have the same score, their relative order from the input must be preserved. This ensures deterministic, reproducible output across all conforming implementations.
Pitfall (P1): Sort Stability. Implementations MUST use a stable sort algorithm or equivalently sort by the composite key
(score descending, originalIndex ascending). An unstable sort (e.g., quicksort without index tiebreaking) can produce different orderings for equal-scored items, causing conformance test failures.
Algorithm
SORT(deduped):
// Build composite sort keys
keys <- new array of (score: float64, index: int) of length(deduped)
for i <- 0 to length(deduped) - 1:
keys[i] <- (deduped[i].score, i)
// Sort by score descending, then by index ascending (tiebreak)
SORT-BY(keys, comparator):
comparator(a, b):
if a.score != b.score:
return COMPARE-DESCENDING(a.score, b.score)
else:
return COMPARE-ASCENDING(a.index, b.index)
// Reconstruct sorted array
sorted <- new array of length(deduped)
for i <- 0 to length(keys) - 1:
sorted[i] <- deduped[keys[i].index]
return sorted
Implementation Pattern
The algorithm builds an array of (score, index) pairs, sorts this auxiliary array using the composite comparator, then reconstructs the ScoredItem array by following the index references. This pattern works with any sort algorithm (stable or unstable) because the index component of the composite key provides a deterministic tiebreak.
Alternatively, implementations may use a natively stable sort and sort only by score descending, relying on the sort’s stability guarantee to preserve input order for equal scores. Both approaches produce identical results.
Edge Cases
- Empty input. Returns an empty array.
- Single item. Returns a single-element array (no sorting needed).
- All equal scores. Output preserves input order entirely (the index tiebreak produces ascending index order).
- NaN scores. Behavior with NaN scores is implementation-defined. IEEE 754 comparisons involving NaN return false for all relational operators, which may cause items with NaN scores to appear in an unpredictable position. Conforming implementations are not required to handle NaN scores in any particular way. Well-behaved scorers do not produce NaN.
Conformance Notes
- The sort MUST be stable, or equivalently MUST use the composite key
(score descending, originalIndex ascending). - “Original index” refers to the item’s position in the input to the Sort stage (i.e., the deduplication output), not the item’s position in the original candidate list.
- Score comparison uses standard IEEE 754 floating-point ordering (total order for non-NaN values).
Stage 5: Slice
The Slice stage selects items from the sorted list that fit within the effective token budget. It delegates to a configurable slicer strategy.
Overview
Slicing is where the token budget is enforced. The pipeline computes an effective budget that accounts for tokens already committed (pinned items and output reserve), then delegates to the configured slicer strategy to select which items to include.
The slicer receives items already sorted by score descending (from the Sort stage), so it can make informed decisions about which items to drop when the budget is tight.
Effective Budget Computation
Before invoking the slicer, the pipeline computes the adjusted budget:
COMPUTE-EFFECTIVE-BUDGET(budget, pinnedTokens):
reservedTokens <- sum(budget.reservedSlots.values)
effectiveMax <- max(0, budget.maxTokens - budget.outputReserve - pinnedTokens - reservedTokens)
effectiveTarget <- max(0, budget.targetTokens - pinnedTokens - reservedTokens)
effectiveTarget <- min(effectiveTarget, effectiveMax)
if budget.estimationSafetyMarginPercent > 0:
multiplier <- 1.0 - budget.estimationSafetyMarginPercent / 100.0
effectiveMax <- floor(effectiveMax * multiplier)
effectiveTarget <- floor(effectiveTarget * multiplier)
effectiveTarget <- min(effectiveTarget, effectiveMax)
return ContextBudget(
maxTokens: effectiveMax,
targetTokens: effectiveTarget
)
Where:
pinnedTokensis the sum oftokensfor all pinned items (computed during Classify).reservedTokensis the sum of all values inbudget.reservedSlots, reserving token capacity for per-kind guarantees.effectiveMaxis the hard ceiling for scoreable items after reserving space for output, pinned items, and reserved slots.effectiveTargetis the soft goal for the slicer, clamped toeffectiveMax.- The safety margin reduces the effective budget by the configured percentage to account for token estimation error. Both values use
floor(integer truncation) when converting from the floating-point product.
Algorithm
SLICE(sorted, budget, pinnedTokens, slicer):
adjustedBudget <- COMPUTE-EFFECTIVE-BUDGET(budget, pinnedTokens)
// Delegate to slicer strategy
slicedItems <- slicer.Slice(sorted, adjustedBudget)
return slicedItems
Slicer Interface
A slicer implements a single function:
Slice(sorted: list of ScoredItem, budget: ContextBudget) -> list of ContextItem
sortedis the score-descending sorted list of ScoredItem pairs.budgetis the effective budget (already adjusted for pinned items and output reserve).- The return value is the list of selected ContextItem instances (without scores).
See the Slicers chapter for all slicer algorithm specifications.
Edge Cases
- Empty sorted list. The slicer receives an empty list and returns an empty list.
- Zero effective budget. If
effectiveMax = 0oreffectiveTarget = 0, the slicer should select no items (or only zero-token items, depending on the slicer strategy). - All items fit. If the total tokens of all sorted items is within
effectiveTarget, the slicer may select all items. - Pinned items consume entire budget. If
pinnedTokens >= targetTokens, theneffectiveTarget = 0. The slicer selects no items (or only zero-token items), and the final output consists of pinned items only. - Reserved slots exceed remaining budget. If the sum of
reservedSlotsvalues exceedsmaxTokens - outputReserve - pinnedTokens, theneffectiveMax = 0and the slicer selects no items (or only zero-token items).
Conformance Notes
- The effective budget computation MUST match the formulas above exactly. In particular,
effectiveTargetis clamped toeffectiveMax(not the other way around). - The slicer receives a fresh ContextBudget with only
maxTokensandtargetTokensset. The effects ofoutputReserve,reservedSlots, andestimationSafetyMarginPercentare incorporated into themaxTokensandtargetTokensvalues of the adjusted budget; the original fields are not forwarded. - The slicer output is a list of ContextItem (not ScoredItem). Scores are re-associated by the pipeline after slicing, by matching items back to their scored counterparts using reference identity.
Stage 6: Place
The Place stage merges pinned items with sliced items, detects and handles token budget overflow, and delegates to a configurable placer strategy for final ordering.
Overview
Placing is the final pipeline stage. It has three responsibilities:
- Merge pinned items (from Classify) with sliced items (from Slice) into a single scored item list. Pinned items are assigned a score of 1.0.
- Detect and handle overflow — if the merged total exceeds
targetTokens, apply the configured OverflowStrategy. - Order the merged items by delegating to the configured placer strategy.
Algorithm
PLACE(pinned, slicedScored, budget, overflowStrategy, placer):
// Step 1: Merge pinned items with sliced scored items
merged <- new array of length(pinned) + length(slicedScored)
for i <- 0 to length(pinned) - 1:
merged[i] <- ScoredItem(pinned[i], 1.0)
for i <- 0 to length(slicedScored) - 1:
merged[length(pinned) + i] <- slicedScored[i]
// Step 2: Overflow detection
mergedTokens <- 0
for i <- 0 to length(merged) - 1:
mergedTokens <- mergedTokens + merged[i].item.tokens
if mergedTokens > budget.targetTokens:
merged <- HANDLE-OVERFLOW(merged, budget.targetTokens, overflowStrategy)
// Step 3: Delegate to placer for final ordering
placed <- placer.Place(merged)
return placed
Overflow Handling
HANDLE-OVERFLOW(merged, targetTokens, strategy):
if strategy = Throw:
ERROR("Selected items require {mergedTokens} tokens,
exceeding target budget of {targetTokens}")
if strategy = Truncate:
// Keep items from front (highest-scored), skip from back
// Pinned items are never removed
kept <- empty list
currentTokens <- 0
for i <- 0 to length(merged) - 1:
if merged[i].item.pinned:
APPEND(kept, merged[i])
currentTokens <- currentTokens + merged[i].item.tokens
else if currentTokens + merged[i].item.tokens <= targetTokens:
APPEND(kept, merged[i])
currentTokens <- currentTokens + merged[i].item.tokens
// else: excluded (BudgetExceeded or PinnedOverride)
return kept
if strategy = Proceed:
// Accept over-budget selection, notify observer
NOTIFY-OVERFLOW(mergedTokens - targetTokens, merged)
return merged
Placer Interface
A placer implements a single function:
Place(merged: list of ScoredItem) -> list of ContextItem
mergedis the list of ScoredItem pairs (pinned items with score 1.0, sliced items with their computed scores).- The return value is the final ordered list of ContextItem instances.
See the Placers chapter for all placer algorithm specifications.
Pinned Item Score
Pinned items are assigned a score of 1.0 when merged into the scored item list. This score value represents the highest conventional relevance — pinned items are always-included by definition. The 1.0 score affects placer behavior (e.g., UShapedPlacer places highest-scored items at the edges).
Edge Cases
- No pinned items. The merged list contains only sliced items. Overflow detection and placement proceed normally.
- No sliced items. The merged list contains only pinned items (all with score 1.0). This occurs when the scoreable list was empty or the entire effective budget was zero.
- Truncation cannot reach target. If pinned items alone exceed
targetTokens, truncation removes all non-pinned items but the total still exceeds the target. This is a best-effort situation — pinned items are never removed. - Empty merged list. Both pinned and sliced lists are empty. The placer receives an empty list and returns an empty list.
Conformance Notes
- Pinned items MUST appear before sliced items in the
mergedarray (indices[0, length(pinned))for pinned,[length(pinned), end)for sliced). - Overflow detection compares against
budget.targetTokens(the original target, not the effective target used for slicing). - The
Truncatestrategy iterates the merged array in order (pinned first, then sliced items in score-descending order). Pinned items are always kept; non-pinned items are kept while they fit withintargetTokens. - The placer receives the post-overflow merged list (which may be shorter than the pre-overflow list if
Truncatewas applied).
Scorers
Scorers compute relevance scores for context items. They are the primary ranking mechanism in the Cupel pipeline, invoked during Stage 2: Score.
Scorer Interface
A scorer implements a single pure function:
Score(item: ContextItem, allItems: list of ContextItem) -> float64
itemis the individual ContextItem being scored.allItemsis the complete list of scoreable items (includingitemitself). This enables relative scoring — scorers that rank an item against its peers.- The return value is an IEEE 754 64-bit double-precision floating-point number.
Contract
- Pure function. Scorers MUST NOT mutate state between calls, perform I/O, or produce side effects. Given identical inputs, a scorer MUST return the same output.
- Conventional range. Scores are conventionally in the range [0.0, 1.0], where 0.0 indicates lowest relevance and 1.0 indicates highest relevance. This range is a convention, not an enforced constraint — custom KindScorer weights may produce values outside this range.
- IEEE 754 arithmetic. All scoring computations MUST use 64-bit double-precision floating-point arithmetic (see Numeric Precision).
Scorer Summary
| Scorer | Input Fields Used | Output Range | Description |
|---|---|---|---|
| RecencyScorer | timestamp | [0.0, 1.0] | Rank-based: more recent items score higher |
| PriorityScorer | priority | [0.0, 1.0] | Rank-based: higher priority values score higher |
| KindScorer | kind | [0.0, max weight] | Dictionary lookup by item kind |
| TagScorer | tags | [0.0, 1.0] | Weighted tag matching, normalized |
| FrequencyScorer | tags | [0.0, 1.0] | Proportion of peers sharing a tag |
| ReflexiveScorer | futureRelevanceHint | [0.0, 1.0] | Passthrough of caller-provided hint |
| CompositeScorer | (delegates to children) | [0.0, 1.0] | Weighted average of child scorers |
| ScaledScorer | (delegates to inner) | [0.0, 1.0] | Min-max normalization of an inner scorer |
| MetadataTrustScorer | metadata["cupel:trust"] | [0.0, 1.0] | Passthrough of caller-provided trust value from metadata |
| DecayScorer | timestamp | [0.0, 1.0] | Time-based decay with configurable curve (Exponential, Step, Window) |
| MetadataKeyScorer | metadata[key] | {1.0, boost} | Multiplicative boost for items where metadata key matches a configured value |
Scorer Categories
Absolute Scorers
Absolute scorers examine only the item being scored; the allItems parameter is ignored. These scorers produce the same score for an item regardless of what other items are present.
- KindScorer — dictionary lookup
- TagScorer — weighted tag matching
- ReflexiveScorer — passthrough of
futureRelevanceHint - MetadataTrustScorer — passthrough of caller-provided trust value from
metadata["cupel:trust"] - DecayScorer — time-based decay from item timestamp against a reference time (Exponential, Step, or Window curve)
- MetadataKeyScorer — multiplicative boost for items where a metadata key matches a configured value
Relative Scorers
Relative scorers compare the item against its peers. The same item may receive different scores depending on the composition of allItems.
- RecencyScorer — rank among timestamped items
- PriorityScorer — rank among prioritized items
- FrequencyScorer — proportion of peers sharing a tag
Composite Scorers
Composite scorers delegate to other scorers and transform or combine their outputs.
- CompositeScorer — weighted average of multiple child scorers
- ScaledScorer — min-max normalization wrapper for any inner scorer
RecencyScorer
The RecencyScorer assigns higher scores to more recent items based on their relative timestamp rank among all scoreable items.
Overview
RecencyScorer is a relative scorer — it compares each item’s timestamp against all other items’ timestamps. The score represents the item’s position in the temporal ordering: the most recent item scores 1.0, the oldest scores 0.0, and items in between are linearly interpolated.
Items without a timestamp receive a score of 0.0.
Fields Used
| Field | Source | Purpose |
|---|---|---|
timestamp | ContextItem | Temporal ordering key |
Algorithm
For each item, count how many other timestamped items have a strictly earlier timestamp. This count (the item’s rank) is divided by the number of timestamped items minus one to produce a normalized score.
RECENCY-SCORE(item, allItems):
// Items without a timestamp always score 0.0
if item.timestamp = null:
return 0.0
countWithTimestamp <- 0
rank <- 0
for i <- 0 to length(allItems) - 1:
if allItems[i].timestamp = null:
continue
countWithTimestamp <- countWithTimestamp + 1
if allItems[i].timestamp < item.timestamp:
rank <- rank + 1
// Single timestamped item is the "most recent" by definition
if countWithTimestamp <= 1:
return 1.0
return rank / (countWithTimestamp - 1)
Score Distribution
Given N items with timestamps, all with distinct timestamps:
- The most recent item has rank N - 1, scoring 1.0.
- The oldest item has rank 0, scoring 0.0.
- Intermediate items score
rank / (N - 1), producing evenly-spaced values.
Tied Timestamps
When multiple items share the same timestamp, they receive the same rank (the count of items strictly older than them). This means tied items receive identical scores.
Edge Cases
| Condition | Result |
|---|---|
item.timestamp is null | 0.0 |
| All items have null timestamps | All score 0.0 |
| Only one item has a timestamp | That item scores 1.0 |
| All items share the same timestamp | All score 0.0 (rank = 0 for each, since no item is strictly older) |
Note: The single-item case returns 1.0 (not 0.0) because a lone timestamped item is trivially the most recent.
Complexity
- Time: O(N) per item, O(N^2) total across all items in a scoring pass.
- Space: O(1) auxiliary per invocation.
Conformance Notes
- Timestamp comparison is by temporal ordering (the underlying instant), not by string representation.
- The comparison
allItems[i].timestamp < item.timestampis strict — items with equal timestamps do not increment the rank. - The denominator is
countWithTimestamp - 1, notlength(allItems) - 1. Only items that have a timestamp participate in the ranking.
PriorityScorer
The PriorityScorer assigns higher scores to items with higher numeric priority values, based on their relative rank among all scoreable items.
Overview
PriorityScorer is a relative scorer with the same rank-based algorithm as RecencyScorer, but applied to the priority field instead of timestamp. Higher numeric priority values produce higher scores.
Items without a priority receive a score of 0.0.
Fields Used
| Field | Source | Purpose |
|---|---|---|
priority | ContextItem | Numeric importance ranking key |
Algorithm
For each item, count how many other prioritized items have a strictly lower priority value. This count (the item’s rank) is divided by the number of prioritized items minus one to produce a normalized score.
PRIORITY-SCORE(item, allItems):
// Items without a priority always score 0.0
if item.priority = null:
return 0.0
countWithPriority <- 0
rank <- 0
for i <- 0 to length(allItems) - 1:
if allItems[i].priority = null:
continue
countWithPriority <- countWithPriority + 1
if allItems[i].priority < item.priority:
rank <- rank + 1
// Single prioritized item scores 1.0
if countWithPriority <= 1:
return 1.0
return rank / (countWithPriority - 1)
Score Distribution
Given N items with priorities, all with distinct priority values:
- The highest-priority item has rank N - 1, scoring 1.0.
- The lowest-priority item has rank 0, scoring 0.0.
- Intermediate items score
rank / (N - 1), producing evenly-spaced values.
Tied Priorities
When multiple items share the same priority value, they receive the same rank (the count of items with strictly lower priority). Tied items receive identical scores.
Edge Cases
| Condition | Result |
|---|---|
item.priority is null | 0.0 |
| All items have null priorities | All score 0.0 |
| Only one item has a priority | That item scores 1.0 |
| All items share the same priority | All score 0.0 (no item has strictly lower priority) |
Complexity
- Time: O(N) per item, O(N^2) total across all items in a scoring pass.
- Space: O(1) auxiliary per invocation.
Conformance Notes
- Higher numeric priority values produce higher scores.
priority = 10scores higher thanpriority = 5. - The comparison
allItems[i].priority < item.priorityis strict — items with equal priority do not increment the rank. - The denominator is
countWithPriority - 1, notlength(allItems) - 1. Only items that have a priority participate in the ranking. - The rank-based algorithm is identical to RecencyScorer, substituting
priorityfortimestamp.
KindScorer
The KindScorer assigns a score to each item based on its ContextKind, using a configurable weight map.
Overview
KindScorer is an absolute scorer — it examines only the item’s kind field. The allItems parameter is ignored. The score is a direct lookup in a weight dictionary: if the item’s kind has a configured weight, that weight is returned; otherwise, the score is 0.0.
Fields Used
| Field | Source | Purpose |
|---|---|---|
kind | ContextItem | Dictionary lookup key |
Default Weights
When no custom weights are provided, the following defaults are used:
| ContextKind | Weight |
|---|---|
"SystemPrompt" | 1.0 |
"Memory" | 0.8 |
"ToolOutput" | 0.6 |
"Document" | 0.4 |
"Message" | 0.2 |
These defaults reflect a common heuristic: system prompts and memories are typically more important to preserve than messages.
Algorithm
KIND-SCORE(item, allItems, weights):
// weights is a map of ContextKind -> float64
if weights contains item.kind: // ContextKind equality is case-insensitive (see enumerations.md)
return weights[item.kind]
else:
return 0.0
Case-insensitivity is a property of ContextKind equality as defined in Enumerations, not an ad-hoc behavior of the weight dictionary.
Construction Validation
Custom weight maps MUST be validated at construction time:
- Each weight value MUST be non-negative (
>= 0.0). - Each weight value MUST be finite (not NaN, not positive/negative infinity).
- Weight values are not required to be in [0.0, 1.0] — custom weights may exceed 1.0.
VALIDATE-KIND-WEIGHTS(weights):
for each (kind, weight) in weights:
if weight < 0.0:
ERROR("Weight for kind must be non-negative")
if weight is not finite:
ERROR("Weight for kind must be finite")
Edge Cases
| Condition | Result |
|---|---|
| Item’s kind is in the weight map | The corresponding weight value |
| Item’s kind is not in the weight map | 0.0 |
| Custom weight map is empty | All items score 0.0 |
| Custom weight exceeds 1.0 | The weight value is returned as-is (no clamping) |
Complexity
- Time: O(1) per item (hash map lookup).
- Space: O(K) where K is the number of configured kinds.
Conformance Notes
- ContextKind comparison MUST be case-insensitive using ASCII case folding (see ContextKind). A weight configured for
"Message"matches items with kind"message","MESSAGE", etc. - The default weight map MUST contain exactly the five well-known ContextKind values listed above, with the specified weights. Implementations MAY use a different internal representation (e.g., frozen dictionary, immutable map) but MUST produce the same lookup behavior.
- Custom weights are not clamped to [0.0, 1.0]. An implementation using KindScorer with weights > 1.0 should be aware that composite scoring may produce scores outside the conventional range.
TagScorer
The TagScorer assigns a score based on how well an item’s tags match a configured set of weighted tags.
Overview
TagScorer is an absolute scorer — it examines only the item’s tags field against a preconfigured tag-weight map. The allItems parameter is ignored. The score is the sum of matched tag weights divided by the total configured weight, clamped to [0.0, 1.0].
Fields Used
| Field | Source | Purpose |
|---|---|---|
tags | ContextItem | Tags to match against configured weights |
Algorithm
TAG-SCORE(item, allItems, tagWeights, totalWeight):
// tagWeights: map of string -> float64
// totalWeight: sum of all values in tagWeights (precomputed at construction)
if totalWeight = 0.0 or length(item.tags) = 0:
return 0.0
matchedSum <- 0.0
for i <- 0 to length(item.tags) - 1:
if tagWeights contains item.tags[i]: // case-sensitive lookup (default)
matchedSum <- matchedSum + tagWeights[item.tags[i]]
return min(matchedSum / totalWeight, 1.0)
Construction
TagScorer is constructed with a tag-weight map. At construction time:
- The
totalWeightis precomputed as the sum of all weight values. - Each weight MUST be non-negative (
>= 0.0). - Each weight MUST be finite (not NaN, not positive/negative infinity).
CONSTRUCT-TAG-SCORER(tagWeights):
totalWeight <- 0.0
for each (tag, weight) in tagWeights:
if weight < 0.0:
ERROR("Tag weight must be non-negative")
if weight is not finite:
ERROR("Tag weight must be finite")
totalWeight <- totalWeight + weight
// Store tagWeights and totalWeight for use in scoring
Score Interpretation
- A score of 1.0 means the item’s matched tags account for 100% or more of the total configured weight.
- A score of 0.0 means no tags matched (or the item has no tags, or no weights are configured).
- The
min(..., 1.0)clamp prevents scores from exceeding 1.0 when an item matches all configured tags or when duplicate tag entries would cause the sum to exceed the total.
Edge Cases
| Condition | Result |
|---|---|
| Item has no tags | 0.0 |
totalWeight is 0.0 (all weights are zero) | 0.0 |
| Item has tags but none match configured weights | 0.0 |
| Item matches all configured tags | min(sum of matched / total, 1.0) = 1.0 if exactly all matched |
| Item has duplicate tags matching the same weight | Each occurrence adds to matchedSum; result clamped to 1.0 |
Complexity
- Time: O(T) per item, where T is the number of tags on the item (hash map lookup per tag).
- Space: O(W) where W is the number of configured tag weights.
Conformance Notes
- Tag key lookup in the weight map uses the string comparer provided at construction, defaulting to ordinal (case-sensitive) comparison. This is distinct from the case-insensitive tag comparison used by FrequencyScorer.
- The
totalWeightdenominator is the sum of all configured weights, not just the weights of matched tags. - The result is clamped to
min(result, 1.0). It is never clamped to a minimum of 0.0 becausematchedSumcan only be non-negative (all weights are non-negative).
FrequencyScorer
The FrequencyScorer scores an item based on the proportion of peer items that share at least one tag with it.
Overview
FrequencyScorer is a relative scorer — it compares the item’s tags against every other item’s tags to determine how “connected” the item is within the candidate set. Items that share tags with many peers score higher, reflecting thematic relevance within the context.
Fields Used
| Field | Source | Purpose |
|---|---|---|
tags | ContextItem | Tag overlap detection with peers |
Algorithm
FREQUENCY-SCORE(item, allItems):
if length(item.tags) = 0 or length(allItems) <= 1:
return 0.0
matchingItems <- 0
for i <- 0 to length(allItems) - 1:
// Skip self (by identity, not by value)
if allItems[i] is item:
continue
if length(allItems[i].tags) = 0:
continue
if SHARES-ANY-TAG(item.tags, allItems[i].tags):
matchingItems <- matchingItems + 1
return matchingItems / (length(allItems) - 1)
Tag Overlap Detection
SHARES-ANY-TAG(tagsA, tagsB):
for i <- 0 to length(tagsA) - 1:
for j <- 0 to length(tagsB) - 1:
if CASE-INSENSITIVE-EQUAL(tagsA[i], tagsB[j]):
return true
return false
Score Interpretation
- A score of 1.0 means every other item in the list shares at least one tag with this item.
- A score of 0.0 means no other item shares any tag with this item (or the item has no tags, or it is the only item).
- Intermediate values represent the fraction of peers with overlapping tags.
Edge Cases
| Condition | Result |
|---|---|
| Item has no tags | 0.0 |
Only one item in allItems | 0.0 |
| No peer shares any tag | 0.0 |
| All peers share at least one tag | 1.0 |
| Peer has no tags | That peer does not count as matching |
Self-Exclusion
The item being scored is excluded from the peer count using reference identity (object identity), not value equality. This means:
- If the same ContextItem instance appears multiple times in
allItems, only the reference-identical instance is skipped; other copies with identical content are counted as peers. - The denominator is always
length(allItems) - 1, regardless of how many items are skipped.
Complexity
- Time: O(N * T_a * T_b) per item in the worst case, where N is the number of items and T_a, T_b are tag list lengths. O(N^2 * T^2) total across all items.
- Space: O(1) auxiliary per invocation.
Conformance Notes
- Tag comparison in
SHARES-ANY-TAGMUST be case-insensitive using ASCII case folding."Important"and"important"are considered matching tags. - Self-exclusion MUST use reference identity (the
ischeck), not structural equality. This matches the ContextItem immutability contract — items are compared by identity throughout the pipeline. - The denominator is
length(allItems) - 1, which accounts for the self-exclusion. It is not reduced further for tagless peers.
ReflexiveScorer
The ReflexiveScorer passes through the item’s futureRelevanceHint as the score, clamped to [0.0, 1.0].
Overview
ReflexiveScorer is an absolute scorer — it examines only the item’s futureRelevanceHint field. The allItems parameter is ignored. This scorer enables callers (or upstream systems like an LLM) to inject relevance signals directly into the scoring pipeline.
Fields Used
| Field | Source | Purpose |
|---|---|---|
futureRelevanceHint | ContextItem | Caller-provided relevance signal |
Algorithm
REFLEXIVE-SCORE(item, allItems):
if item.futureRelevanceHint = null:
return 0.0
value <- item.futureRelevanceHint
if value is not finite: // NaN, +infinity, -infinity
return 0.0
return clamp(value, 0.0, 1.0)
Where clamp(x, lo, hi) returns lo if x < lo, hi if x > hi, and x otherwise.
Score Interpretation
- The score is a direct passthrough of the caller-provided hint, clamped to the conventional [0.0, 1.0] range.
- A hint of 0.0 means the caller considers the item irrelevant for future context.
- A hint of 1.0 means the caller considers the item highly relevant for future context.
- Values outside [0.0, 1.0] are clamped, not rejected.
Edge Cases
| Condition | Result |
|---|---|
futureRelevanceHint is null | 0.0 |
futureRelevanceHint is NaN | 0.0 |
futureRelevanceHint is +infinity | 0.0 |
futureRelevanceHint is -infinity | 0.0 |
futureRelevanceHint is 0.5 | 0.5 |
futureRelevanceHint is -0.3 | 0.0 (clamped) |
futureRelevanceHint is 1.7 | 1.0 (clamped) |
Complexity
- Time: O(1) per item.
- Space: O(1) auxiliary.
Conformance Notes
- Non-finite values (NaN, positive infinity, negative infinity) MUST return 0.0, not be clamped to a range boundary.
- The check for finiteness MUST occur before the clamp. A NaN value must not pass through
clamp(which may have implementation-defined behavior for NaN in some languages). - The clamping order is: null check, then finiteness check, then clamp to [0.0, 1.0].
CompositeScorer
The CompositeScorer combines multiple child scorers into a single score using a weighted average.
Overview
CompositeScorer is a composite scorer — it delegates to a list of child scorers, each with an associated weight, and returns the weighted average of their outputs. This enables building rich scoring strategies from simple building blocks.
CompositeScorer is the primary mechanism for combining multiple scoring signals (e.g., “60% recency + 30% priority + 10% kind”) into a single relevance score.
Algorithm
Scoring
COMPOSITE-SCORE(item, allItems, scorers, normalizedWeights):
// scorers: array of Scorer
// normalizedWeights: array of float64, sum = 1.0
result <- 0.0
for i <- 0 to length(scorers) - 1:
result <- result + scorers[i].Score(item, allItems) * normalizedWeights[i]
return result
Construction
At construction time, the raw weights are normalized so they sum to 1.0:
CONSTRUCT-COMPOSITE(entries):
// entries: list of (scorer, weight) pairs
if length(entries) = 0:
ERROR("At least one scorer entry is required")
totalWeight <- 0.0
for i <- 0 to length(entries) - 1:
(scorer, weight) <- entries[i]
if scorer = null:
ERROR("Scorer must not be null")
if weight <= 0.0:
ERROR("Weight must be positive")
if weight is not finite:
ERROR("Weight must be finite")
totalWeight <- totalWeight + weight
normalizedWeights <- new array of length(entries)
for i <- 0 to length(entries) - 1:
normalizedWeights[i] <- entries[i].weight / totalWeight
DETECT-CYCLES(entries)
self.scorers <- [entries[i].scorer for i in 0..length(entries)]
self.normalizedWeights <- normalizedWeights
Weight Normalization
Weights are relative, not absolute. The following configurations produce identical results:
[(RecencyScorer, 3.0), (PriorityScorer, 1.0)][(RecencyScorer, 0.75), (PriorityScorer, 0.25)]
Both normalize to weights [0.75, 0.25].
Cycle Detection
CompositeScorer supports nesting — a CompositeScorer may contain other CompositeScorer or ScaledScorer instances as children. However, circular references are invalid and MUST be detected at construction time.
DETECT-CYCLES(entries):
visited <- empty set (by reference identity)
inPath <- empty set (by reference identity)
for i <- 0 to length(entries) - 1:
DETECT-CYCLES-DFS(entries[i].scorer, visited, inPath)
DETECT-CYCLES-DFS(node, visited, inPath):
if node is in visited:
if node is in inPath:
ERROR("Cycle detected: scorer appears in its own dependency graph")
return
add node to visited
add node to inPath
if node is a CompositeScorer:
for each child in node.children:
DETECT-CYCLES-DFS(child, visited, inPath)
else if node is a ScaledScorer:
DETECT-CYCLES-DFS(node.inner, visited, inPath)
remove node from inPath
The cycle detection traverses the scorer graph using depth-first search with reference identity comparison. A cycle exists when a scorer appears in its own ancestor path.
Edge Cases
| Condition | Result |
|---|---|
| Empty entries list | Construction error |
| Single entry | Normalized weight is 1.0; effectively delegates to the child |
| All children return 0.0 | Composite returns 0.0 |
| All children return 1.0 | Composite returns 1.0 (weights sum to 1.0) |
| Nested CompositeScorer | Valid if acyclic; each child is scored recursively |
| Self-referential cycle | Construction error |
Complexity
- Construction: O(V + E) for cycle detection, where V is the number of distinct scorer nodes and E is the number of parent-child edges.
- Scoring: O(C * cost_per_child) per item, where C is the number of child scorers. For a flat composite with simple children, this is O(C) per item.
- Space: O(C) for normalized weight storage, O(V) for cycle detection.
Conformance Notes
- Weights MUST be positive (strictly greater than 0.0) and finite. Zero weights are rejected at construction time.
- Weight normalization MUST divide each weight by the sum of all weights. The normalized weights MUST sum to approximately 1.0 (subject to floating-point precision).
- Cycle detection MUST use reference identity, not structural equality. Two distinct CompositeScorer instances with identical configurations are not considered cyclic.
- Child scorers are invoked in order (index 0, 1, 2, …). While the weighted sum is mathematically commutative, the invocation order is fixed for determinism.
ScaledScorer
The ScaledScorer applies min-max normalization to an inner scorer’s output across all items, ensuring the resulting scores span the full [0.0, 1.0] range.
Overview
ScaledScorer is a composite scorer — it wraps a single inner scorer and normalizes its output using the formula (raw - min) / (max - min), where min and max are the inner scorer’s minimum and maximum outputs across all items in allItems.
This is useful when an inner scorer produces scores clustered in a narrow range (e.g., [0.3, 0.5]). ScaledScorer spreads them to [0.0, 1.0], maximizing the discriminative power of the scorer.
Fields Used
ScaledScorer uses whatever fields its inner scorer uses.
Algorithm
SCALED-SCORE(item, allItems, inner):
if length(allItems) = 0:
return 0.5
rawScore <- NaN // sentinel: not yet computed
min <- +infinity
max <- -infinity
// Single pass: score all items, find min/max, capture this item's raw score
for i <- 0 to length(allItems) - 1:
s <- inner.Score(allItems[i], allItems)
if s < min:
min <- s
if s > max:
max <- s
if allItems[i] is item: // reference identity
rawScore <- s
// Degenerate case: all scores equal
if max = min:
return 0.5
return (rawScore - min) / (max - min)
Score Distribution
After scaling:
- The item(s) with the inner scorer’s minimum score receive 0.0.
- The item(s) with the inner scorer’s maximum score receive 1.0.
- All other items are linearly interpolated between 0.0 and 1.0.
Degenerate Case
When all items receive the same score from the inner scorer (max = min), the normalization formula would divide by zero. In this case, ScaledScorer returns 0.5 — the midpoint of the [0.0, 1.0] range. This represents “no information” — all items are equally relevant according to the inner scorer.
Similarly, when allItems is empty, the result is 0.5.
Edge Cases
| Condition | Result |
|---|---|
Empty allItems | 0.5 |
Single item in allItems | 0.5 (max = min) |
| All items receive the same inner score | 0.5 (max = min) |
| Two items with distinct inner scores | One gets 0.0, the other gets 1.0 |
| Inner scorer returns values outside [0.0, 1.0] | Scaling still produces [0.0, 1.0] output |
Complexity
- Time: O(N) per item (scores all N items to find min/max). O(N^2) total across all items in a scoring pass, since each item triggers a full scan.
- Space: O(1) auxiliary per invocation.
Performance note: Each call to SCALED-SCORE invokes the inner scorer N times. Across a full scoring pass of N items, the inner scorer is invoked N^2 times total. Implementations MAY cache the min/max computation across invocations within a single scoring pass, but caching is not required by this specification. When ScaledScorer instances are deeply nested — a ScaledScorer wrapping another ScaledScorer — the cost compounds to O(N^(depth+1)) per scoring pass. Callers with multiple scale-normalization requirements should prefer a flat CompositeScorer structure over nested ScaledScorer instances.
Conformance Notes
- The item is identified within
allItemsby reference identity (theischeck), not by structural equality. - The inner scorer is invoked for every item in
allItemson every call. There is no specification requirement to cache or memoize these results (but implementations are free to do so as an optimization, provided the output is identical). - The degenerate return value MUST be exactly 0.5, not 0.0 or 1.0.
- ScaledScorer participates in CompositeScorer cycle detection — a CompositeScorer containing a ScaledScorer will traverse through the ScaledScorer to its inner scorer.
MetadataTrustScorer
The MetadataTrustScorer passes through a caller-provided trust value stored in metadata["cupel:trust"] as the score, clamped to [0.0, 1.0].
Overview
MetadataTrustScorer is an absolute scorer — it examines only the item’s metadata map. The allItems parameter is ignored. This scorer enables callers to inject a trust signal into the scoring pipeline by setting a well-known metadata key at item construction time.
Fields Used
| Field | Source | Purpose |
|---|---|---|
metadata | ContextItem | Key-value map; the scorer reads the "cupel:trust" key |
Metadata Namespace Reservation
All ContextItem.metadata keys with the "cupel:" prefix are reserved for library-defined conventions. Callers MUST NOT use this prefix for application-specific keys. This reservation is enforced at the spec level only — there is no runtime validation of key prefixes.
Conventions
cupel:trust
cupel:trust is a caller-computed trust value for the item, represented as a float64 in [0.0, 1.0].
- Storage format: In Rust,
metadataisHashMap<String, String>— the trust value MUST be stored as a decimal string (e.g.,"0.85"). The scorer parses this string to a float64 at scoring time. - In .NET,
metadataisIReadOnlyDictionary<string, object?>, which supports heterogeneous value types. Callers MAY store the trust value as adoubledirectly. Implementations MUST handle bothstringanddouble(or boxed numeric) values in the dictionary lookup. The canonical wire format for serialization interop is the decimal string representation. - The value is caller-computed. The pipeline does not set or modify this key.
- The scorer reads
cupel:trustas a scoring input only. It does not use the value as a filter or gate — items are never excluded based on this score.
cupel:source-type
cupel:source-type is an open string convention for labeling the origin of a context item. The following values are RECOMMENDED:
| Value | Meaning |
|---|---|
"user" | An end-user message |
"tool" | A tool or function call result |
"external" | Retrieved or injected context (e.g., RAG results) |
"system" | A system prompt fragment |
Callers MAY use other values. Values are not validated at construction time. This convention is a labeling aid for caller logic and is not used by any scorer in the library.
Algorithm
METADATA-TRUST-SCORE(item, allItems, config):
if item.metadata does not contain "cupel:trust":
return config.defaultScore
raw <- item.metadata["cupel:trust"] // string in Rust; string or double in .NET
value <- parse_float64(raw) // or cast if already float64
if parse failed:
return config.defaultScore
if value is not finite: // NaN, +infinity, -infinity
return config.defaultScore
return clamp(value, 0.0, 1.0)
Where clamp(x, lo, hi) returns lo if x < lo, hi if x > hi, and x otherwise. config.defaultScore is a float64 in [0.0, 1.0] set at construction time.
Score Interpretation
- The score is a direct passthrough of the caller-provided trust value, clamped to the conventional [0.0, 1.0] range.
- A score of 0.0 reflects that the caller considers the item untrustworthy or low-trust.
- A score of 1.0 reflects that the caller considers the item fully trusted.
- Values outside [0.0, 1.0] are clamped, not rejected.
- When the key is absent or the value is unparseable,
config.defaultScoreis returned. This default is set at scorer construction time and MUST be in [0.0, 1.0].
Edge Cases
| Condition | Result |
|---|---|
cupel:trust key absent | config.defaultScore |
cupel:trust value is unparseable (e.g., "high", "") | config.defaultScore |
cupel:trust value is "NaN" | config.defaultScore |
cupel:trust value is "+Infinity" or "-Infinity" | config.defaultScore |
cupel:trust value is "0.0" | 0.0 |
cupel:trust value is "0.75" | 0.75 |
cupel:trust value is "1.0" | 1.0 |
cupel:trust value is "-0.1" (below range) | 0.0 (clamped) |
cupel:trust value is "1.5" (above range) | 1.0 (clamped) |
Conformance Vector Outlines
The following scenarios outline the expected behavior for conformance testing. Full TOML conformance vectors require a metadata field extension to the test vector format (out of scope for this chapter); these outlines are narrative descriptions of the required behavior.
-
Present and valid: An item with
cupel:trust = "0.85"— the scorer returns0.85. -
Key absent: An item whose metadata map does not contain the
cupel:trustkey — the scorer returnsconfig.defaultScore(e.g.,0.5if that is the configured default). -
Unparseable value: An item with
cupel:trust = "high"(not a valid decimal float) — the scorer returnsconfig.defaultScore, not an error. -
Out-of-range value (clamped high): An item with
cupel:trust = "1.5"— the scorer returns1.0(clamped to the upper bound). -
Non-finite value: An item with
cupel:trust = "NaN"orcupel:trust = "Infinity"— the scorer returnsconfig.defaultScore, not a clamped boundary value.
Complexity
- Time: O(1) per item (hash map lookup and float parse).
- Space: O(1) auxiliary.
Conformance Notes
- The clamping order is: key-missing →
config.defaultScore; parse-failure →config.defaultScore; non-finite →config.defaultScore; then clamp to [0.0, 1.0]. - Non-finite values (NaN, positive infinity, negative infinity) MUST return
config.defaultScore, not be clamped to a range boundary. - The finiteness check MUST occur before the clamp. Non-finite values MUST NOT pass through
clamp(which may have implementation-defined behavior for NaN in some languages). - In Rust, the metadata value at key
"cupel:trust"MUST be parsed as a decimal string using standard float64 parsing (e.g.,str::parse::<f64>()). A value that fails to parse MUST be treated as missing. - In .NET, implementations MUST handle both
stringanddouble(or other boxed numeric type) when reading the value fromIReadOnlyDictionary<string, object?>. If the stored value is already adouble, it MUST be used directly without string conversion. If it is astring, it MUST be parsed. If it is any other type, it MUST be treated as a parse failure and returnconfig.defaultScore. config.defaultScoreMUST be in [0.0, 1.0]. Implementations SHOULD reject construction of a MetadataTrustScorer with adefaultScoreoutside this range.
DecayScorer
The DecayScorer assigns a score based on how much time has elapsed since an item’s timestamp, measured against a caller-supplied reference time. Older items score lower; fresher items score higher.
Overview
DecayScorer is an absolute scorer — it computes a score for each item independently using only that item’s timestamp and the injected reference time. The allItems parameter is ignored.
Contrast with RecencyScorer: RecencyScorer is a relative scorer — it ranks items against each other (most recent scores 1.0, oldest scores 0.0, with linear interpolation). DecayScorer is an absolute-decay scorer — each item’s score is derived solely from the elapsed time between its timestamp and a caller-supplied reference time, with no dependence on the other items in the set.
Fields Used
| Field | Source | Purpose |
|---|---|---|
timestamp | ContextItem | Temporal reference point for computing item age |
TimeProvider
DecayScorer requires the caller to supply a time provider at construction. There is no default — the caller must inject one explicitly (D042).
This makes the time dependency visible, prevents stale time values in long-lived pipelines, and enables deterministic testing without special test modes.
.NET
Use System.TimeProvider (BCL since .NET 8; available in net10.0 without NuGet). Pass the system clock as TimeProvider.System. For testing, supply a FakeTimeProvider or any subclass of TimeProvider.
new DecayScorer(TimeProvider.System, curve: Curve.Exponential(halfLife: TimeSpan.FromHours(24)));
Rust
#![allow(unused)] fn main() { pub trait TimeProvider: Send + Sync { fn now(&self) -> DateTime<Utc>; } }
A zero-sized unit struct SystemTimeProvider implements TimeProvider via Utc::now(). The scorer stores the provider as Box<dyn TimeProvider + Send + Sync>, which is sufficient because DecayScorer is not Clone (D047).
#![allow(unused)] fn main() { new DecayScorer(Box::new(SystemTimeProvider), curve: Curve::exponential(half_life)); }
Algorithm
DECAY-SCORE(item, allItems, config):
// allItems is ignored — DecayScorer is an absolute scorer
if item.timestamp = null:
return config.nullTimestampScore
age <- max(duration_zero, config.timeProvider.now() - item.timestamp)
// Negative age (future-dated item: timestamp > now) clamps to duration_zero.
// The item is treated as maximally fresh — age zero feeds directly into the curve.
return APPLY-CURVE(age, config.curve)
Where duration_zero is the zero-duration constant (i.e., Duration::ZERO in Rust / TimeSpan.Zero in .NET).
Curve Factories
A curve is a function from a non-negative duration (the item’s age) to a score in [0.0, 1.0]. DecayScorer ships three built-in curve factories.
Exponential(halfLife)
Returns 2^(−age / halfLife). At age zero the score is 1.0; at age equal to halfLife the score is 0.5; as age approaches infinity the score approaches 0.0 (but is never negative).
Precondition: halfLife > Duration::ZERO / halfLife > TimeSpan.Zero. Throw ArgumentException (.NET) / return Err(...) (Rust) at construction if the precondition is violated; the error message MUST name the halfLife parameter.
EXPONENTIAL-CURVE(age, halfLife):
return pow(2.0, −duration_to_seconds(age) / duration_to_seconds(halfLife))
Where duration_to_seconds converts a duration to a non-negative float64 count of seconds.
Step(windows)
windows is an ordered list of (maxAge: Duration, score: double) pairs, sorted from youngest (smallest maxAge) to oldest (largest maxAge). The scorer walks the list and returns the score of the first entry where window.maxAge > age (strict greater-than). If age is greater than or equal to every boundary, the last entry’s score applies.
STEP-CURVE(age, windows):
for each window in windows (youngest to oldest):
if window.maxAge > age:
return window.score
// age exceeded all boundaries — fall through to last entry
return windows[last].score
Preconditions:
windowsMUST contain at least one entry. Throw at construction if the list is empty.- No entry may have
maxAge = duration_zero(zero-width windows are forbidden). Throw at construction if anywindow.maxAgeis zero.
The strict > comparison means that an item whose age equals a boundary falls into the next window (older side). For example, with windows = [(1h, 0.9), (24h, 0.5), (72h, 0.1)]:
| age | first window where maxAge > age | returned score |
|---|---|---|
| 0h | 1h window | 0.9 |
| 1h | 24h window (1h is not > 1h) | 0.5 |
| 24h | 72h window | 0.1 |
| 72h | none — fall through | 0.1 (last entry) |
Window(maxAge)
Binary score. Returns 1.0 when age < maxAge (half-open interval [0, maxAge)); returns 0.0 when age >= maxAge.
WINDOW-CURVE(age, maxAge):
if age < maxAge:
return 1.0
return 0.0
Precondition: maxAge > Duration::ZERO / maxAge > TimeSpan.Zero. Throw at construction if violated.
The boundary is exclusive on the right: an item whose age is exactly maxAge returns 0.0.
Configuration
| Parameter | Type | Required | Default | Notes |
|---|---|---|---|---|
timeProvider | System.TimeProvider (.NET) / Box<dyn TimeProvider> (Rust) | Yes | — | Caller must inject; no default (D042) |
curve | Curve (one of Exponential, Step, Window) | Yes | — | Decay shape |
nullTimestampScore | float64, range [0.0, 1.0] | No | 0.5 | Score returned for items with no timestamp. Default 0.5 is neutral: it neither rewards nor penalizes missing timestamps. |
nullTimestampScore MUST be in [0.0, 1.0]. Implementations SHOULD reject construction if the value is outside this range.
Edge Cases
| Condition | Result |
|---|---|
item.timestamp is null | config.nullTimestampScore |
Item is future-dated (timestamp > now) | Age clamps to duration_zero; score = APPLY-CURVE(0, config.curve) |
| Age is exactly zero (freshest possible) | Exponential: 1.0; Step: first window’s score (if window.maxAge > 0); Window: 1.0 (since 0 < maxAge) |
age == maxAge for Window curve | 0.0 — boundary is half-open [0, maxAge) |
| Step curve: age exceeds all boundaries | Last entry’s score |
| Exponential with very large age | Approaches 0.0; never negative |
Conformance Vector Outlines
All scenarios use a fixed referenceTime = 2025-01-01T12:00:00Z (set via the injected TimeProvider).
-
Exponential, one-half-life age: An item with
timestamp = 2024-12-31T12:00:00Z(age = 24 h) andcurve = Exponential(halfLife: 24h)— the scorer returns 0.5 (2^(−1) = 0.5). -
Future-dated item: An item with
timestamp = 2025-01-02T00:00:00Z(timestamp is after referenceTime; age clamps to zero) andcurve = Exponential(halfLife: 24h)— the scorer returns the same score as an age-zero item, i.e., 1.0. -
Null timestamp: An item with no timestamp and
nullTimestampScore = 0.5— the scorer returns 0.5 regardless of the configured curve. -
Step curve, second window: An item with
timestamp = 2025-01-01T06:00:00Z(age = 6 h) andcurve = Step(windows: [(1h, 0.9), (24h, 0.5), (72h, 0.1)])— age 6h is not less than 1h (1h > 6his false), but is less than 24h (24h > 6his true) — the scorer returns 0.5 (second window’s score). -
Window curve, age exactly at maxAge: An item with
timestamp = 2025-01-01T06:00:00Z(age = 6 h) andcurve = Window(maxAge: 6h)— sinceage >= maxAge(6h >= 6h), the scorer returns 0.0.
Complexity
- Time: O(1) per item for Exponential and Window; O(W) per item for Step, where W is the number of windows.
- Space: O(1) auxiliary per invocation.
Conformance Notes
- The
allItemsparameter MUST be accepted by the scorer interface but MUST be ignored. Implementations MUST NOT iterate or inspectallItems. - Negative age (future-dated items) MUST be clamped to
duration_zerobefore being passed to the curve. The clamp MUST occur before the curve computation, not inside each curve. nullTimestampScoredefaults to 0.5. The neutral default is intentional: it avoids penalizing items that legitimately have no timestamp (e.g., synthetic items injected at pipeline construction time) while also not rewarding them above fresher real items.- Curve constructors MUST validate preconditions at construction time, not at scoring time. A scorer with an invalid curve configuration MUST NOT be created.
- In .NET,
halfLifeandmaxAgeareTimeSpan; comparisons useTimeSpan.Zero. In Rust, they areDuration; comparisons useDuration::ZERO. - The Rust
TimeProvidertrait MUST beSend + Syncto allowDecayScorerto be used across thread boundaries. The stored provider isBox<dyn TimeProvider + Send + Sync>.
MetadataKeyScorer
The MetadataKeyScorer returns a configurable multiplier (boost) when a metadata key matches a configured value, and 1.0 (a neutral multiplier) otherwise.
Overview
MetadataKeyScorer is an absolute scorer — it examines only the item’s metadata map. The allItems parameter is ignored. This scorer enables callers to boost items that carry a specific metadata signal by returning a multiplier value rather than an absolute score.
Multiplicative semantics: Unlike MetadataTrustScorer (which is an absolute passthrough returning a score directly), MetadataKeyScorer returns a multiplier value intended for use in a CompositeScorer. A match returns config.boost (e.g., 1.5); a non-match returns 1.0 (the neutral multiplier). Returned values are NOT clamped to [0.0, 1.0].
Fields Used
| Field | Source | Purpose |
|---|---|---|
metadata | ContextItem | Key-value map; the scorer checks the configured key for the configured value |
Metadata Namespace Reservation
All ContextItem.metadata keys with the "cupel:" prefix are reserved for library-defined conventions. Callers MUST NOT use this prefix for application-specific keys. This reservation is enforced at the spec level only — there is no runtime validation of key prefixes.
Conventions
cupel:priority
cupel:priority is a caller-provided priority signal for the item.
- Key:
"cupel:priority" - RECOMMENDED values:
"high","normal"(or"low") - Open string: implementations MUST NOT validate or reject unknown values. Callers MAY use any string value. Values are not validated at construction time or scoring time.
- Typical usage:
MetadataKeyScorer("cupel:priority", "high", 1.5)boosts high-priority items 1.5× in a composite scorer, leaving all other items at 1.0 (neutral multiplier).
Other Reserved Conventions
See MetadataTrustScorer for the cupel:trust and cupel:source-type conventions, which are the other reserved metadata keys in this library.
Configuration
Constructor Parameters
| Parameter | Type | Constraint | Description |
|---|---|---|---|
key | string | (none) | The metadata key to match |
value | string | (none) | The exact value to match (string-to-string comparison) |
boost | float64 | > 0.0 | The multiplier returned for matching items |
Validation
boostMUST be greater than 0.0.- A
boostvalue of 0.0 MUST be rejected at construction time withCupelError::ScorerConfig(Rust) /ArgumentException(notArgumentOutOfRangeException) (.NET). - A negative
boostvalue MUST be rejected at construction time withCupelError::ScorerConfig(Rust) /ArgumentException(.NET). - A non-finite
boostvalue (NaN or infinity) MUST be rejected at construction time withCupelError::ScorerConfig(Rust) /ArgumentException(.NET). - Validation occurs at construction time, not scoring time.
Algorithm
METADATA-KEY-SCORE(item, allItems, config):
if item.metadata does not contain config.key:
return config.defaultMultiplier // 1.0
raw <- item.metadata[config.key]
if raw != config.value:
return config.defaultMultiplier // 1.0
return config.boost
config.defaultMultiplier is a fixed constant of 1.0. It is NOT a constructor parameter — the neutral multiplier value is always 1.0. The scorer does NOT receive item.score as input; it returns a multiplier value used by the downstream scoring stage.
Score Interpretation
MetadataKeyScorer uses multiplicative semantics: it returns a multiplier, not an absolute score.
- A matching item returns
config.boost(e.g.,1.5). - A non-matching item returns
1.0(neutral — has no effect when multiplied). - When used in a
CompositeScorer, the composite multiplies the contributions of its child scorers. AMetadataKeyScorerwithboost=1.5effectively increases the composite weight for matching items by 1.5× relative to non-matching items. - Returned values are NOT clamped to [0.0, 1.0].
Edge Cases
| Condition | Result |
|---|---|
Key absent from item.metadata | 1.0 (default multiplier) |
Key present, value matches config.value | config.boost |
Key present, value does not match config.value | 1.0 (default multiplier) |
boost = 0.0 at construction | Construction error |
boost < 0.0 at construction | Construction error |
boost is NaN or Infinity at construction | Construction error |
Conformance Vector Outlines
The following scenarios outline the expected behavior for conformance testing.
-
Match → boost applied: An item with
metadata["cupel:priority"] = "high", scorer configured withkey="cupel:priority",value="high",boost=1.5— the scorer returns1.5. -
No-match → neutral: An item with
metadata["cupel:priority"] = "normal", scorer configured withkey="cupel:priority",value="high",boost=1.5— the scorer returns1.0. -
Missing key → neutral: An item with no
cupel:prioritykey in its metadata, scorer configured forkey="cupel:priority"— the scorer returns1.0regardless ofboost. -
Zero boost → construction error: Constructing
MetadataKeyScorer("cupel:priority", "high", 0.0)MUST fail withCupelError::ScorerConfig(Rust) /ArgumentException(.NET); the scorer MUST NOT be usable. -
Negative boost → construction error: Constructing
MetadataKeyScorer("cupel:priority", "high", -1.0)MUST fail withCupelError::ScorerConfig(Rust) /ArgumentException(.NET); the scorer MUST NOT be usable.
Complexity
- Time: O(1) per item (hash map lookup, string comparison).
- Space: O(1) auxiliary.
Conformance Notes
- Value comparison is string-to-string. The
config.valueis compared directly to the raw string stored atconfig.keyinitem.metadata. No parsing, normalization, or type coercion is applied. - In .NET,
metadatavalues stored as non-string types SHOULD be converted to string for comparison. The comparison target is alwaysconfig.value, which is a string. config.defaultMultiplieris always1.0and is NOT a constructor parameter. It is a fixed constant of the scorer, not a configurable default.boostvalidation occurs at construction time. Scoring-time inputs (key absence, value mismatch) never produce errors — they return1.0.- The construction error type is
ArgumentExceptionin .NET — NOTArgumentOutOfRangeException(D178).
Slicers
Slicers select a subset of scored items that fits within the token budget. They are invoked during Stage 5: Slice.
Slicer Interface
A slicer implements a single function:
Slice(scoredItems: list of ScoredItem, budget: ContextBudget) -> list of ContextItem
scoredItemsis the list of ScoredItem pairs, pre-sorted by score descending (from the Sort stage).budgetis a ContextBudget carrying the effective budget (already adjusted for pinned items and output reserve; see effective budget computation).- The return value is the list of selected ContextItem instances (without scores).
Contract
- Subset selection. The slicer MUST return a subset of the input items. It MUST NOT create, modify, or duplicate items.
- Budget compliance. The total tokens of selected items SHOULD NOT exceed
budget.targetTokens. (Some strategies like KnapsackSlice may slightly under-fill due to discretization.) - Zero-token items. Items with
tokens = 0are conventionally always included, as they consume no budget. Individual slicer implementations define this behavior. - Empty input. If
scoredItemsis empty orbudget.targetTokens <= 0, the slicer MUST return an empty list.
Slicer Summary
| Slicer | Algorithm | Complexity | Use Case |
|---|---|---|---|
| GreedySlice | Value-density greedy fill | O(N log N) | Fast, good-enough selection for most workloads |
| KnapsackSlice | 0/1 knapsack dynamic programming | O(N * C) | Optimal selection when budget is tight |
| QuotaSlice | Partitioned delegation by kind | O(N + per-kind cost) | Budget fairness across context kinds |
| CountQuotaSlice | Count-based require/cap per kind | O(N log N + inner cost) | Absolute item-count guarantees per kind |
| CountConstrainedKnapsackSlice | Count-require/cap + knapsack-optimal selection | O(N log N + N × C) | Count guarantees with globally optimal token packing |
Where N is the number of scored items and C is the discretized budget capacity.
StreamSlice (Implementation-Optional)
StreamSlice is an asynchronous/streaming slicer that processes items as they arrive rather than requiring the full sorted list upfront. It is documented here for completeness but is implementation-optional — the synchronous pipeline defined by this specification does not require streaming support.
Implementations that support asynchronous or streaming pipelines MAY provide a StreamSlice that conforms to the same behavioral contract (subset selection within budget) using a streaming evaluation model. The synchronous conformance test suite does not test StreamSlice.
GreedySlice
GreedySlice selects items by value density (score per token), greedily filling the budget from highest-density items first.
Overview
GreedySlice is the simplest and fastest slicer strategy. It computes a “value density” for each item — the ratio of relevance score to token cost — then iterates items in descending density order, including each item if it fits within the remaining budget. Zero-token items have infinite density and are always included.
This is the fractional knapsack heuristic applied to a 0/1 selection problem. It does not guarantee optimal selection (the KnapsackSlice provides that), but it is fast and produces good results in practice.
Algorithm
GREEDY-SLICE(scoredItems, budget):
if length(scoredItems) = 0 or budget.targetTokens <= 0:
return []
// Step 1: Compute value densities
densities <- new array of (density: float64, index: integer) of length(scoredItems)
for i <- 0 to length(scoredItems) - 1:
tokens <- scoredItems[i].item.tokens
if tokens = 0:
densities[i] <- (MAX_FLOAT, i)
else:
densities[i] <- (scoredItems[i].score / tokens, i)
// Step 2: Sort by density descending, tiebreak by index ascending
STABLE-SORT(densities, by density descending, then by index ascending)
// Step 3: Greedy fill
remaining <- budget.targetTokens
selected <- empty list
for i <- 0 to length(densities) - 1:
originalIndex <- densities[i].index
item <- scoredItems[originalIndex].item
tokens <- item.tokens
if tokens = 0:
APPEND(selected, item)
else if tokens <= remaining:
APPEND(selected, item)
remaining <- remaining - tokens
return selected
Value Density
Value density measures how much relevance score an item provides per token consumed:
density = score / tokens
- Items with
tokens = 0have densityMAX_FLOAT(the largest representable finite double), ensuring they sort first and are always included. Among zero-token items, all share the same densityMAX_FLOAT. The sort tiebreak is index only — score values are irrelevant for zero-token items. - Items with high scores and low token counts are preferred over items with moderate scores and high token counts.
Deterministic Tie-Break Contract
When two items have equal value density, the item with the lower original index in the input list MUST be preferred (original-index ascending). This is achieved either by using a stable sort or by including the original index as an explicit secondary sort key (density descending, then original index ascending).
This contract guarantees that GreedySlice produces identical output for identical inputs across repeated invocations. Budget-simulation methods (GetMarginalItems, FindMinBudgetFor) depend on this determinism to produce meaningful diff and binary-search results.
Among zero-token items — all sharing density MAX_FLOAT — score values MUST NOT affect relative order. The tiebreak is original index only.
Since the input is pre-sorted by score descending (from the Sort stage), this original-index tiebreaking preserves score order among equal-density items as a natural consequence.
Edge Cases
| Condition | Result |
|---|---|
Empty scoredItems | Empty list |
budget.targetTokens <= 0 | Empty list |
| All items are zero-token | All items selected |
| Single item fits within budget | That item is selected |
| Single item exceeds budget | Empty list (only zero-token items, if any) |
| All items exceed budget individually | Only zero-token items are selected |
| All items fit within budget | All items selected |
Complexity
- Time: O(N log N) — dominated by the sort. The greedy fill pass is O(N).
- Space: O(N) for the density array.
Conformance Notes
- The density for zero-token items MUST be treated as the maximum possible value, ensuring they are always sorted first and always included.
- Among zero-token items, tiebreaking is by original index only; score values MUST NOT affect the relative order of zero-token items.
- The sort MUST use original-index-ascending tiebreaking for equal densities. Given items A (density 0.5, index 0) and B (density 0.5, index 3), A must precede B. This is the deterministic tie-break contract that budget simulation depends on.
- Items that do not fit (
tokens > remaining) are skipped, not deferred. The algorithm makes a single pass through the sorted densities — it does not backtrack or attempt smaller items after skipping a large one. - The output list order is determined by the greedy iteration order (density-descending), not the original input order. The Place stage handles final ordering.
KnapsackSlice
KnapsackSlice selects items using 0/1 knapsack dynamic programming, finding the combination of items that maximizes total score within the token budget.
Overview
KnapsackSlice provides optimal or near-optimal item selection by solving the classic 0/1 knapsack problem. Items are the “objects” with token counts as weights and relevance scores as values. The algorithm finds the subset with maximum total value that fits within the budget.
To make the DP feasible for large budgets, KnapsackSlice discretizes both item weights and budget capacity using a configurable bucket size. This trades precision for performance — the result is near-optimal rather than exactly optimal.
Algorithm
KNAPSACK-SLICE(scoredItems, budget, bucketSize):
if length(scoredItems) = 0 or budget.targetTokens <= 0:
return []
// Step 1: Pre-filter zero-token items (always included)
zeroTokenItems <- empty list
candidates <- empty list
for i <- 0 to length(scoredItems) - 1:
if scoredItems[i].item.tokens = 0:
APPEND(zeroTokenItems, scoredItems[i].item)
else if scoredItems[i].item.tokens > 0:
APPEND(candidates, scoredItems[i])
if length(candidates) = 0:
return zeroTokenItems
n <- length(candidates)
// Step 2: Build parallel arrays
weights <- new array of integer[n] // original token counts
values <- new array of integer[n] // scaled integer scores
items <- new array of ContextItem[n]
for i <- 0 to n - 1:
weights[i] <- candidates[i].item.tokens
values[i] <- max(0, floor(candidates[i].score * 10000))
items[i] <- candidates[i].item
// Step 3: Discretize capacity and weights
capacity <- floor(budget.targetTokens / bucketSize)
if capacity = 0:
return zeroTokenItems
discretizedWeights <- new array of integer[n]
for i <- 0 to n - 1:
discretizedWeights[i] <- ceil(weights[i] / bucketSize)
// Step 4: DP with 1D value array + 2D keep table
dp <- new array of integer[capacity + 1], all zeros
keep <- new 2D array of boolean[n][capacity + 1], all false
for i <- 0 to n - 1:
dw <- discretizedWeights[i]
dv <- values[i]
for w <- capacity downto dw:
withItem <- dp[w - dw] + dv
if withItem > dp[w]:
dp[w] <- withItem
keep[i][w] <- true
// Step 5: Reconstruct solution
selected <- empty list
remainingCapacity <- capacity
for i <- n - 1 downto 0:
if keep[i][remainingCapacity]:
APPEND(selected, items[i])
remainingCapacity <- remainingCapacity - discretizedWeights[i]
// Step 6: Combine zero-token items with selected items
result <- empty list
for i <- 0 to length(zeroTokenItems) - 1:
APPEND(result, zeroTokenItems[i])
for i <- 0 to length(selected) - 1:
APPEND(result, selected[i])
return result
Discretization
Score Scaling
Scores (float64 in [0.0, 1.0]) are converted to integers for the DP table:
integerValue = max(0, floor(score * 10000))
Implementation note: For non-negative scores (which all Cupel scores are, since scorers return values in [0.0, 1.0]),
floorand truncation-toward-zero (C-style integer truncation) produce identical results. Either operation is conformant.
This provides 4 decimal digits of score precision. Items with very small positive scores (< 0.0001) are treated as value 0.
Weight Discretization
Item token counts and budget capacity are discretized using the bucket size:
- Item weights use ceiling division:
ceil(tokens / bucketSize). This ensures the DP never selects items that would exceed the original budget. - Budget capacity uses floor division:
floor(targetTokens / bucketSize). This ensures the discretized capacity does not exceed the original budget.
The asymmetry (ceiling for weights, floor for capacity) is deliberate — it guarantees that any solution feasible in the discretized problem is also feasible in the original problem. The tradeoff is that the discretized problem may under-fill the budget.
Bucket Size
The bucketSize parameter (default: 100) controls the granularity of discretization:
- Smaller bucket size = more precise but larger DP table (more memory and time).
- Larger bucket size = faster but coarser approximation.
bucketSizeMUST be a positive integer (> 0).
Edge Cases
| Condition | Result |
|---|---|
Empty scoredItems | Empty list |
budget.targetTokens <= 0 | Empty list |
| All items are zero-token | All items selected |
capacity = 0 after discretization | Only zero-token items |
| All items exceed discretized capacity | Only zero-token items |
| Single candidate fits | That candidate plus zero-token items |
Complexity
- Time: O(N * C) where N is the number of non-zero-token candidates and C =
floor(targetTokens / bucketSize). - Space: O(N * C) for the keep table. The 1D DP array uses O(C) additional space.
Precision Caveat (P5)
The score scaling factor (10000) and bucket size are implementation-defined parameters. This specification defines them as defaults for the reference implementation, but conforming implementations MAY use different values provided:
- The behavioral contract holds: the slicer returns a subset of items that fits within the budget.
- The selection is optimal or near-optimal for the discretization granularity used.
- Conformance tests use score differences of at least 0.01 (100 integer units at scale factor 10000) to avoid precision-dependent divergence.
Implementations that change the score scaling factor or bucket size SHOULD document the deviation.
Conformance Notes
- Zero-token items MUST be pre-filtered and always included in the result. They do not participate in the DP.
- Items with negative token counts are not expected in the input (they are excluded during classification), but if present, they SHOULD be excluded from the DP candidates.
- The DP uses reverse iteration on the weight dimension (
w <- capacity downto dw) to implement the standard 1D space optimization for 0/1 knapsack. - Solution reconstruction iterates candidates in reverse order (
i <- n-1 downto 0), which is the standard backtracking direction for 0/1 knapsack with a keep table. - The output list order is: zero-token items first, then DP-selected items in reconstruction order. The Place stage handles final ordering.
QuotaSlice
QuotaSlice partitions items by ContextKind, distributes the token budget across kinds using configurable quotas, and delegates per-kind selection to an inner slicer.
Overview
QuotaSlice is a decorator slicer — it wraps another slicer (e.g., GreedySlice or KnapsackSlice) and adds budget fairness across context kinds. This ensures that no single kind dominates the context window, and that important kinds receive a guaranteed minimum allocation.
QuotaSlice operates in four phases:
- Partition items by ContextKind
- Compute candidate token mass per kind
- Distribute the budget across kinds (require/cap constraints)
- Delegate per-kind selection to the inner slicer
Configuration
QuotaSlice is configured with a QuotaSet that defines two constraints per kind:
- Require(kind, minPercent) — Guarantees that at least
minPercent% of the total budget is reserved for this kind. Range: [0.0, 100.0]. - Cap(kind, maxPercent) — Limits this kind to at most
maxPercent% of the total budget. Range: [0.0, 100.0].
Validation Rules
- For each kind:
require <= cap. - The sum of all
requirepercentages MUST NOT exceed 100%. - Kinds without explicit configuration default to
require = 0%andcap = 100%.
Algorithm
QUOTA-SLICE(scoredItems, budget, quotas, innerSlicer):
if length(scoredItems) = 0 or budget.targetTokens <= 0:
return []
targetTokens <- budget.targetTokens
// Phase 1: Partition by ContextKind
partitions <- empty map of ContextKind -> list of ScoredItem
for i <- 0 to length(scoredItems) - 1:
kind <- scoredItems[i].item.kind // case-insensitive
if kind is not in partitions:
partitions[kind] <- empty list
APPEND(partitions[kind], scoredItems[i])
// Phase 2: Candidate token mass per kind
candidateTokenMass <- empty map of ContextKind -> integer
for each (kind, items) in partitions:
mass <- 0
for i <- 0 to length(items) - 1:
mass <- mass + items[i].item.tokens
candidateTokenMass[kind] <- mass
// Phase 3: Budget distribution
kindBudgets <- DISTRIBUTE-BUDGET(partitions, candidateTokenMass,
targetTokens, quotas)
// Phase 4: Per-kind slicing
allSelected <- empty list
for each (kind, items) in partitions:
kindBudget <- kindBudgets[kind] // 0 if not present
if kindBudget <= 0:
continue
cap <- floor(quotas.getCap(kind) / 100.0 * targetTokens)
subBudget <- ContextBudget(maxTokens: cap, targetTokens: kindBudget)
selected <- innerSlicer.Slice(items, subBudget)
for i <- 0 to length(selected) - 1:
APPEND(allSelected, selected[i])
return allSelected
Budget Distribution
DISTRIBUTE-BUDGET(partitions, candidateTokenMass, targetTokens, quotas):
// Step 1: Compute require and cap token amounts
requireTokens <- empty map of ContextKind -> integer
capTokens <- empty map of ContextKind -> integer
for each kind in quotas.configuredKinds:
requireTokens[kind] <- floor(quotas.getRequire(kind) / 100.0 * targetTokens)
capTokens[kind] <- floor(quotas.getCap(kind) / 100.0 * targetTokens)
// Step 2: Sum required tokens
totalRequired <- 0
for each (kind, req) in requireTokens:
totalRequired <- totalRequired + req
unassignedBudget <- max(0, targetTokens - totalRequired)
// Step 3: Compute distribution mass (kinds that can receive beyond their require)
totalMassForDistribution <- 0
for each (kind, items) in partitions:
cap <- capTokens[kind] if kind in capTokens, else targetTokens
require <- requireTokens[kind] if kind in requireTokens, else 0
if cap > require:
totalMassForDistribution <- totalMassForDistribution + candidateTokenMass[kind]
// Step 4: Distribute per kind
kindBudgets <- empty map of ContextKind -> integer
for each (kind, items) in partitions:
require <- requireTokens[kind] if kind in requireTokens, else 0
cap <- capTokens[kind] if kind in capTokens, else targetTokens
proportional <- 0
if totalMassForDistribution > 0 and cap > require:
proportional <- floor(unassignedBudget * candidateTokenMass[kind]
/ totalMassForDistribution)
kindBudget <- require + proportional
if kindBudget > cap:
kindBudget <- cap
kindBudgets[kind] <- kindBudget
return kindBudgets
Budget Distribution Rounding (P6)
All percentage-to-token conversions use floor truncation (integer truncation toward zero):
requireTokens = floor(requirePercent / 100.0 * targetTokens)
capTokens = floor(capPercent / 100.0 * targetTokens)
proportional = floor(unassigned * mass / totalMass)
Because of this floor truncation, the sum of all kind budgets may be less than targetTokens. This is by design — it ensures that no kind receives more tokens than its allocation warrants, at the cost of potentially leaving a small number of tokens unallocated.
Example
Given targetTokens = 1000 and two kinds each with require = 33%:
requireTokens["A"] = floor(0.33 * 1000) = 330
requireTokens["B"] = floor(0.33 * 1000) = 330
totalRequired = 660
// 340 tokens remain for proportional distribution
// Even after distribution, sum of kind budgets may be < 1000
ContextKind Comparison (P4)
All ContextKind comparisons in QuotaSlice — partition key grouping, require/cap lookups, and kind budget assignment — MUST use case-insensitive ASCII case folding, consistent with the ContextKind comparison semantics.
Edge Cases
| Condition | Result |
|---|---|
Empty scoredItems | Empty list |
budget.targetTokens <= 0 | Empty list |
| No quotas configured | All kinds get proportional budget (no require/cap constraints) |
Kind has require = 100% | That kind gets the entire budget; all other kinds get 0 |
Kind has cap = 0% | That kind is excluded (budget = 0) |
| Kind has candidates but zero token mass | Gets its require allocation (if any), no proportional share |
| Kind has no candidates | Not present in partitions; its require allocation is unused |
Complexity
- Time: O(N) for partitioning + O(K) for budget distribution + sum of inner slicer costs per partition, where K is the number of distinct kinds.
- Space: O(N) for partitioned item lists + O(K) for budget maps.
Conformance Notes
- The inner slicer receives a
ContextBudgetwithmaxTokensset to the kind’s cap (in tokens) andtargetTokensset to the kind’s computed budget. Other budget fields are not forwarded. - ContextKind comparison MUST be case-insensitive throughout (partitioning, quota lookups, budget assignment).
- Kinds not present in the quota configuration default to
require = 0%,cap = 100%. They participate in proportional distribution with no floor guarantee and no ceiling (other than the full budget). - The proportional distribution uses
candidateTokenMass(sum of token counts for actual candidates in each kind), not the theoretical maximum. This means kinds with fewer/smaller candidates naturally receive less budget. - The output list order is: items from each kind partition concatenated in partition iteration order. The Place stage handles final ordering.
CountQuotaSlice
CountQuotaSlice enforces absolute minimum and maximum item counts per ContextKind using a two-phase COUNT-DISTRIBUTE-BUDGET algorithm, then delegates residual selection to an inner slicer.
Overview
CountQuotaSlice is a decorator slicer — it wraps another slicer (e.g., GreedySlice) and adds count-based fairness across context kinds. Where QuotaSlice distributes a token budget as percentages across kinds, CountQuotaSlice enforces absolute item counts: “at least 2 tool items, at most 5 tool items.”
CountQuotaSlice operates in three phases:
- Count-Satisfy (Phase 1): For each kind with a
requireCount > 0, commit the top-N candidates (by score descending) to the selection. Their token cost is accumulated aspreAllocatedTokensand the committed items are removed from the residual candidate pool. - Budget-Distribute (Phase 2): The inner slicer receives the residual candidate pool and a reduced budget (
targetTokensreduced bypreAllocatedTokens). - Cap Enforcement (Phase 3): Items returned by the inner slicer are filtered against the per-kind
capCount. Items that would exceed the cap are excluded.
Configuration
CountQuotaSlice is configured with a list of CountQuotaEntry values and a ScarcityBehavior:
CountQuotaEntry
Each entry specifies constraints for one ContextKind:
| Field | Type | Description |
|---|---|---|
kind | ContextKind | The kind this entry constrains |
requireCount | integer ≥ 0 | Minimum number of items of this kind to commit in Phase 1 |
capCount | integer > 0 (or 0 when requireCount is also 0) | Maximum number of items of this kind in the final result |
Validation Rules
requireCountMUST be ≤capCount.capCount = 0withrequireCount > 0is rejected — a zero cap with a positive requirement can never be satisfied.- Kinds without an entry have no require or cap constraint — they participate freely in Phase 2 delegation.
ScarcityBehavior
Controls what happens when the candidate pool has fewer items of a kind than the configured requireCount:
| Value | Behavior |
|---|---|
Degrade (default) | Include all available candidates and record a shortfall. Pipeline execution continues. |
Throw | Return an error / throw an exception immediately. Use when count requirements are hard guarantees (e.g., required disclaimer text). |
KnapsackSlice Guard
CountQuotaSlice does not support KnapsackSlice as the inner slicer. Construction MUST fail with an error if the inner slicer is a KnapsackSlice instance. Use GreedySlice as the inner slicer instead. A CountConstrainedKnapsackSlice may be provided in a future release.
Algorithm
COUNT-QUOTA-SLICE(scoredItems, budget, entries, innerSlicer, scarcity):
if length(scoredItems) = 0 or budget.targetTokens <= 0:
return []
// Build policy lookup: kind -> (requireCount, capCount)
entryByKind <- map from entries
// --- Phase 1: Count-Satisfy ---
// Partition candidates by ContextKind
partitions <- group scoredItems by item.kind (case-insensitive)
// Sort each partition by score descending
for each (kind, candidates) in partitions:
SORT(candidates, by score descending)
committed <- empty list
committedSet <- empty set (reference equality)
selectedCount <- empty map of ContextKind -> integer
preAllocatedTokens <- 0
shortfalls <- empty list
for each entry in entries where entry.requireCount > 0:
candidates <- partitions[entry.kind] (empty if absent)
satisfied <- 0
for each candidate in candidates while satisfied < entry.requireCount:
APPEND(committed, candidate.item)
ADD(committedSet, candidate.item)
preAllocatedTokens <- preAllocatedTokens + candidate.item.tokens
satisfied <- satisfied + 1
selectedCount[entry.kind] <- satisfied
if satisfied < entry.requireCount:
if scarcity = Throw:
ERROR("kind has fewer candidates than requireCount")
else:
RECORD-SHORTFALL(entry.kind, entry.requireCount, satisfied)
// --- Phase 2: Budget-Distribute ---
residual <- [si in scoredItems where si.item not in committedSet]
residualTarget <- max(0, budget.targetTokens - preAllocatedTokens)
residualBudget <- ContextBudget(maxTokens: budget.maxTokens,
targetTokens: min(residualTarget, budget.maxTokens))
innerSelected <- innerSlicer.Slice(residual, residualBudget)
// --- Phase 3: Cap Enforcement ---
result <- committed
for each item in innerSelected:
kind <- item.kind
count <- selectedCount[kind] (default 0)
cap <- entryByKind[kind].capCount (uncapped if no entry)
if cap is defined and count >= cap:
EXCLUDE(item) // cap exceeded
else:
APPEND(result, item)
selectedCount[kind] <- count + 1
return result
Scarcity Reporting
When ScarcityBehavior.Degrade is active and a kind’s candidate pool cannot satisfy its requireCount, a CountRequirementShortfall is recorded:
| Field | Type | Description |
|---|---|---|
kind | ContextKind (or string) | The kind that was short |
requiredCount | integer | The configured requireCount |
satisfiedCount | integer | How many candidates were actually committed |
In .NET, shortfalls are available via CountQuotaSlice.LastShortfalls (populated after each Slice call). In Rust, shortfalls are recorded on SelectionReport.count_requirement_shortfalls (populated by the pipeline after slicing).
Monotonicity
CountQuotaSlice does not guarantee monotonic item inclusion as the budget changes. Phase 1 commits a fixed set of items regardless of budget, but Phase 2 delegation produces different residual selections at different budgets. Combined with the cap enforcement in Phase 3, the final set can exhibit non-monotonic inclusion.
This means CountQuotaSlice is incompatible with budget simulation methods that assume monotonic inclusion:
GetMarginalItems— does not guard against CountQuotaSlice (only guards against QuotaSlice), because Phase 1 pre-allocation is budget-independent and the inner slicer controls the residual. However, the cap enforcement can cause non-monotonic behavior.FindMinBudgetFor— MUST throw if the pipeline’s slicer is CountQuotaSlice (see FindMinBudgetFor QuotaSlice + CountQuotaSlice Guard).
Edge Cases
| Condition | Result |
|---|---|
Empty scoredItems | Empty list |
budget.targetTokens <= 0 | Empty list |
| No entries configured | Phase 1 commits nothing; Phase 2 runs unconstrained |
requireCount = capCount | Phase 1 commits exactly requireCount items; Phase 2 cannot add more of that kind |
requireCount = 0, capCount > 0 | Phase 1 commits nothing for that kind; Phase 2 output is capped |
Kind has fewer candidates than requireCount | Scarcity behavior applies (Degrade or Throw) |
| Phase 1 exhausts the token budget | Phase 2 receives residualTarget = 0 and selects nothing |
| Inner slicer is KnapsackSlice | Construction fails with error |
Complexity
- Time: O(N) for partitioning + O(K · M log M) for per-kind sorting in Phase 1 (where M is the largest partition size) + inner slicer cost for Phase 2 + O(I) for Phase 3 cap enforcement (where I is the inner slicer output size).
- Space: O(N) for partitioned item lists + O(K) for policy maps and count tracking.
Conformance Notes
- ContextKind comparison MUST be case-insensitive throughout (partitioning and policy lookups), consistent with the ContextKind comparison semantics.
- Phase 1 commits items in score-descending order within each kind. The input is pre-sorted by the pipeline, but implementations MUST sort per-kind partitions explicitly to ensure correctness when multiple kinds interleave.
- The inner slicer in Phase 2 receives a
ContextBudgetwithtargetTokensreduced by Phase 1 pre-allocation.maxTokensis passed through unchanged from the original budget. - Phase 3 cap enforcement applies only to items returned by the inner slicer in Phase 2. Phase 1 committed items are always included (they count toward the cap but are never excluded by it).
- Kinds not present in the entry list have no require or cap constraint. They are not partitioned in Phase 1 and pass through Phase 2 and 3 without cap filtering.
- The KnapsackSlice guard MUST be enforced at construction time, not at slice time.
CountConstrainedKnapsackSlice
CountConstrainedKnapsackSlice enforces absolute minimum and maximum item counts per ContextKind while using knapsack-optimal selection for the residual budget, implementing a 3-phase COUNT-KNAPSACK-CAP algorithm.
Overview
CountConstrainedKnapsackSlice is a decorator slicer — it wraps a KnapsackSlice directly and adds count-based fairness across context kinds. Unlike CountQuotaSlice, which delegates to a configurable inner slicer and guards against KnapsackSlice at construction time, CountConstrainedKnapsackSlice hardwires KnapsackSlice as the Phase 2 engine — it is the knapsack wrapper itself.
CountConstrainedKnapsackSlice operates in three phases:
- Count-Satisfy (Phase 1): Identical to CountQuotaSlice Phase 1. For each kind with a
requireCount > 0, commit the top-N candidates (by score descending) to the selection. Their token cost is accumulated aspreAllocatedTokensand the committed items are removed from the residual candidate pool. - Knapsack-Distribute (Phase 2): The residual candidate pool is passed to the stored
KnapsackSlicewith a reduced budget (targetTokensreduced bypreAllocatedTokens). The Phase 2 output is then re-sorted by score descending before Phase 3 proceeds (D180 — required because KnapsackSlice output order is not guaranteed to be score-descending). - Cap Enforcement (Phase 3): Items returned by Phase 2 (after re-sort) are filtered against the per-kind
capCount. Items that would exceed the cap are excluded. TheselectedCountmap is seeded from Phase 1 committed counts (D181 — not initialized to zero), ensuring cap tracking is correct across both phases.
Configuration
CountConstrainedKnapsackSlice is configured with a list of CountQuotaEntry values and a ScarcityBehavior:
CountQuotaEntry
Each entry specifies constraints for one ContextKind:
| Field | Type | Description |
|---|---|---|
kind | ContextKind | The kind this entry constrains |
requireCount | integer ≥ 0 | Minimum number of items of this kind to commit in Phase 1 |
capCount | integer > 0 (or 0 when requireCount is also 0) | Maximum number of items of this kind in the final result |
ScarcityBehavior
Controls what happens when the candidate pool has fewer items of a kind than the configured requireCount:
| Value | Behavior |
|---|---|
Degrade (default) | Include all available candidates and record a shortfall. Pipeline execution continues. |
Throw | Return an error / throw an exception immediately. Use when count requirements are hard guarantees (e.g., required disclaimer text). |
Construction Parameters
Rust: CountConstrainedKnapsackSlice::new(Vec<CountQuotaEntry>, KnapsackSlice, ScarcityBehavior)
.NET: CountConstrainedKnapsackSlice(IReadOnlyList<CountQuotaEntry>, KnapsackSlice, ScarcityBehavior = ScarcityBehavior.Degrade)
Validation Rules
requireCountMUST be ≤capCount.capCount = 0withrequireCount > 0is rejected — a zero cap with a positive requirement can never be satisfied.- Kinds without an entry have no require or cap constraint — they participate freely in Phase 2 delegation.
Algorithm
COUNT-KNAPSACK-CAP(scoredItems, budget, entries, knapsack, scarcity):
if length(scoredItems) = 0 or budget.targetTokens <= 0:
return []
// Build policy lookup: kind -> (requireCount, capCount)
entryByKind <- map from entries
// --- Phase 1: Count-Satisfy (identical to CountQuotaSlice Phase 1) ---
// Partition candidates by ContextKind
partitions <- group scoredItems by item.kind (case-insensitive)
// Sort each partition by score descending
for each (kind, candidates) in partitions:
SORT(candidates, by score descending)
committed <- empty list
committedSet <- empty set (reference equality)
selectedCount <- empty map of ContextKind -> integer
preAllocatedTokens <- 0
shortfalls <- empty list
for each entry in entries where entry.requireCount > 0:
candidates <- partitions[entry.kind] (empty if absent)
satisfied <- 0
for each candidate in candidates while satisfied < entry.requireCount:
APPEND(committed, candidate.item)
ADD(committedSet, candidate.item)
preAllocatedTokens <- preAllocatedTokens + candidate.item.tokens
satisfied <- satisfied + 1
selectedCount[entry.kind] <- satisfied // D181: seed Phase 3 starting counts
if satisfied < entry.requireCount:
if scarcity = Throw:
ERROR("CountConstrainedKnapsackSlice: candidate pool for kind '<kind>' has <satisfied> items but RequireCount is <requireCount>.")
else:
RECORD-SHORTFALL(entry.kind, entry.requireCount, satisfied)
// --- Phase 2: Knapsack-Distribute ---
residual <- [si in scoredItems where si.item not in committedSet]
residualTarget <- max(0, budget.targetTokens - preAllocatedTokens)
residualBudget <- ContextBudget(maxTokens: budget.maxTokens,
targetTokens: min(residualTarget, budget.maxTokens))
innerSelected <- KnapsackSlice(residual, residualBudget)
// D180: Re-sort Phase 2 output by score descending before Phase 3 cap loop.
// KnapsackSlice output order is not guaranteed; cap enforcement must see
// highest-scoring items first so cap budget is spent on the best items.
innerSelected <- SORT(innerSelected, by score descending)
// --- Phase 3: Cap Enforcement ---
result <- committed
// selectedCount is already seeded from Phase 1 (D181 — not reset to zero)
for each item in innerSelected:
kind <- item.kind
count <- selectedCount[kind] (default 0)
cap <- entryByKind[kind].capCount (uncapped if no entry)
if cap is defined and count >= cap:
EXCLUDE(item) // cap exceeded
else:
APPEND(result, item)
selectedCount[kind] <- count + 1
return result
Scarcity Reporting
When ScarcityBehavior.Degrade is active and a kind’s candidate pool cannot satisfy its requireCount, a CountRequirementShortfall is recorded:
| Field | Type | Description |
|---|---|---|
kind | ContextKind (or string) | The kind that was short |
requiredCount | integer | The configured requireCount |
satisfiedCount | integer | How many candidates were actually committed |
The exact ScarcityBehavior.Throw error message is:
"CountConstrainedKnapsackSlice: candidate pool for kind '<kind>' has <satisfied> items but RequireCount is <requireCount>."
In .NET, shortfalls are available via CountConstrainedKnapsackSlice.LastShortfalls (populated after each Slice call). In Rust, shortfalls are recorded on SelectionReport.count_requirement_shortfalls (populated by the pipeline after slicing).
Monotonicity
CountConstrainedKnapsackSlice returns is_count_quota() → true. This signals that the slicer is incompatible with budget simulation methods that require monotonic item inclusion.
FindMinBudgetForMUST guard against CountConstrainedKnapsackSlice. See the guard specification at FindMinBudgetFor QuotaSlice + CountQuotaSlice Guard.
Note: Unlike CountQuotaSlice, there is no KnapsackSlice guard at construction time. CountConstrainedKnapsackSlice is the knapsack wrapper itself — it hardwires KnapsackSlice internally and does not accept it as a configurable inner slicer.
Trade-offs
Pre-processing Sub-optimality (D174)
Phase 1 commits required items before the knapsack runs, consuming budget. If required items are token-heavy, the residual budget available to Phase 2 may be significantly reduced — potentially to zero. In the worst case, globally optimal selection is impossible because the knapsack cannot see the full candidate pool with the full budget. This is the inherent trade-off of the pre-processing approach (Path 5A): it avoids a full constrained knapsack DP (which would be NP-hard to extend with per-kind minimums) at the cost of potential sub-optimality when Phase 1 pre-allocation is large relative to the total budget.
Cap Waste
KnapsackSlice may select items in Phase 2 that Phase 3 then excludes due to cap enforcement. Token budget was “spent” on those items in the knapsack DP computation — they counted toward capacity — but they do not appear in the final result. In scenarios with a low cap and many candidates of a capped kind, Phase 2 may select several items that are subsequently excluded by Phase 3, reducing effective throughput.
Edge Cases
| Condition | Result |
|---|---|
Empty scoredItems | Empty list |
budget.targetTokens <= 0 | Empty list |
| No entries configured | Phase 1 commits nothing; Phase 2 runs unconstrained knapsack |
requireCount = capCount | Phase 1 commits exactly requireCount items; Phase 3 cannot add more of that kind |
requireCount = 0, capCount > 0 | Phase 1 commits nothing for that kind; Phase 2 output is capped |
Kind has fewer candidates than requireCount | Scarcity behavior applies (Degrade or Throw) |
| Phase 1 exhausts budget | Phase 2 receives residualTarget = 0 and selects nothing |
| No KnapsackSlice guard | N/A — CountConstrainedKnapsackSlice hardwires KnapsackSlice internally; no guard is needed at construction |
| KnapsackSlice OOM guard | CupelError::TableTooLarge propagates from KnapsackSlice when the DP table exceeds 50 M cells |
Complexity
- Time: O(N) for partitioning + O(K · M log M) for per-kind sorting in Phase 1 (where M is the largest partition size) + O(N × C) for Phase 2 KnapsackSlice DP (where C = residual capacity / bucket_size) + O(I log I) for Phase 2 re-sort + O(I) for Phase 3 cap enforcement (where I is the KnapsackSlice output size).
- Space: O(N) for partitioned item lists + O(K) for policy maps and count tracking + O(N × C) for the knapsack DP table.
Conformance Notes
- ContextKind comparison MUST be case-insensitive throughout (partitioning and policy lookups), consistent with the ContextKind comparison semantics.
- Phase 2 re-sort is normative (D180): Implementations MUST sort Phase 2 output by score descending before entering the Phase 3 cap loop. KnapsackSlice output order is undefined; without re-sorting, cap enforcement may exclude higher-scoring items and include lower-scoring ones.
- Phase 3 MUST initialize
selectedCountfrom Phase 1 committed counts (D181): Implementations MUST NOT resetselectedCountto zero before Phase 3. The Phase 1 committed counts are the correct starting state for cap tracking across both phases. - Phase 3 cap enforcement applies only to items returned by Phase 2. Phase 1 committed items are always included (they count toward the cap but are never excluded by it).
- Kinds not present in the entry list have no require or cap constraint. They are not partitioned in Phase 1 and pass through Phase 2 and Phase 3 without cap filtering.
Conformance Vector Outlines
1. Baseline — Require 2 of kind, knapsack selects residual
File: count-constrained-knapsack-baseline.toml
- Budget: 1000 tokens. 3 items, each 100 tokens.
- Policy:
require_count=2, cap_count=4for kind"tool".bucket_size=100. - Phase 1: commits
tool-a(0.9) andtool-b(0.7).preAllocatedTokens=200.selectedCount["tool"]=2. - Phase 2:
KnapsackSlicereceives residual[msg-x (0.5, 100t)]with budget 800. Selectsmsg-x. - Phase 3:
selectedCount["tool"]=2, cap=4. All Phase 2 items pass (kind"msg"is uncapped). - Expected: all 3 items selected. 0 shortfalls. 0 cap exclusions.
2. Cap Exclusion — cap=2, 4 candidates, 2 excluded by cap after Phase 2 re-sort
File: count-constrained-knapsack-cap-exclusion.toml
- Budget: 600 tokens. 4 tool items, each 100 tokens.
- Policy:
require_count=1, cap_count=2for kind"tool".bucket_size=100. - Phase 1: commits
tool-a(0.9).preAllocatedTokens=100.selectedCount["tool"]=1. - Phase 2:
KnapsackSlicereceives[tool-b (0.8), tool-c (0.7), tool-d (0.6)]with budget 500. All fit — selects all 3. Re-sort:[tool-b, tool-c, tool-d](already score-descending). - Phase 3:
selectedCount["tool"]=1, cap=2.tool-b→ count=2, pass.tool-c→ count=2 ≥ cap=2, excluded.tool-d→ excluded. - Expected:
[tool-a, tool-b]. 0 shortfalls. 2 cap exclusions.
3. Scarcity Degrade — require_count=3, only 1 candidate, shortfall recorded
File: count-constrained-knapsack-scarcity-degrade.toml
- Budget: 500 tokens. 1 item:
tool-a(0.9, 100t). - Policy:
require_count=3, cap_count=5for kind"tool".ScarcityBehavior=Degrade. - Phase 1: commits
tool-a.satisfied=1 < requireCount=3→ shortfall recorded{kind:"tool", requiredCount:3, satisfiedCount:1}.selectedCount["tool"]=1. - Phase 2: residual empty.
KnapsackSliceselects nothing. - Phase 3: no Phase 2 items.
- Expected:
[tool-a]. 1 shortfall. 0 cap exclusions.
4. Tag Non-exclusivity — two kinds with independent require constraints
File: count-constrained-knapsack-tag-nonexclusive.toml
- Budget: 1000 tokens. 3 items:
item-tool(tool, 0.9, 100t),item-memory(memory, 0.8, 100t),item-extra(tool, 0.5, 100t). - Policy:
require_count=1, cap_count=4for"tool"and"memory"independently.bucket_size=100. - Phase 1: commits
item-tool(satisfies tool),item-memory(satisfies memory).preAllocatedTokens=200.selectedCount={"tool":1, "memory":1}. - Phase 2:
KnapsackSlicereceives[item-extra (0.5, 100t)]with budget 800. Selectsitem-extra. - Phase 3:
item-extrais kind"tool", count=1 < cap=4 → pass.selectedCount["tool"]=2. - Expected: all 3 items selected. 0 shortfalls. 0 cap exclusions.
5. Require-and-Cap — require=2, cap=2, knapsack picks best msg items from residual
File: count-constrained-knapsack-require-and-cap.toml
- Budget: 1000 tokens.
bucket_size=1(exact knapsack). 5 items:tool-a(tool, 0.9, 100t),tool-b(tool, 0.7, 100t),msg-s(msg, 0.8, 50t),msg-m(msg, 0.6, 150t),msg-l(msg, 0.4, 200t). - Policy:
require_count=2, cap_count=2for"tool". No constraint on"msg". - Phase 1: commits
tool-aandtool-b.preAllocatedTokens=200.selectedCount["tool"]=2. - Phase 2:
KnapsackSlicereceives[msg-s, msg-m, msg-l]with budget 800. All three fit (50+150+200=400 < 800). Selects all 3. - Phase 3:
selectedCount["tool"]=2, cap=2. Phase 2 items are kind"msg"(uncapped) → all pass. - Expected: all 5 items selected. 0 shortfalls. 0 cap exclusions.
Placers
Placers determine the final ordering of selected items in the context window. They are invoked during Stage 6: Place.
Placer Interface
A placer implements a single function:
Place(items: list of ScoredItem) -> list of ContextItem
itemsis the merged list of ScoredItem pairs — pinned items (with score 1.0) followed by sliced items (with their computed scores). See Stage 6: Place for the merge process.- The return value is the final ordered list of ContextItem instances.
Contract
- Reorder only. The placer MUST return the same items it receives, in a potentially different order. It MUST NOT add, remove, modify, or duplicate items.
- Deterministic. Given the same input, the placer MUST produce the same output order.
- Score-aware. The placer receives scores (via ScoredItem) and may use them to determine ordering. Pinned items have score 1.0.
Placer Summary
| Placer | Algorithm | Use Case |
|---|---|---|
| ChronologicalPlacer | Stable sort ascending by timestamp | Natural conversation flow; preserves temporal ordering |
| UShapedPlacer | Highest scores at edges, lowest in middle | Exploits LLM primacy + recency attention bias |
ChronologicalPlacer
The ChronologicalPlacer orders items by ascending timestamp, preserving natural temporal flow.
Overview
ChronologicalPlacer is the simplest placer strategy. It sorts items by their timestamp field in ascending order (oldest first), producing a context window that reads in chronological order. This is the natural ordering for conversation-style contexts where temporal sequence matters.
Fields Used
| Field | Source | Purpose |
|---|---|---|
timestamp | ContextItem | Sort key for temporal ordering |
score | ScoredItem | Not used by this placer |
Algorithm
CHRONOLOGICAL-PLACE(items):
if length(items) <= 1:
if length(items) = 0:
return []
return [items[0].item]
// Build sortable array with original indices
sortable <- new array of (timestamp, index) of length(items)
for i <- 0 to length(items) - 1:
sortable[i] <- (items[i].item.timestamp, i)
// Stable sort: timestamped items ascending, then null-timestamp items, then by index
STABLE-SORT(sortable, by:
1. Items with timestamps sort before items without timestamps
2. Among timestamped items: ascending by timestamp
3. Tiebreak: ascending by original index
)
result <- new array of ContextItem[length(items)]
for i <- 0 to length(sortable) - 1:
result[i] <- items[sortable[i].index].item
return result
Sort Order Detail
The sort comparison between two items a and b:
CHRONOLOGICAL-COMPARE(a, b):
aHas <- a.timestamp is not null
bHas <- b.timestamp is not null
if aHas and bHas:
if a.timestamp != b.timestamp:
return a.timestamp < b.timestamp // ascending
return a.index < b.index // tiebreak
if aHas:
return true // a (with timestamp) sorts before b (without)
if bHas:
return false // b (with timestamp) sorts before a (without)
return a.index < b.index // both null: preserve original order
Timestamp Placement Rules
- Timestamped items sort first, in ascending temporal order.
- Null-timestamp items sort last, in original input order.
- Tied timestamps preserve original input order (stable sort).
This means the context window reads oldest-to-newest for timestamped items, with non-timestamped items appended at the end.
Edge Cases
| Condition | Result |
|---|---|
| Empty input | Empty list |
| Single item | That item returned as-is |
| All items have timestamps | Sorted ascending by timestamp |
| No items have timestamps | Original input order preserved |
| Mixed timestamped and non-timestamped | Timestamped items first (ascending), then non-timestamped (original order) |
| All items have the same timestamp | Original input order preserved (stable sort) |
Complexity
- Time: O(N log N) — dominated by the sort.
- Space: O(N) for the sortable array.
Conformance Notes
- Timestamp comparison is by temporal ordering (the underlying UTC instant), not by string representation or timezone.
- Items without timestamps MUST sort after all timestamped items, not before.
- The sort MUST be stable. When timestamps are equal (or both null), the original input order (index) MUST be preserved.
- Scores are available (items are ScoredItem pairs) but MUST NOT influence the ordering. ChronologicalPlacer is purely timestamp-based.
UShapedPlacer
The UShapedPlacer positions the highest-scored items at both edges of the context window (start and end), with the lowest-scored items in the middle.
Overview
UShapedPlacer exploits the primacy and recency bias observed in large language models — LLMs tend to pay more attention to content at the beginning and end of the context window, with less attention to content in the middle. By placing the most relevant items at both edges, UShapedPlacer maximizes the chance that important context is attended to.
The placement pattern forms a “U” shape when score is plotted against position: high at the edges, low in the center.
Fields Used
| Field | Source | Purpose |
|---|---|---|
score | ScoredItem | Determines placement priority |
Algorithm
U-SHAPED-PLACE(items):
if length(items) <= 1:
if length(items) = 0:
return []
return [items[0].item]
// Step 1: Sort by score descending, tiebreak by index ascending
scored <- new array of (score, index) of length(items)
for i <- 0 to length(items) - 1:
scored[i] <- (items[i].score, i)
STABLE-SORT(scored, by score descending, then by index ascending)
// Step 2: Alternate placement — even ranks to left edge, odd ranks to right edge
result <- new array of ContextItem[length(items)]
left <- 0 // next available left position
right <- length(items) - 1 // next available right position
for i <- 0 to length(scored) - 1:
originalIndex <- scored[i].index
item <- items[originalIndex].item
if i mod 2 = 0: // even rank (0, 2, 4, ...)
result[left] <- item
left <- left + 1
else: // odd rank (1, 3, 5, ...)
result[right] <- item
right <- right - 1
return result
Placement Pattern
Items are sorted by score descending and assigned a rank (0, 1, 2, …). The rank determines placement:
| Rank | Position | Edge |
|---|---|---|
| 0 (highest score) | Left edge (index 0) | Start |
| 1 (2nd highest) | Right edge (index N-1) | End |
| 2 (3rd highest) | Left edge (index 1) | Start |
| 3 (4th highest) | Right edge (index N-2) | End |
| 4 (5th highest) | Left edge (index 2) | Start |
| … | … | … |
Visual Example
Given 7 items sorted by score [A=0.9, B=0.8, C=0.7, D=0.6, E=0.5, F=0.4, G=0.3]:
Position: [ 0 ] [ 1 ] [ 2 ] [ 3 ] [ 4 ] [ 5 ] [ 6 ]
Item: [ A ] [ C ] [ E ] [ G ] [ F ] [ D ] [ B ]
Score: [ 0.9 ] [ 0.7 ] [ 0.5 ] [ 0.3 ] [ 0.4 ] [ 0.6 ] [ 0.8 ]
Score plot: ___ ___
| \ / |
| \___ ___/ |
| \___ ___/ |
| \___ ___/ |
| \___ ___/ |
| \_/ |
start middle end
The highest scores (A, B) are at the edges. The lowest score (G) is in the center.
Sort Stability
The sort MUST be stable or use a composite sort key. When two items have the same score, the item with the lower original index is ranked first. This ensures deterministic placement when scores are tied.
Edge Cases
| Condition | Result |
|---|---|
| Empty input | Empty list |
| Single item | That item returned as-is |
| Two items | Higher-scored at index 0, lower-scored at index 1 |
| All items have the same score | Original input order preserved (stable sort); alternating placement still applies |
| Pinned items (score 1.0) | Not special-cased by UShapedPlacer; pinned items arrive with score 1.0 from the pipeline and naturally rank highly, sorting to edges. |
Complexity
- Time: O(N log N) — dominated by the sort.
- Space: O(N) for the scored array and result array.
Conformance Notes
- The sort is descending by score, with ascending index as the tiebreaker.
- Even-ranked items (0, 2, 4, …) fill from the left (start of the array).
- Odd-ranked items (1, 3, 5, …) fill from the right (end of the array).
- The
leftpointer advances forward; therightpointer advances backward. They meet in the middle. - Pinned items arrive with score 1.0 (see Stage 6: Place). They will be ranked highly and placed at the edges, which is the intended behavior — pinned items should receive maximum attention.
Diagnostics
The diagnostics system exposes pipeline selection decisions as structured, inspectable data without affecting output or performance when disabled.
Overview
Diagnostics answer the question: “why was this item included or excluded?” The pipeline instruments each stage with trace calls that record durations, item counts, and per-item decisions. When diagnostics are disabled, these calls are zero-cost. When enabled, the accumulated events form a SelectionReport that callers can inspect after the pipeline run completes.
Ownership Model
The trace collector is passed at call time, not stored on the pipeline. One collector instance is used per pipeline execution.
This makes diagnostics per-invocation: each pipeline run receives a clean diagnostic context, and different concurrent runs can use different diagnostic configurations.
Rationale: Per-invocation ownership avoids thread-safety concerns and ensures each pipeline run gets an isolated diagnostic context.
Rejected alternative: Storing the collector on the pipeline object — this couples the collector lifecycle to the pipeline instance, prevents concurrent runs with different diagnostic configurations, and turns a per-call concern into a per-instance configuration.
Null-Path Guarantee
When diagnostics are disabled (via NullTraceCollector), no event objects are constructed and no performance overhead is incurred. Implementations achieve this by checking is_enabled before constructing event payloads.
Rationale: Diagnostic instrumentation must never regress pipeline performance for callers who do not need it. The is_enabled check is the single gate that eliminates all allocation and computation on the disabled path.
Data Flow
flowchart TD
Input["Candidate Items\n(list of ContextItem)"] --> Pipeline
Pipeline["Pipeline\n(trace calls at each stage)"] --> Output["Selected Items\n(list of ContextItem)"]
Pipeline --> Collector["TraceCollector\n(buffers events)"]
Collector --> Report["SelectionReport\n(complete diagnostic output)"]
Summary
| Type | Role | Spec page |
|---|---|---|
TraceCollector | Observer contract for pipeline instrumentation | TraceCollector |
TraceEvent | Structured record of a single pipeline observation | Events |
PipelineStage | Enumeration of observable pipeline stages | Events |
OverflowEvent | Record produced when budget overflow occurs under Proceed strategy | Events |
ExclusionReason | Why an item was excluded from the context window | Exclusion Reasons |
InclusionReason | Why an item was included in the context window | Exclusion Reasons |
SelectionReport | Complete diagnostic output from a pipeline run | SelectionReport |
TraceCollector
Overview
The TraceCollector is an observer interface that pipeline implementations call at instrumentation points throughout each stage. It has two built-in implementations: NullTraceCollector (zero-cost disabled path) and DiagnosticTraceCollector (buffered recording with configurable detail level).
The pipeline calls is_enabled before constructing event payloads. When is_enabled returns false, no events are created and no work is done.
Contract
| Member | Signature (pseudocode) | Description |
|---|---|---|
is_enabled | -> boolean | Whether this collector is actively recording. Callers check before constructing event payloads to avoid unnecessary allocations. |
record_stage_event | (event: TraceEvent) -> void | Record a stage-level event. Called once per pipeline stage, after the stage completes. |
record_item_event | (event: TraceEvent) -> void | Record an item-level event. May be filtered by the collector’s configured detail level. |
NullTraceCollector
NullTraceCollector is the disabled implementation. is_enabled returns false. Both record_stage_event and record_item_event are no-ops.
A singleton instance is recommended but not required — callers that need a disabled collector can use any NullTraceCollector instance.
See Null-Path Guarantee for the rationale and implementation contract.
DiagnosticTraceCollector
DiagnosticTraceCollector is the enabled implementation. is_enabled returns true. Events are buffered in insertion order.
Construction Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
detail_level | TraceDetailLevel | No | Stage | Controls which events are recorded. |
Behavior
record_stage_event: always records the event, regardless ofdetail_level.record_item_event: records the event only whendetail_levelisItemor higher. Whendetail_levelisStage, item events are silently discarded.
After the pipeline run completes, the caller extracts a SelectionReport from the collector’s buffered events. See SelectionReport.
TraceDetailLevel
TraceDetailLevel controls the granularity of events recorded by DiagnosticTraceCollector. Values are ordered: a higher ordinal includes all events from lower ordinals.
| Value | Ordinal | Description |
|---|---|---|
Stage | 0 | Stage-level events only (durations, item counts per stage) |
Item | 1 | Stage-level events plus per-item events (individual scores, exclusion reasons) |
Rationale: Two levels provide a meaningful performance/detail trade-off. Stage-level is sufficient for performance profiling and pipeline health checks. Item-level is needed for “why was this item excluded?” debugging. A coarser distinction avoids the ambiguity of intermediate values.
Rejected alternative: A continuous verbosity integer — discrete levels are more meaningful to callers and prevent ambiguous intermediate values where behavior would be undefined.
Conformance Notes
- Implementations MUST provide both a null (disabled) and a diagnostic (enabled) collector.
- The null collector MUST return
falsefromis_enabledand MUST NOT allocate or buffer whenrecord_stage_eventorrecord_item_eventare called. record_item_eventMUST be gated by the configureddetail_level. Whendetail_levelisStage, item events MUST be discarded without recording.- Implementations MUST preserve event insertion order in the accumulated event list.
Observer Callback (Optional — MAY)
Implementations may support an optional observer callback that fires synchronously on each recorded event. The spec defines this as a capability, not a mechanism — implementations choose how to expose the callback (function pointer, closure, delegate, trait object, etc.).
Rationale: Synchronous firing ensures events are observable in real time, enabling streaming diagnostic use cases. The callback mechanism is left to implementations because callback abstractions differ fundamentally across languages and concurrency models. The spec prescribes the observable behavior (synchronous, per-event) without constraining the API surface.
Events
Overview
Two structural event types carry diagnostic information out of the pipeline: TraceEvent (produced at every instrumentation point) and OverflowEvent (produced only when budget overflow occurs under the Proceed strategy). Each pipeline stage fires exactly one stage-level TraceEvent after it completes.
TraceEvent
TraceEvent is the universal record produced at each pipeline instrumentation point. Stage-level events capture timing and throughput for a complete stage; item-level events capture decisions for individual items.
Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
stage | PipelineStage | Yes | — | The pipeline stage that produced this event. |
duration_ms | float64 | Yes | — | Wall-clock duration of the stage in milliseconds. For item-level events, this is 0.0. |
item_count | integer | Yes | — | Number of items processed by this stage. For item-level events, this is 1. (For item-level events, the value is 1, which is a sentinel indicating a single item was processed — it carries no aggregate meaning.) |
message | string | No | (absent) | Optional diagnostic message. Absent when not provided. |
Rationale for duration_ms as float64: Milliseconds provide human-readable precision for diagnostic output while remaining portable across all languages. Float64 accommodates sub-millisecond precision without integer overflow concerns and maps cleanly to IEEE 754 double in all target languages.
Rejected alternative — integer nanoseconds: Less readable in diagnostic output; requires unit conversion in most display contexts.
Rejected alternative — ISO 8601 duration strings: Harder to aggregate and compare programmatically; parsing overhead at query time.
Stage-Level Event Example
{
"stage": "Score",
"duration_ms": 12.4,
"item_count": 47
}
The message field is absent because it was not provided. Absent fields are omitted from the JSON representation.
Item-Level Event Example
{
"stage": "Slice",
"duration_ms": 0.0,
"item_count": 1,
"message": "BudgetExceeded: 2048 tokens requested, 512 available"
}
Item-level events set duration_ms to 0.0 because item processing time is not independently measurable from the stage as a whole.
PipelineStage
PipelineStage enumerates the observable stages of the pipeline. Each value corresponds to a stage that emits diagnostic events.
| Value | Description |
|---|---|
"Classify" | Classification and partitioning stage |
"Score" | Scoring stage |
"Deduplicate" | Deduplication stage |
"Slice" | Budget-fitting selection stage |
"Place" | Final ordering and merge stage |
Sort is omitted. Sort is an internal ordering step with no user-visible diagnostic boundary and no meaningful duration to report independently. Including it would add a stage that built-in pipelines emit no meaningful diagnostics for, increasing conformance burden without diagnostic value.
Implementations may emit events for internal stages not listed here. Callers should handle unknown PipelineStage values gracefully when processing diagnostic data (see Conformance Notes below).
OverflowEvent
OverflowEvent is produced when the merged result (pinned + sliced items) exceeds target_tokens and OverflowStrategy is Proceed. See OverflowStrategy.
OverflowEvent is a data type only. The spec defines its structure but not its delivery mechanism — implementations choose how to surface the event (callback, return value, collector method, etc.).
Rationale: Delivery mechanisms differ across languages and concurrency models. The spec prescribes what information is captured, not how it reaches the caller.
Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
tokens_over_budget | integer | Yes | — | Number of tokens exceeding the target budget. |
overflowing_items | list of ContextItem | Yes | — | All items present at the time of overflow (pinned + sliced combined). |
budget | ContextBudget | Yes | — | The budget that was exceeded. |
Example
{
"tokens_over_budget": 384,
"overflowing_items": [
{ "content": "...", "tokens": 512, "kind": "Message" },
{ "content": "...", "tokens": 1024, "kind": "ToolOutput" }
],
"budget": {
"maxTokens": 4096,
"targetTokens": 3072
}
}
Conformance Notes
- Stage-level
TraceEventrecords MUST haveduration_msset to the wall-clock duration of that stage. Item-level events MUST haveduration_msset to0.0. - Each named pipeline stage (
Classify,Score,Deduplicate,Slice,Place) MUST produce exactly one stage-levelTraceEventper pipeline run, emitted after the stage completes. This applies even when a stage processes zero items — the event is emitted withitem_countof0. - item-level events for a stage MUST precede the corresponding stage-level event. Item-level events are recorded as each item is processed; the stage-level event is emitted only after all items in the stage have been processed. Conformance vectors that test the
eventslist must be authored to reflect this ordering. OverflowEventMUST only be produced whenOverflowStrategyisProceed. It MUST NOT be produced forThrow,Truncate, or any other strategy.- Implementations MUST handle unknown
PipelineStagevalues gracefully when deserializing diagnostic data from other implementations.
Exclusion Reasons
Overview
Every item considered by the pipeline receives either an inclusion reason or an exclusion reason. These reasons form the explainability core of the diagnostics system — they answer “why was this item included or excluded?” Reasons are attached to items in the SelectionReport, where each IncludedItem carries an InclusionReason and each ExcludedItem carries an ExclusionReason.
ExclusionReason
ExclusionReason describes why the pipeline did not select an item for the context window. Each variant is data-carrying: it includes associated fields that provide context for the exclusion decision. This is a deliberate design choice — data-carrying variants enable callers to programmatically inspect exclusion details without parsing message strings.
Rejected alternative: fieldless enum variants — simpler wire format, but forces callers to reconstruct context from other diagnostic data, which is fragile and error-prone.
| Variant | Description | Fields | Emitted by | Status |
|---|---|---|---|---|
BudgetExceeded | Item did not fit within the token budget | item_tokens: integer, available_tokens: integer | Slice stage, Place stage (Truncate) | Active |
ScoredTooLow | Item scored below the selection threshold | score: float64, threshold: float64 | — | Reserved |
Deduplicated | Byte-exact content duplicate removed | deduplicated_against: string (content identifier) | Deduplicate stage | Active |
QuotaCapExceeded | Kind exceeded its quota cap | kind: string, cap: integer, actual: integer | — | Reserved |
QuotaRequireDisplaced | Displaced to satisfy another kind’s quota requirement | displaced_by_kind: string | — | Reserved |
NegativeTokens | Item has a negative token count | tokens: integer | Classify stage | Active |
PinnedOverride | Displaced by a pinned item during truncation overflow handling | displaced_by: string (content identifier) | Place stage (Truncate) | Active |
Filtered | Excluded by a user-defined filter predicate | filter_name: string | — | Reserved |
Reserved variants are defined but not emitted by any built-in pipeline stage. They are reserved for future specification versions and custom stage implementations. Implementations must include these variants in their type definitions to ensure forward compatibility.
Rationale: defining reserved variants now ensures that implementations allocate space in their type system for future extensions, avoiding breaking changes when these variants become active.
JSON example — BudgetExceeded:
{
"reason": "BudgetExceeded",
"item_tokens": 2048,
"available_tokens": 512
}
JSON example — Deduplicated:
{
"reason": "Deduplicated",
"deduplicated_against": "tool_output_abc123"
}
JSON example — NegativeTokens:
{
"reason": "NegativeTokens",
"tokens": -1
}
Reserved variant examples:
JSON example — ScoredTooLow:
{
"reason": "ScoredTooLow",
"score": 0.12,
"threshold": 0.25
}
JSON example — QuotaCapExceeded:
{
"reason": "QuotaCapExceeded",
"kind": "ToolOutput",
"cap": 3,
"actual": 4
}
JSON example — QuotaRequireDisplaced:
{
"reason": "QuotaRequireDisplaced",
"displaced_by_kind": "SystemPrompt"
}
JSON example — Filtered:
{
"reason": "Filtered",
"filter_name": "max_age_filter"
}
Wire format: the reason field is a string discriminator. Variant-specific fields appear as sibling fields alongside reason. Fields that belong to other variants are omitted — absent fields are never represented as nulls. Both ExclusionReason and InclusionReason use this same envelope format: { "reason": "<VariantName>", ...fields }. For InclusionReason, which carries no fields, the envelope contains only the reason discriminator.
InclusionReason
InclusionReason describes why the pipeline selected an item for the context window. Inclusion reasons are not data-carrying — they carry no additional fields beyond the variant name. The score itself (carried on IncludedItem) provides the quantitative detail.
Rationale: inclusion reasons are simple status indicators. Data-carrying variants would add complexity without diagnostic value.
| Variant | Description | Emitted by |
|---|---|---|
Scored | Included based on computed score within budget | Place stage |
Pinned | Bypassed scoring and slicing due to pinned status | Place stage |
ZeroToken | Included at no budget cost (zero-token item) | Place stage |
JSON example:
{
"reason": "Scored"
}
Conformance Notes
- Implementations MUST define all 8
ExclusionReasonvariants, including reserved variants, to ensure forward compatibility. - Implementations MUST handle unknown
ExclusionReasonvariants gracefully when deserializing diagnostic data from other implementations or future specification versions. - The
reasonfield in the JSON wire format MUST be the string name of the variant (e.g.,"BudgetExceeded", not an integer code). - Data-carrying fields MUST be present when the variant is emitted. For example,
BudgetExceededMUST include bothitem_tokensandavailable_tokens. - Reserved variants MUST NOT be emitted by built-in pipeline stages. Custom stage implementations may emit reserved variants.
SelectionReport
Overview
The SelectionReport is the complete diagnostic output from a pipeline run. It contains the event log, lists of included and excluded items with their reasons, and aggregate statistics. It is the primary artifact callers use to understand pipeline decisions — answering questions such as “why was this item excluded?” and “which items competed for the same budget?”
SelectionReport Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
events | list of TraceEvent | Yes | — | All recorded events in insertion (stage) order. See Events. |
included | list of IncludedItem | Yes | — | Items selected for the context window, in final placed order. |
excluded | list of ExcludedItem | Yes | — | Items not selected, sorted by score descending (stable by insertion order on ties). |
total_candidates | integer | Yes | — | Total number of items considered by the pipeline. Equals len(included) + len(excluded). |
total_tokens_considered | integer | Yes | — | Sum of tokens across all items in both included and excluded lists. |
Items excluded at any pipeline stage — including pre-scoring exclusions such as NegativeTokens at the Classify stage and Deduplicated at the Deduplicate stage — appear in the excluded list. The excluded list is the complete set of items not selected, regardless of which stage excluded them. Items excluded before scoring carry a score of 0.0.
How to Obtain
OBTAIN-REPORT:
collector <- CREATE DiagnosticTraceCollector(detail_level)
result <- pipeline.RUN(candidates, budget, collector)
report <- collector.BUILD-REPORT()
// result contains the selected items for use in the LLM call
// report contains the diagnostic data for inspection
The report is extracted from the collector after the pipeline run completes. This keeps the pipeline’s primary return type unchanged — the report is a side-channel, not part of the main output.
Rationale: post-call extraction separates diagnostic concerns from the pipeline’s functional return value. Callers who do not need diagnostics never interact with report types.
Rejected alternative: returning the report as part of the pipeline result — couples diagnostics to every call site, even those that do not need it.
Complete JSON example:
{
"events": [
{ "stage": "Classify", "duration_ms": 0.3, "item_count": 5 },
{ "stage": "Score", "duration_ms": 8.2, "item_count": 3 },
{ "stage": "Deduplicate", "duration_ms": 0.1, "item_count": 3 },
{ "stage": "Slice", "duration_ms": 1.4, "item_count": 3 },
{ "stage": "Place", "duration_ms": 0.2, "item_count": 2 }
],
"included": [
{
"item": { "content": "Recent conversation turn", "tokens": 256, "kind": "Message" },
"score": 0.92,
"reason": { "reason": "Scored" }
},
{
"item": { "content": "System prompt", "tokens": 128, "kind": "SystemPrompt" },
"score": 0.0,
"reason": { "reason": "Pinned" }
}
],
"excluded": [
{
"item": { "content": "Old tool output", "tokens": 2048, "kind": "ToolOutput" },
"score": 0.45,
"reason": { "reason": "BudgetExceeded", "item_tokens": 2048, "available_tokens": 512 }
},
{
"item": { "content": "Duplicate message", "tokens": 256, "kind": "Message" },
"score": 0.30,
"reason": { "reason": "Deduplicated", "deduplicated_against": "Recent conversation turn" }
}
],
"total_candidates": 5,
"total_tokens_considered": 4736
}
IncludedItem
An IncludedItem pairs a context item with the score and reason that led to its inclusion.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
item | ContextItem | Yes | — | The included context item. See ContextItem. |
score | float64 | Yes | — | The computed relevance score at time of inclusion. 0.0 for pinned and zero-token items. |
reason | InclusionReason | Yes | — | Why this item was included. See Exclusion Reasons. |
JSON example:
{
"item": { "content": "User's latest message", "tokens": 64, "kind": "Message" },
"score": 0.95,
"reason": { "reason": "Scored" }
}
ExcludedItem
An ExcludedItem pairs a context item with the score and reason that led to its exclusion.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
item | ContextItem | Yes | — | The excluded context item. See ContextItem. |
score | float64 | Yes | — | The computed relevance score at time of exclusion. |
reason | ExclusionReason | Yes | — | Why this item was excluded. See Exclusion Reasons. |
JSON example — BudgetExceeded:
{
"item": { "content": "Large tool output", "tokens": 4096, "kind": "ToolOutput" },
"score": 0.60,
"reason": { "reason": "BudgetExceeded", "item_tokens": 4096, "available_tokens": 1024 }
}
JSON example — Deduplicated:
{
"item": { "content": "Duplicate content", "tokens": 256, "kind": "Message" },
"score": 0.40,
"reason": { "reason": "Deduplicated", "deduplicated_against": "Original message" }
}
deduplicated_against appears only when reason is Deduplicated. For all other reasons, the variant’s own fields appear instead. Absent fields are omitted — no nulls.
Rationale: deduplicated_against is modelled as a field on the Deduplicated reason variant (not as a nullable top-level field on ExcludedItem) because the data-carrying variant design mandates that each variant carries its own context fields. Deduplicated carries deduplicated_against; BudgetExceeded carries item_tokens and available_tokens; and so on. This keeps ExcludedItem clean (always 3 fields: item, score, reason) and avoids nullable fields that apply to only one reason.
See Exclusion Reasons for the full rationale on data-carrying variants.
Conformance Notes
- The
excludedlist MUST be sorted by score descending. When two items have equal scores, the item excluded earlier in the pipeline run appears first (stable sort by insertion order on ties). This ordering surfaces the highest-value rejected items first, which is the most useful presentation for debugging “why wasn’t this included?” questions.
Rejected alternative: insertion order — less useful for diagnosis; the highest-scored excluded item is rarely the first item processed.
- The
includedlist MUST be in final placed order (the order determined by the Placer), not score order or insertion order. total_candidatesMUST equallen(included) + len(excluded).total_tokens_consideredMUST equal the sum oftokensacross all items in bothincludedandexcludedlists.- The
scorefield onIncludedItemandExcludedItemMUST reflect the score at the time the inclusion/exclusion decision was made, not a recalculated value. Items excluded before the Score stage (e.g.,NegativeTokensat Classify) MUST have ascoreof0.0.
OpenTelemetry Integration
The Wollax.Cupel.Diagnostics.OpenTelemetry companion package bridges the core pipeline’s ITraceCollector/TraceCollector abstraction to .NET ActivitySource, emitting structured telemetry that traces context-window construction.
Overview
Wollax.Cupel.Diagnostics.OpenTelemetry is a separate NuGet assembly. The core Wollax.Cupel assembly has zero dependency on OpenTelemetry (R032/D039). Callers who do not need telemetry incur no transitive dependency on the OpenTelemetry SDK.
The companion registers a single ActivitySource with the canonical name "Wollax.Cupel". To receive traces, callers add this name in their OpenTelemetry configuration:
services.AddOpenTelemetry()
.WithTracing(tracing => tracing
.AddSource("Wollax.Cupel")
// ... exporters ...
);
Pre-Stability Disclaimer
Warning: All
cupel.*attribute names documented in this chapter are pre-stable. The OpenTelemetry LLM semantic conventions are under active development, andcupel.*names may be renamed, merged, or removed as the conventions stabilize.Do not build alert rules, dashboards, or automation that hard-code specific
cupel.*attribute names. Use the attribute names only for ad-hoc debugging and observability during development and staging.
Activity Hierarchy
A single pipeline run produces a tree of Activities:
cupel.pipeline ← root; spans the full pipeline run
├── cupel.stage.classify
├── cupel.stage.score
├── cupel.stage.deduplicate
├── cupel.stage.slice
└── cupel.stage.place
There are exactly five stage Activities, one per diagnostic stage. Stage names in cupel.stage.{name} attribute values use lowercase only, matching the PipelineStage enum value lowercasing.
Sort is omitted. Sort is an internal ordering step with no diagnostic boundary and no meaningful duration to report independently. It does not receive its own Activity, consistent with the PipelineStage enum omission documented in Events.
Verbosity Tiers
The CupelVerbosity enum controls how much diagnostic data is emitted per pipeline run. Choose the tier appropriate for the target environment.
StageOnly — Production-safe
Emits the root Activity and five stage Activities with item-count boundaries only. No per-item events are recorded.
Root Activity (cupel.pipeline) attributes:
| Attribute | Type | Description |
|---|---|---|
cupel.budget.max_tokens | int | The pipeline’s maximum token budget |
cupel.verbosity | string | The verbosity tier name ("StageOnly") |
Each stage Activity (cupel.stage.{name}) attributes:
| Attribute | Type | Description |
|---|---|---|
cupel.stage.name | string | Stage name in lowercase (e.g., "classify", "score") |
cupel.stage.item_count_in | int | Number of items entering this stage |
cupel.stage.item_count_out | int | Number of items leaving this stage |
Note: No
duration_msattribute is present. Stage duration is provided by the Activity’s own start and end timestamps; recording it as an attribute would duplicate information already available in the trace span.
StageAndExclusions — Staging environments
Emits all StageOnly attributes plus per-exclusion Events on each stage Activity where an exclusion occurred.
Additions over StageOnly:
On each stage Activity, for every item excluded during that stage, one cupel.exclusion Event is recorded:
| Event Attribute | Type | Description |
|---|---|---|
cupel.exclusion.reason | string | ExclusionReason variant name (e.g., "BudgetExceeded") |
cupel.exclusion.item_kind | string | The kind of the excluded item |
cupel.exclusion.item_tokens | int | Token count of the excluded item |
Additionally, a summary attribute is added to the stage Activity:
| Attribute | Type | Description |
|---|---|---|
cupel.exclusion.count | int | Total number of exclusions in this stage |
Note:
cupel.exclusion.reasonvalues are open-ended. NewExclusionReasonvariants may appear in future spec versions without a schema change. Trace backends and dashboards MUST NOT hard-code the set of expected reason values.
Full — Development only
Emits all StageAndExclusions attributes plus per-included-item Events recorded after the Place stage completes.
Additions over StageAndExclusions:
After the Place stage, for every item included in the final output, one cupel.item.included Event is recorded:
| Event Attribute | Type | Description |
|---|---|---|
cupel.item.kind | string | The kind of the included item |
cupel.item.tokens | int | Token count of the included item |
cupel.item.score | float64 | The item’s final composite score |
Note: No placement-position attribute is recorded. Position is a derived property of ordering and not a stable attribute of the item itself.
Cardinality
| Verbosity | Events / Pipeline Run | Recommended Environment |
|---|---|---|
StageOnly | ~10 (5 stage Activities + root) | Production |
StageAndExclusions | ~10 + 0–300 (depends on exclusion count) | Staging |
Full | ~10 + 0–1000 (depends on item count) | Development only |
Warning:
Fullverbosity can produce high-cardinality traces with large item sets. Do not enableFullin production without a sampling strategy that caps trace volume.
Attribute Reference
Complete flat table of all cupel.* attributes and events for quick lookup.
| Name | Type | Tier(s) | Carrier | Description |
|---|---|---|---|---|
cupel.budget.max_tokens | int | All | Root Activity | Pipeline maximum token budget |
cupel.verbosity | string | All | Root Activity | Verbosity tier name |
cupel.stage.name | string | All | Stage Activity | Stage name, lowercase |
cupel.stage.item_count_in | int | All | Stage Activity | Items entering the stage |
cupel.stage.item_count_out | int | All | Stage Activity | Items leaving the stage |
cupel.exclusion.count | int | StageAndExclusions, Full | Stage Activity | Total exclusions in the stage |
cupel.exclusion.reason | string | StageAndExclusions, Full | cupel.exclusion Event | ExclusionReason variant name |
cupel.exclusion.item_kind | string | StageAndExclusions, Full | cupel.exclusion Event | Kind of the excluded item |
cupel.exclusion.item_tokens | int | StageAndExclusions, Full | cupel.exclusion Event | Token count of the excluded item |
cupel.item.kind | string | Full | cupel.item.included Event | Kind of the included item |
cupel.item.tokens | int | Full | cupel.item.included Event | Token count of the included item |
cupel.item.score | float64 | Full | cupel.item.included Event | Final composite score of the included item |
Conformance Notes
cupel.exclusion.reasonmapping: Values MUST be the canonicalExclusionReasonvariant name string (e.g.,"BudgetExceeded","Deduplicated"). Implementations MUST NOT use numeric codes, display strings, or locale-specific representations. NewExclusionReasonvariants in future spec versions will appear as new attribute values — trace backends must not hard-code the set.- ActivitySource name: Implementations MUST register the ActivitySource with the name
"Wollax.Cupel"exactly. Callers configure their OpenTelemetry pipeline using this string; any deviation breaks trace collection silently. - Zero-dependency core: The
Wollax.Cupelassembly MUST have no compile-time or runtime dependency on any OpenTelemetry SDK assembly. The companion package provides the bridge; the core pipeline operates throughITraceCollectoronly. - Stage Activity names: Stage Activity names MUST follow the pattern
cupel.stage.{name}where{name}is the lowercase stage name. The five canonical values areclassify,score,deduplicate,slice,place.
Rust (cupel-otel)
The cupel-otel crate is the Rust equivalent of Wollax.Cupel.Diagnostics.OpenTelemetry. It implements the TraceCollector trait from the core cupel crate and bridges pipeline execution data to the opentelemetry API.
Source name: The Rust implementation registers with the source name "cupel" (distinct from the .NET source name "Wollax.Cupel" — D163). Callers must configure "cupel" in their OpenTelemetry SDK tracer provider:
#![allow(unused)] fn main() { // Register the tracer provider with the cupel source name let tracer_provider = SdkTracerProvider::builder() .with_batch_exporter(exporter) .build(); opentelemetry::global::set_tracer_provider(tracer_provider); }
Adding to a project
Add cupel-otel as a dependency in Cargo.toml:
[dependencies]
cupel-otel = "0.1"
Usage example
#![allow(unused)] fn main() { use cupel_otel::{CupelOtelTraceCollector, CupelVerbosity}; let collector = CupelOtelTraceCollector::new(CupelVerbosity::StageOnly); let result = pipeline.run_traced(&items, &collector).await; collector.on_pipeline_completed(&result.stage_traces); // Flush the tracer provider before process exit }
Implementation notes
- Explicit
.end()is mandatory. Theopentelemetry0.27Spantype does not auto-end on drop (D169). TheCupelOtelTraceCollectorcalls.end()explicitly on each span after recording its attributes and events. Callers who implement a customTraceCollectorusing theopentelemetryAPI must also call.end()explicitly. SpanDataimport path. The correct import isopentelemetry_sdk::export::trace::SpanData, notopentelemetry_sdk::trace::SpanData. The latter does not exist inopentelemetry_sdk0.27.- Unknown
ExclusionReasonvariants. Thecupel.exclusion.reasonattribute uses a_ => "Unknown"match arm for any unrecognisedExclusionReasonvariant. This provides forward compatibility with future spec variants — unknown reasons are emitted as"Unknown"rather than causing a compilation failure or panic.
Budget Simulation
Budget simulation methods are extension methods on CupelPipeline in the .NET implementation. They orchestrate internal DryRun calls to answer questions about what the pipeline would select at different token budgets — for example, which items are marginal at a given budget, or what is the minimum budget required to include a specific item.
Language Parity Note: The budget simulation API (GetMarginalItems, FindMinBudgetFor) is scoped to the .NET implementation in v1.3. Rust parity is deferred — the Rust crate does not expose these methods. The deferral rationale: the Rust Pipeline does not yet expose a public DryRun equivalent, and budget simulation was not a priority for Rust consumers in this release cycle. A future milestone will assess Rust parity when the Rust pipeline gains a dry_run surface.
SweepBudget Out-of-Scope Note: SweepBudget (exhaustive budget sweep) has been assigned to the Smelt project and will not be added to Cupel.
DryRun Determinism Invariant
DryRun MUST produce identical output for identical inputs. The slicer’s tie-breaking rule MUST be deterministic and stable across calls. Implementations that depend on non-deterministic ordering (e.g., hash-map iteration order) are non-conformant.
For GreedySlice, determinism is guaranteed by the deterministic tie-break contract: equal-density items are ordered by original index ascending. This means repeated DryRun calls with the same items and budget always produce the same included set in the same order — the foundation on which GetMarginalItems and FindMinBudgetFor rely to produce meaningful diff and binary-search results.
GetMarginalItems
Purpose
Identify which items are included in a full-budget run but excluded when the budget is reduced by slackTokens. These are the items that become available as the budget grows from (budget.MaxTokens - slackTokens) up to budget.MaxTokens.
Signature
IReadOnlyList<ContextItem> GetMarginalItems(
IReadOnlyList<ContextItem> items,
ContextBudget budget,
int slackTokens)
Budget Parameter
The budget parameter overrides the pipeline’s stored budget for both internal DryRun calls. This is required because DryRun uses the pipeline’s fixed budget by construction; the extension method supplies a temporary budget for its internal calls.
The reduced-budget run uses budget.MaxTokens - slackTokens as the max token count, with the same outputReserve and reservedSlots as the full budget. Formally:
reducedBudget = ContextBudget(
maxTokens: budget.MaxTokens - slackTokens,
targetTokens: budget.TargetTokens - slackTokens,
outputReserve: budget.OutputReserve)
Diff Direction
primary \ margin — items present in the full-budget result (primary) that are absent from the reduced-budget result (margin). Item identity is determined by object reference equality.
Monotonicity Assumption
GetMarginalItems assumes that a lower budget never causes a new item to appear that was not present at the higher budget (monotonic inclusion). This assumption holds for GreedySlice and KnapsackSlice but does not hold for QuotaSlice, where percentage-based allocations shift as the budget changes and can cause different kinds to appear at lower budgets.
QuotaSlice Guard
If the pipeline’s slicer is QuotaSlice, the method MUST throw InvalidOperationException with the message:
"GetMarginalItems requires monotonic item inclusion. QuotaSlice produces non-monotonic inclusion as budget changes shift percentage allocations."
Pseudocode
GET-MARGINAL-ITEMS(pipeline, items, budget, slackTokens):
reducedBudget <- ContextBudget(maxTokens: budget.maxTokens - slackTokens,
targetTokens: budget.targetTokens - slackTokens,
outputReserve: budget.outputReserve)
primary <- pipeline.DRY-RUN(items, budget)
margin <- pipeline.DRY-RUN(items, reducedBudget)
return [item in primary.included where item not in margin.included]
FindMinBudgetFor
Purpose
Find the minimum token budget (within a search ceiling) at which targetItem would be included in the selection result. Returns null if targetItem is not selectable within [targetItem.Tokens, searchCeiling].
Signature
int? FindMinBudgetFor(
IReadOnlyList<ContextItem> items,
ContextItem targetItem,
int searchCeiling)
Preconditions
Both conditions are checked before the search begins. If either is violated, the method MUST throw ArgumentException:
targetItemmust be an element ofitems.searchCeiling >= targetItem.Tokens.
Binary Search
The search range is [targetItem.Tokens, searchCeiling]. The lower bound is targetItem.Tokens as an optimization: the target item cannot be included in a budget smaller than its own token count.
The stop condition is high - low <= 1. At termination, high is the candidate minimum budget. The search performs approximately log₂(searchCeiling) DryRun invocations — typically 10–15 for realistic budget ceilings.
After the loop exits, the method performs one final DryRun at high to confirm inclusion. If targetItem is present in report.included, high is returned. Otherwise null is returned.
Return Value
int? — null means targetItem is not selectable within [targetItem.Tokens, searchCeiling] at any tested budget.
QuotaSlice + CountQuotaSlice Guard
If the pipeline’s slicer is QuotaSlice or CountQuotaSlice, the method MUST throw InvalidOperationException with the message:
"FindMinBudgetFor requires monotonic item inclusion. QuotaSlice and CountQuotaSlice produce non-monotonic inclusion as budget changes shift allocations. Use a GreedySlice or KnapsackSlice inner slicer for budget simulation."
The general precondition is: any slicer whose item inclusion is sensitive to absolute budget value in a non-monotonic way is incompatible with FindMinBudgetFor.
Pseudocode
FIND-MIN-BUDGET-FOR(pipeline, items, targetItem, searchCeiling):
low <- targetItem.tokens // inclusive lower bound
high <- searchCeiling // inclusive upper bound
if targetItem not in items:
throw ArgumentException("targetItem must be an element of items")
if searchCeiling < targetItem.tokens:
throw ArgumentException("searchCeiling must be >= targetItem.Tokens")
while high - low > 1:
mid <- low + (high - low) / 2
midBudget <- ContextBudget(maxTokens: mid, ...)
report <- pipeline.DRY-RUN(items, midBudget)
if targetItem in report.included:
high <- mid
else:
low <- mid
// Verify at high (the candidate minimum)
finalBudget <- ContextBudget(maxTokens: high, ...)
finalReport <- pipeline.DRY-RUN(items, finalBudget)
if targetItem in finalReport.included:
return high
return null
Conformance Notes
DryRunis the primitive; budget simulation builds entirely on top of it. Implementations MUST satisfy the DryRun Determinism Invariant before exposing budget simulation methods.- Custom slicers that implement non-monotonic inclusion MUST NOT be used with
FindMinBudgetFororGetMarginalItemswithout explicit documentation of the monotonicity property. - The
includedfield of theDryRunresult (see SelectionReport) is the source of truth for item identity comparisons in both methods.
Policy Sensitivity
Policy sensitivity runs multiple policy configurations over the same item set and computes a structured diff showing which items changed inclusion status across those configurations. It is the primary tool for comparing how different scorer, slicer, or placer choices affect selection — without constructing N full pipeline objects.
API
.NET
CupelPipeline.DryRunWithPolicy — run a single policy configuration without constructing a new pipeline:
ContextResult DryRunWithPolicy(
IReadOnlyList<ContextItem> items,
ContextBudget budget,
CupelPolicy policy)
Defined on CupelPipeline. Internally constructs a temporary pipeline using CreateBuilder().WithPolicy(policy).WithBudget(budget) and calls DryRunWithBudget. The returned ContextResult.Report is a SelectionReport covering all included and excluded items.
PolicySensitivityExtensions.PolicySensitivity (policy overload) — run multiple policy configurations and diff them:
// Policy-accepting overload
PolicySensitivityReport PolicySensitivity(
IReadOnlyList<ContextItem> items,
ContextBudget budget,
params (string Label, CupelPolicy Policy)[] variants)
Extension method on CupelPipeline in Wollax.Cupel.Diagnostics. The pipeline-based overload (accepting (string Label, CupelPipeline Pipeline)[]) remains available for callers needing full pipeline control.
Throws ArgumentException if fewer than 2 variants are supplied.
Rust
cupel::analytics::policy_sensitivity — run multiple Policy configurations:
#![allow(unused)] fn main() { pub fn policy_sensitivity( items: &[ContextItem], budget: &ContextBudget, variants: &[(impl AsRef<str>, &Policy)], ) -> Result<PolicySensitivityReport, CupelError> }
In cupel::analytics. Returns Err(CupelError::PipelineConfig(...)) if fewer than 2 variants are provided.
cupel::analytics::policy_sensitivity_from_pipelines — pipeline-based variant:
#![allow(unused)] fn main() { pub fn policy_sensitivity_from_pipelines( items: &[ContextItem], budget: &ContextBudget, variants: &[(impl AsRef<str>, &Pipeline)], ) -> Result<PolicySensitivityReport, CupelError> }
Use when Policy cannot express the required slicer (e.g., CountQuotaSlice — see Language Notes below).
Types
PolicySensitivityReport
#![allow(unused)] fn main() { // Rust pub struct PolicySensitivityReport { /// Per-variant selection results, in input order. pub variants: Vec<(String, SelectionReport)>, /// Items whose inclusion status differed across at least two variants. pub diffs: Vec<PolicySensitivityDiffEntry>, } }
// .NET
public sealed class PolicySensitivityReport
{
public IReadOnlyList<(string Label, SelectionReport Report)> Variants { get; init; }
public IReadOnlyList<PolicySensitivityDiffEntry> Diffs { get; init; }
}
PolicySensitivityDiffEntry
#![allow(unused)] fn main() { // Rust pub struct PolicySensitivityDiffEntry { /// The content string identifying the item. pub content: String, /// One entry per variant, in input order. pub statuses: Vec<(String, ItemStatus)>, } }
// .NET
public sealed class PolicySensitivityDiffEntry
{
public string Content { get; init; }
public IReadOnlyList<(string Label, ItemStatus Status)> Statuses { get; init; }
}
ItemStatus
#![allow(unused)] fn main() { // Rust #[derive(Debug, Clone, Copy, PartialEq, Eq)] pub enum ItemStatus { Included, Excluded, } }
// .NET
public enum ItemStatus { Included, Excluded }
Diff Semantics
Content-keyed matching. Items are identified by their content string. Two items from different variant runs are considered the same item if their content fields are equal (ordinal comparison). ContextItem object identity is not used.
Swing-only filter. Only items where at least two variants disagree on inclusion appear in diffs. An item included by all variants, or excluded by all variants, does not appear in diffs.
variants preserves input order. PolicySensitivityReport.variants reflects the input variants slice/array in the same order it was supplied.
diffs order is unspecified. The order of entries in PolicySensitivityReport.diffs is implementation-defined. Callers must not rely on a particular ordering of diff entries.
Minimum Variants
Implementations MUST require at least 2 variants. Fewer variants MUST return an error:
- Rust:
Err(CupelError::PipelineConfig("policy_sensitivity requires at least 2 variants")) - .NET:
ArgumentExceptionwith message"At least two variants are required for a sensitivity comparison."
A single-variant call cannot produce a meaningful diff and is always a caller error.
Explicit Budget Parameter
Both DryRunWithPolicy and policy_sensitivity take an explicit budget parameter. There is no option to inherit a budget from the pipeline or policy.
Rationale: A CupelPolicy does not carry a budget. An implicit budget would silently apply the pipeline’s own stored budget to a different policy configuration — producing surprising results when the pipeline budget was set for a different context size. Making the budget explicit forces the caller to confirm the comparison baseline and eliminates the ambiguity.
Language Notes
CupelPolicy cannot express CountQuotaSlice. The SlicerType enum in the .NET CupelPolicy has no CountQuota variant. Callers requiring count-quota fork diagnostics MUST use the pipeline-based overload:
- .NET:
PolicySensitivity(items, budget, params (string Label, CupelPipeline Pipeline)[] variants) - Rust:
policy_sensitivity_from_pipelines(items, budget, variants: &[(impl AsRef<str>, &Pipeline)])
This is a known CupelPolicy coverage gap, not a limitation of the underlying diff engine.
Examples
.NET — 2-variant comparison
using Wollax.Cupel;
using Wollax.Cupel.Diagnostics;
var items = new List<ContextItem>
{
new ContextItemBuilder("item-a", 40).WithPriority(10).Build(),
new ContextItemBuilder("item-b", 40).WithPriority(1).Build(),
};
var budget = new ContextBudget(maxTokens: 50, targetTokens: 40);
var policyA = new CupelPolicy { Intent = "code-review" };
var policyB = new CupelPolicy { Intent = "rag" };
var pipeline = CupelPipeline.CreateBuilder().WithBudget(budget).Build();
var report = pipeline.PolicySensitivity(items, budget,
("code-review", policyA),
("rag", policyB));
Console.WriteLine($"Swinging items: {report.Diffs.Count}");
foreach (var diff in report.Diffs)
{
Console.WriteLine($" {diff.Content}:");
foreach (var (label, status) in diff.Statuses)
Console.WriteLine($" {label}: {status}");
}
Rust — 2-variant comparison
#![allow(unused)] fn main() { use cupel::{ ChronologicalPlacer, ContextBudget, ContextItemBuilder, GreedySlice, OverflowStrategy, Policy, PolicyBuilder, PriorityScorer, ReflexiveScorer, policy_sensitivity, }; use std::sync::Arc; use std::collections::HashMap; let items = vec![ ContextItemBuilder::new("item-a", 40) .priority(10) .future_relevance_hint(0.1) .build() .unwrap(), ContextItemBuilder::new("item-b", 40) .priority(1) .future_relevance_hint(0.9) .build() .unwrap(), ]; let budget = ContextBudget::new(50, 40, 0, HashMap::new(), 0.0).unwrap(); let policy_priority = PolicyBuilder::new() .scorer(Arc::new(PriorityScorer)) .slicer(Arc::new(GreedySlice)) .placer(Arc::new(ChronologicalPlacer)) .overflow_strategy(OverflowStrategy::Throw) .deduplication(true) .build() .unwrap(); let policy_reflexive = PolicyBuilder::new() .scorer(Arc::new(ReflexiveScorer)) .slicer(Arc::new(GreedySlice)) .placer(Arc::new(ChronologicalPlacer)) .overflow_strategy(OverflowStrategy::Throw) .deduplication(true) .build() .unwrap(); let variants: Vec<(&str, &Policy)> = vec![ ("priority", &policy_priority), ("reflexive", &policy_reflexive), ]; let report = policy_sensitivity(&items, &budget, &variants).unwrap(); println!("Swinging items: {}", report.diffs.len()); for diff in &report.diffs { println!(" {}:", diff.content); for (label, status) in &diff.statuses { println!(" {}: {:?}", label, status); } } }
Conformance Levels
The Cupel conformance suite defines two tiers of test vectors: Required and Optional. Together, they provide a comprehensive validation of an implementation’s correctness.
Required
An implementation MUST pass all Required test vectors to claim Cupel conformance. These vectors cover the core algorithm behavior that every conforming implementation must exhibit.
Covered Algorithms
| Category | Algorithms |
|---|---|
| Scorers | RecencyScorer, PriorityScorer, KindScorer, TagScorer, FrequencyScorer, ReflexiveScorer, CompositeScorer, ScaledScorer |
| Slicers | GreedySlice, KnapsackSlice, QuotaSlice |
| Placers | ChronologicalPlacer, UShapedPlacer |
| Pipeline | End-to-end scenarios combining scoring, slicing, and placing |
Required vectors test the fundamental contracts of each algorithm:
- Correct score computation for all 8 scorer types
- Correct item selection for GreedySlice, KnapsackSlice, and QuotaSlice
- Correct ordering for both placers
- End-to-end pipeline behavior with pinned items, multiple scorer/slicer/placer combinations
Optional
An implementation MAY pass Optional test vectors for full conformance. These vectors cover edge cases and advanced features. Passing all Optional vectors demonstrates robustness in boundary conditions.
Covered Scenarios
| Category | Scenarios |
|---|---|
| Scoring edge cases | Single-item recency, all-null timestamps, degenerate ScaledScorer, nested CompositeScorer |
| Slicing edge cases | Empty input |
| Pipeline edge cases | Empty input, all-pinned items, content deduplication, overflow with Truncate strategy |
Claiming Conformance
An implementation may claim one of two conformance levels:
- Cupel Conformant: All Required test vectors pass.
- Cupel Fully Conformant: All Required and all Optional test vectors pass.
Implementations SHOULD document which conformance level they have achieved and the version of the test vector suite used for validation.
Test Vector Format
All conformance test vectors are TOML files. Each file contains a single test case with inputs, configuration, and expected outputs. This chapter documents the schema for each stage type.
Common Structure
Every test vector contains a [test] table that identifies the test:
[test]
name = "Human-readable test name"
stage = "scoring" # one of: scoring, slicing, placing, pipeline
The stage field determines which additional tables are expected.
Scoring Vectors
Scoring vectors test individual scorer algorithms in isolation.
Schema
| Table | Field | Type | Description |
|---|---|---|---|
[test] | name | string | Test name |
[test] | stage | string | "scoring" |
[test] | scorer | string | Scorer type: "recency", "priority", "kind", "tag", "frequency", "reflexive", "composite", "scaled" |
[[items]] | content | string | Item content (used as identifier in assertions) |
[[items]] | tokens | integer | Token count |
[[items]] | timestamp | datetime (optional) | UTC timestamp |
[[items]] | priority | integer (optional) | Numeric priority value |
[[items]] | kind | string (optional) | ContextKind value |
[[items]] | tags | array of string (optional) | Item tags |
[[items]] | futureRelevanceHint | float (optional) | Caller-provided relevance hint |
[[expected]] | content | string | Item content to match |
[[expected]] | score_approx | float | Expected score (compared with epsilon tolerance) |
[tolerance] | score_epsilon | float | Maximum allowed absolute difference between actual and expected score (default: 1e-9) |
Scorer-Specific Configuration
Some scorers require additional configuration:
| Table | Field | Type | Applies To |
|---|---|---|---|
[config] | use_default_weights | boolean | KindScorer — when true, use default weight map |
[[config.weights]] | kind | string | KindScorer — custom weight kind |
[[config.weights]] | weight | float | KindScorer — custom weight value |
[[config.tag_weights]] | tag | string | TagScorer — configured tag name |
[[config.tag_weights]] | weight | float | TagScorer — configured tag weight |
[[config.scorers]] | type | string | CompositeScorer — child scorer type |
[[config.scorers]] | weight | float | CompositeScorer — child weight |
[config] | inner_scorer | string | ScaledScorer — inner scorer type |
Score Comparison
Score assertions use epsilon tolerance comparison:
abs(actual_score - expected_score) < score_epsilon
This addresses floating-point representation differences across languages and platforms. The default epsilon of 1e-9 is sufficient for all algorithms in this specification. Test vectors that require different tolerances specify a custom [tolerance] table.
Slicing Vectors
Slicing vectors test slicer algorithms with pre-scored input.
Schema
| Table | Field | Type | Description |
|---|---|---|---|
[test] | name | string | Test name |
[test] | stage | string | "slicing" |
[test] | slicer | string | Slicer type: "greedy", "knapsack", "quota" |
[budget] | target_tokens | integer | Token budget for selection |
[[scored_items]] | content | string | Item content |
[[scored_items]] | tokens | integer | Token count |
[[scored_items]] | score | float | Pre-computed relevance score |
[[scored_items]] | kind | string (optional) | ContextKind (used by QuotaSlice) |
[expected] | selected_contents | array of string | Content values of selected items (set comparison — order does not matter) |
Slicer-Specific Configuration
| Table | Field | Type | Applies To |
|---|---|---|---|
[config] | bucket_size | integer | KnapsackSlice — discretization bucket size (default: 100) |
[config] | inner_slicer | string | QuotaSlice — inner slicer type |
[[config.quotas]] | kind | string | QuotaSlice — ContextKind |
[[config.quotas]] | require | float | QuotaSlice — minimum percentage |
[[config.quotas]] | cap | float | QuotaSlice — maximum percentage |
Set Comparison
Slicer output is compared as a set — the order of items in selected_contents does not matter. An implementation passes if the set of selected item contents exactly matches the expected set. This is because slicers select items but do not determine presentation order (that is the placer’s responsibility). This applies to all slicers including QuotaSlice — ordering is always the placer’s responsibility, not the slicer’s.
Placing Vectors
Placing vectors test placer algorithms with pre-scored input.
Schema
| Table | Field | Type | Description |
|---|---|---|---|
[test] | name | string | Test name |
[test] | stage | string | "placing" |
[test] | placer | string | Placer type: "chronological", "u-shaped" |
[[items]] | content | string | Item content |
[[items]] | tokens | integer | Token count |
[[items]] | score | float | Pre-computed relevance score |
[[items]] | timestamp | datetime (optional) | UTC timestamp (used by ChronologicalPlacer) |
[expected] | ordered_contents | array of string | Content values in expected output order (ordered comparison) |
Ordered Comparison
Placer output is compared as an ordered list — the position of each item matters. An implementation passes if the output items, in order, match the expected ordered_contents exactly.
Pipeline Vectors
Pipeline vectors test the full 6-stage pipeline end-to-end.
Schema
| Table | Field | Type | Description |
|---|---|---|---|
[test] | name | string | Test name |
[test] | stage | string | "pipeline" |
[budget] | max_tokens | integer | Maximum token capacity |
[budget] | target_tokens | integer | Target token budget for selection |
[budget] | output_reserve | integer | Tokens reserved for output (default: 0) |
[config] | slicer | string | Slicer type |
[config] | placer | string | Placer type |
[config] | deduplication | boolean | Whether deduplication is enabled |
[config] | overflow_strategy | string (optional) | "throw", "truncate", or "proceed" (default: "throw") |
[[config.scorers]] | type | string | Scorer type |
[[config.scorers]] | weight | float | Scorer weight (used for CompositeScorer weighting) |
[[items]] | content | string | Item content |
[[items]] | tokens | integer | Token count |
[[items]] | kind | string (optional) | ContextKind |
[[items]] | timestamp | datetime (optional) | UTC timestamp |
[[items]] | priority | integer (optional) | Numeric priority |
[[items]] | tags | array of string (optional) | Item tags |
[[items]] | futureRelevanceHint | float (optional) | Relevance hint |
[[items]] | pinned | boolean (optional) | Whether item is pinned (default: false) |
[[expected_output]] | content | string | Content values in expected output order (ordered comparison) |
Ordered Comparison
Pipeline output is compared as an ordered list — both the selected items and their presentation order must match. An implementation passes if the output items, in order, match the expected_output entries exactly.
Diagnostics Vectors
Diagnostics vectors extend pipeline vectors with an [expected.diagnostics] sub-table. They assert on which items were included or excluded, why, and aggregate counts. Diagnostics vectors are pipeline-level only (stage = "pipeline"). The [expected.diagnostics] table composes with [[expected_output]] — a single vector file can assert on both output order and diagnostic details simultaneously.
Schema
| Table | Field | Type | Required | Description |
|---|---|---|---|---|
[[expected.diagnostics.included]] | content | string | yes | Item content (matches placed order) |
[[expected.diagnostics.included]] | score_approx | float | yes | Expected score (epsilon tolerance) |
[[expected.diagnostics.included]] | inclusion_reason | string | yes | Reason: "Scored", "Pinned", "ZeroToken" |
[[expected.diagnostics.excluded]] | content | string | yes | Item content (sorted by score desc) |
[[expected.diagnostics.excluded]] | score_approx | float | yes | Expected score (epsilon tolerance) |
[[expected.diagnostics.excluded]] | exclusion_reason | string | yes | Reason discriminator: "BudgetExceeded", "Deduplicated", "QuotaCapExceeded" |
[[expected.diagnostics.excluded]] | item_tokens | integer | conditional | Token count of excluded item (required for BudgetExceeded) |
[[expected.diagnostics.excluded]] | available_tokens | integer | conditional | Remaining budget at exclusion (required for BudgetExceeded) |
[[expected.diagnostics.excluded]] | deduplicated_against | string | conditional | Content of duplicate kept (required for Deduplicated) |
[expected.diagnostics.summary] | total_candidates | integer | no | Total items considered |
[expected.diagnostics.summary] | total_tokens_considered | integer | no | Sum of all candidate token counts |
Ordering
included entries appear in placed order, matching the order of [[expected_output]] entries. excluded entries appear sorted by score descending.
Optionality
All three sub-tables (included, excluded, summary) are independently optional. A vector can assert on any combination: included items only, excluded items only, summary counts only, or any mix.
Compatibility
[expected.diagnostics] is a dotted-key sub-table under [expected] and does not conflict with [[expected_output]] (different key names). This is valid TOML 1.0. A single pipeline vector file may contain both [[expected_output]] (ordered output assertion) and [expected.diagnostics] (diagnostic assertion).
Example
[test]
name = "Pipeline diagnostics: BudgetExceeded exclusion reason"
stage = "pipeline"
# Expected values below are final. A live integration test against run_traced
# will be added in Phase 29.
#
# Budget: target=200, max=1000, reserve=0.
#
# Items:
# "fits": tokens=150, kind=Message, timestamp=Jun
# "too-big": tokens=400, kind=Message, timestamp=Jan
#
# Score (RecencyScorer, 2 timestamped, denominator=1):
# "too-big" (Jan): rank 0 → 0.0
# "fits" (Jun): rank 1 → 1.0
#
# Slice (Greedy, target=200):
# Density sort: fits(1.0/150≈0.00667), too-big(0.0/400=0.0)
# fits: 150 ≤ 200 → selected (remaining=50)
# too-big: 400 > 50 → excluded (BudgetExceeded)
#
# Expected diagnostics:
# included: fits (score=1.0, reason=Scored)
# excluded: too-big (score=0.0, reason=BudgetExceeded, item_tokens=400, available=50)
# summary: total_candidates=2, total_tokens_considered=550
[budget]
max_tokens = 1000
target_tokens = 200
output_reserve = 0
[config]
slicer = "greedy"
placer = "chronological"
deduplication = false
[[config.scorers]]
type = "recency"
weight = 1.0
[[items]]
content = "fits"
tokens = 150
kind = "Message"
timestamp = 2024-06-01T00:00:00Z
[[items]]
content = "too-big"
tokens = 400
kind = "Message"
timestamp = 2024-01-01T00:00:00Z
[[expected_output]]
content = "fits"
[expected.diagnostics.summary]
total_candidates = 2
total_tokens_considered = 550
[[expected.diagnostics.included]]
content = "fits"
score_approx = 1.0
inclusion_reason = "Scored"
[[expected.diagnostics.excluded]]
content = "too-big"
score_approx = 0.0
exclusion_reason = "BudgetExceeded"
item_tokens = 400
available_tokens = 50
Field Types
| Type | TOML Representation | Notes |
|---|---|---|
| string | "text" | UTF-8 string |
| integer | 42 | Signed 64-bit integer |
| float | 0.5 | IEEE 754 double-precision |
| boolean | true / false | |
| datetime | 2024-06-15T12:00:00Z | RFC 3339, always UTC |
| array of string | ["a", "b"] |
Extensibility
Future versions of the conformance suite may add new fields to existing tables. Implementations SHOULD ignore unknown fields in test vector files rather than raising errors. This enables forward compatibility as the specification evolves.
Running the Suite
This chapter describes how to parse and execute conformance test vectors in any programming language.
General Approach
- Discover test vector files by scanning the
conformance/required/andconformance/optional/directories. - Parse each
.tomlfile into a structured representation. - Construct the appropriate inputs (items, budget, scorer/slicer/placer configuration) from the parsed data.
- Execute the algorithm under test.
- Compare the actual output against the expected output using the appropriate comparison mode (set or ordered).
Pseudocode: Conformance Runner
RUN-CONFORMANCE-SUITE(vectorDir):
passed <- 0
failed <- 0
errors <- empty list
files <- GLOB(vectorDir, "**/*.toml")
for each file in files:
vector <- PARSE-TOML(file)
testName <- vector.test.name
stage <- vector.test.stage
try:
if stage = "scoring":
RUN-SCORING-TEST(vector)
else if stage = "slicing":
RUN-SLICING-TEST(vector)
else if stage = "placing":
RUN-PLACING-TEST(vector)
else if stage = "pipeline":
RUN-PIPELINE-TEST(vector)
else:
APPEND(errors, "Unknown stage: " + stage)
continue
passed <- passed + 1
catch error:
failed <- failed + 1
APPEND(errors, testName + ": " + error.message)
REPORT(passed, failed, errors)
Running a Scoring Test
RUN-SCORING-TEST(vector):
items <- BUILD-ITEMS(vector.items)
scorer <- BUILD-SCORER(vector.test.scorer, vector.config)
epsilon <- vector.tolerance.score_epsilon or 1e-9
for i <- 0 to length(vector.expected) - 1:
expectedContent <- vector.expected[i].content
expectedScore <- vector.expected[i].score_approx
// Find the corresponding input item by content
item <- FIND-BY-CONTENT(items, expectedContent)
actualScore <- scorer.Score(item, items)
if abs(actualScore - expectedScore) >= epsilon:
ERROR("Score mismatch for '" + expectedContent + "': " +
"expected " + expectedScore + ", got " + actualScore +
" (epsilon=" + epsilon + ")")
Running a Slicing Test
RUN-SLICING-TEST(vector):
scoredItems <- BUILD-SCORED-ITEMS(vector.scored_items)
budget <- BUILD-BUDGET(vector.budget)
slicer <- BUILD-SLICER(vector.test.slicer, vector.config)
selected <- slicer.Slice(scoredItems, budget)
actualContents <- SET(item.content for item in selected)
expectedContents <- SET(vector.expected.selected_contents)
if actualContents != expectedContents:
ERROR("Selection mismatch: expected " + expectedContents +
", got " + actualContents)
Running a Placing Test
RUN-PLACING-TEST(vector):
items <- BUILD-SCORED-ITEMS-FOR-PLACER(vector.items)
placer <- BUILD-PLACER(vector.test.placer)
placed <- placer.Place(items)
actualContents <- [item.content for item in placed]
expectedContents <- vector.expected.ordered_contents
if actualContents != expectedContents:
ERROR("Placement order mismatch: expected " + expectedContents +
", got " + actualContents)
Running a Pipeline Test
RUN-PIPELINE-TEST(vector):
items <- BUILD-ITEMS(vector.items)
budget <- BUILD-BUDGET(vector.budget)
config <- BUILD-PIPELINE-CONFIG(vector.config)
output <- PIPELINE-RUN(items, budget, config)
actualContents <- [item.content for item in output]
expectedContents <- [e.content for e in vector.expected_output]
if actualContents != expectedContents:
ERROR("Pipeline output mismatch: expected " + expectedContents +
", got " + actualContents)
Score Comparison
Score comparisons MUST use epsilon tolerance, never exact floating-point equality:
SCORES-MATCH(actual, expected, epsilon):
return abs(actual - expected) < epsilon
The default epsilon is 1e-9. This tolerance accounts for:
- IEEE 754 representation differences across platforms
- Intermediate rounding differences in division operations
- Language-specific floating-point arithmetic behavior
Conformance test vectors are designed so that correct implementations produce scores well within this tolerance. If an implementation fails a score comparison by a small margin (e.g., 1e-10), it is likely correct. If the difference is large (e.g., > 0.01), the algorithm implementation is likely incorrect.
Set vs. Ordered Comparison
| Stage | Comparison Mode | Rationale |
|---|---|---|
| Scoring | Per-item epsilon | Scores are floating-point values |
| Slicing | Set (unordered) | Slicers select items but do not determine order |
| Placing | Ordered list | Placers determine the exact presentation order |
| Pipeline | Ordered list | The pipeline produces a fully ordered output |
TOML Libraries
The following TOML parsing libraries are recommended for common implementation languages:
| Language | Library | Notes |
|---|---|---|
| C# | Tomlyn | NuGet package |
| Rust | toml | Cargo crate (serde-based) |
| Go | github.com/BurntSushi/toml | Standard Go TOML library |
| Python | tomllib (stdlib) | Built-in since Python 3.11 |
| TypeScript | @iarna/toml or smol-toml | npm packages |
| Java | com.moandjiezana.toml:toml4j | Maven artifact |
| Swift | TOMLDecoder | Swift Package Manager |
| Kotlin | cc.ekblad.toml4k | Gradle/Maven |
Tips for Implementors
- Start with scoring vectors. They test individual algorithms in isolation and are the easiest to debug.
- Verify your TOML parser handles datetimes. Some TOML libraries parse RFC 3339 timestamps into local time; ensure you compare temporal instants (UTC).
- Use reference identity for self-exclusion. FrequencyScorer and ScaledScorer use reference identity (
is) to identify the current item inallItems. Ensure your implementation does not use structural equality for this check. - Slicer output order does not matter. Only the set of selected items is tested. Do not fail a slicer test because items are in a different order than the expected list.
- Pipeline vectors are the final gate. If all scoring, slicing, and placing vectors pass but a pipeline vector fails, the issue is likely in stage integration (classify, deduplicate, sort, merge, overflow handling).
Cupel.Testing Vocabulary
Overview
The Cupel.Testing vocabulary is a language-agnostic set of named assertion patterns over SelectionReport. Each pattern has a precise semantic specification: what it asserts, which fields of SelectionReport it reads, the exact comparison operator used, and the exact structure of the error message emitted on failure. The vocabulary is a contract document — it defines the observable surface that conforming implementations must provide, without prescribing internal structure.
What Cupel.Testing is
- A named vocabulary of 13 assertion patterns, each fully specified with no undefined qualifiers
- A chain-based API:
SelectionReport.Should()returns aSelectionReportAssertionChain; each assertion method returnsthisfor fluent chaining; failures throwSelectionReportAssertionExceptionwith a structured message - The prerequisite for the
Wollax.Cupel.TestingNuGet package implemented in M003 (requirement R021) - Language-agnostic: pattern semantics are defined at the field-access level; language bindings follow the field-access semantics of
SelectionReport,IncludedItem, andExcludedItemas specified in SelectionReport
What Cupel.Testing is not
- Not a FluentAssertions dependency (D041): the chain plumbing is in-house (~100 lines); no dependency on
FluentAssertions,ApprovalTests, or any third-party assertion framework - Not snapshot-based (D041): snapshot assertions are explicitly deferred.
SelectionReport.excludedis sorted score-descending with an insertion-order tiebreak that is not observable from the report itself. Snapshot outputs are therefore non-deterministic across test runs unless the caller controls the complete input set and all scores are distinct. This precondition is not currently guaranteed by the spec. - Not an implementation: this chapter specifies the vocabulary; the implementation lives in M003
- Not a runtime analytics module:
BudgetUtilization(budget)andKindDiversity()are extension methods onSelectionReportthat belong inWollax.Cupelcore (D045), not inWollax.Cupel.Testing. The testing vocabulary may call these methods internally but does not define them.
Pre-decisions
The following four design choices are locked and apply uniformly across all 13 patterns. They are stated here once to prevent inconsistency in per-pattern specs.
PD-1: Predicate type — IncludedItem / ExcludedItem, not raw ContextItem
All predicate-bearing methods accept IncludedItem (for Inclusion-group methods) or ExcludedItem (for Exclusion-group methods) as the predicate parameter type. The predicate receives the full pair — item, score, and reason — not just the raw ContextItem.
Rationale: predicates over IncludedItem/ExcludedItem are strictly more powerful than predicates over ContextItem; callers can filter on score or reason as well as item fields. Implementations MAY provide convenience overloads that accept a ContextItem-based predicate for callers who only need to inspect content or kind, but such overloads are not part of this vocabulary spec.
PD-2: BudgetUtilization denominator — budget.MaxTokens
BudgetUtilization is defined as sum(included[i].item.tokens) / budget.MaxTokens.
The denominator is budget.MaxTokens (the hard token ceiling), not budget.TargetTokens.
Rationale: MaxTokens is the single authoritative ceiling callers control at construction time and the ceiling that matters for context-window safety. TargetTokens is a Slice-stage-internal soft target — it is not a public capacity metric exposed by ContextBudget and is not appropriate as a utilization denominator (D045, DI-3). Using TargetTokens as denominator would produce utilization values > 1.0 for reports where the pipeline filled the full max-token budget, which is misleading.
PD-3: Floating-point threshold comparisons — exact operator, no epsilon
Floating-point threshold comparisons use the exact operator: >= for lower-bound thresholds (e.g. HaveBudgetUtilizationAbove), <= for upper-bound. No default epsilon is baked into any method (D064).
Rationale: a baked-in epsilon hides real threshold violations and couples the spec to a particular floating-point implementation. Test authoring is the caller’s responsibility: callers must choose threshold values that avoid floating-point boundary cases (e.g. 0.799 instead of 0.8 when the computed ratio may drift by one ULP). The comparison operator is stated explicitly in each pattern spec.
PD-4: Chain plumbing — SelectionReport.Should(), SelectionReportAssertionChain, SelectionReportAssertionException
- Entry point:
SelectionReport.Should()— a method (or extension method) onSelectionReportthat returns aSelectionReportAssertionChainwrapping the report - Chaining: each assertion method on
SelectionReportAssertionChainreturnsthisso calls can be chained fluently - Failure: on failure, each assertion method throws
SelectionReportAssertionException(notInvalidOperationException, notAssertionException) with a structured message containing: assertion name, what was expected, and what was actually found - No side effects on success: assertion methods that pass do not modify any state and return
thisunchanged
Chain Plumbing
SelectionReportAssertionChain is the type returned by SelectionReport.Should(). It wraps a SelectionReport and exposes the 13 assertion methods listed in the Vocabulary section below.
Type shape
SelectionReportAssertionChain:
report: SelectionReport // the wrapped report; read-only after construction
// Entry point
SelectionReport.Should() → SelectionReportAssertionChain
// Each assertion method returns `this` for chaining
// On failure: throws SelectionReportAssertionException
// Inclusion group (3 methods)
IncludeItemWithKind(kind: ContextKind) → SelectionReportAssertionChain
IncludeItemMatching(predicate: Func<IncludedItem, bool>) → SelectionReportAssertionChain
IncludeExactlyNItemsWithKind(kind: ContextKind, n: int) → SelectionReportAssertionChain
// Exclusion group (4 methods)
ExcludeItemWithReason(reason: ExclusionReason) → SelectionReportAssertionChain
ExcludeItemMatchingWithReason(predicate: Func<ContextItem, bool>, reason: ExclusionReason) → SelectionReportAssertionChain
ExcludeItemWithBudgetDetails(predicate: Func<ContextItem, bool>, expectedItemTokens: int, expectedAvailableTokens: int) → SelectionReportAssertionChain
HaveNoExclusionsForKind(kind: ContextKind) → SelectionReportAssertionChain
// Aggregate group (2 methods)
HaveAtLeastNExclusions(n: int) → SelectionReportAssertionChain
ExcludedItemsAreSortedByScoreDescending() → SelectionReportAssertionChain
// Budget group (1 method)
HaveBudgetUtilizationAbove(threshold: double, budget: ContextBudget) → SelectionReportAssertionChain
// Coverage group (1 method)
HaveKindCoverageCount(n: int) → SelectionReportAssertionChain
// Ordering group (2 methods)
PlaceItemAtEdge(predicate: Func<IncludedItem, bool>) → SelectionReportAssertionChain
PlaceTopNScoredAtEdges(n: int) → SelectionReportAssertionChain
Error message contract
SelectionReportAssertionException carries a structured message. Each pattern’s spec (written in T02 and T03) defines the exact message template. All messages must include:
- Assertion name — the method name as a label (e.g.
"IncludeItemWithKind") - What was expected — the caller’s assertion parameter(s) stated explicitly
- What was actually found — the actual field values from the report that caused the failure, with enough context (item counts, score values, kind lists) for the test author to diagnose without re-running under a debugger
No message should reference internal type names or implementation details (D032).
Implementation cost
The chain plumbing is approximately 100 lines of code. The vocabulary spec does not prescribe the internal structure — only the entry point, method signatures, return type, and failure mechanism.
Vocabulary
The Cupel.Testing vocabulary defines 13 named assertion patterns organized into five groups. Each pattern is fully specified in the sections below (T02 covers patterns 1–7; T03 covers patterns 8–13).
Pattern summary table
| # | Group | Method | One-line description |
|---|---|---|---|
| 1 | Inclusion | IncludeItemWithKind(ContextKind) | At least one included item has the given Kind |
| 2 | Inclusion | IncludeItemMatching(Func<IncludedItem, bool>) | At least one included item satisfies the predicate |
| 3 | Inclusion | IncludeExactlyNItemsWithKind(ContextKind, int n) | Exactly N included items have the given Kind (N=0 is valid) |
| 4 | Exclusion | ExcludeItemWithReason(ExclusionReason) | At least one excluded item carries the given ExclusionReason |
| 5 | Exclusion | ExcludeItemMatchingWithReason(Func<ContextItem, bool>, ExclusionReason) | At least one excluded item satisfies the predicate and carries the given reason |
| 6 | Exclusion | ExcludeItemWithBudgetDetails(Func<ContextItem, bool>, int, int) | An excluded item matching the predicate has BudgetExceeded with the exact token counts |
| 7 | Exclusion | HaveNoExclusionsForKind(ContextKind) | No excluded item has the given Kind |
| 8 | Aggregate | HaveAtLeastNExclusions(int n) | At least N items in the Excluded list |
| 9 | Aggregate | ExcludedItemsAreSortedByScoreDescending() | Excluded list is sorted score-descending (conformance assertion) |
| 10 | Budget | HaveBudgetUtilizationAbove(double threshold, ContextBudget) | sum(included tokens) / budget.MaxTokens >= threshold |
| 11 | Coverage | HaveKindCoverageCount(int n) | At least N distinct ContextKind values appear in the Included list |
| 12 | Ordering | PlaceItemAtEdge(Func<IncludedItem, bool>) | An included item matching the predicate is at position 0 or position count−1 |
| 13 | Ordering | PlaceTopNScoredAtEdges(int n) | The N highest-scored included items occupy the N outermost positions |
Per-pattern specifications are written in T02 (patterns 1–7) and T03 (patterns 8–13). Each specification includes: assertion semantics, predicate type, edge cases and tolerance, tie-breaking behavior, and error message format.
Inclusion Group
The Inclusion group contains 3 patterns that assert over SelectionReport.included. All predicates in this group operate over IncludedItem (per PD-1), giving callers access to item, score, and reason.
IncludeItemWithKind(ContextKind kind)
Assertion semantics: Asserts that Included.Any(i => i.Item.Kind == kind) — at least one item in the included list has Kind equal to kind.
Predicate type: No predicate parameter. The kind value is matched via direct field equality: i.item.kind == kind.
ContextKind.Any sentinel: ContextKind.Any is not a valid argument to this assertion. Pass a specific kind (e.g., Message, SystemPrompt, ToolOutput). Passing ContextKind.Any is implementation-defined behavior; a conforming implementation MAY throw ArgumentException or treat it as always-matching, but the vocabulary does not assign semantics to this case.
Edge cases:
Includedis empty → assertion fails withcount = 0.Includedcontains items but none matchkind→ assertion fails withcount = 0for the given kind.
Tie-breaking: Not applicable. This is a pure existence check with no ordering dependency.
Error message format:
IncludeItemWithKind({kind}) failed: Included contained 0 items with Kind={kind}. Included had {count} items with kinds: [{actualKinds}].
Where {actualKinds} is a comma-separated list of distinct Kind values present in included (e.g., Message, ToolOutput). If included is empty, {actualKinds} is an empty string and {count} is 0.
IncludeItemMatching(Func<IncludedItem, bool> predicate)
Assertion semantics: Asserts that Included.Any(predicate) — at least one item in the included list satisfies the caller-supplied predicate.
Predicate type: IncludedItem (per PD-1). The predicate receives the full IncludedItem pair (fields: item, score, reason), not a raw ContextItem. This allows callers to filter on score, inclusion reason, or item content/kind from a single predicate.
Convenience overloads: Implementations MAY provide additional overloads that accept a Func<ContextItem, bool> predicate for callers who only need to inspect item content or kind. Such overloads are implementation-defined and not part of this vocabulary spec.
Edge cases:
Includedis empty → assertion fails withcount = 0, no items to show.Includedis non-empty but the predicate returnsfalsefor all items → assertion fails and includes a summary of included items (up to 5) in the error message to assist diagnosis.
Tie-breaking: Not applicable. Existence check only.
Error message format:
IncludeItemMatching failed: no item in Included matched the predicate. Included had {count} items.
Implementations SHOULD append a summary of up to 5 IncludedItem entries from included (showing kind, score, and reason for each) to aid diagnosis. The exact format of this optional appendix is implementation-defined, but no additional required fields are added to the base message above.
IncludeExactlyNItemsWithKind(ContextKind kind, int n)
Assertion semantics: Asserts that Included.Count(i => i.Item.Kind == kind) == n — the included list contains exactly n items with Kind equal to kind.
Predicate type: No predicate parameter. Count is computed by direct field equality: i.item.kind == kind.
N=0 semantics: n = 0 is a valid, well-defined argument. It asserts that no included item has the given kind. This is distinct from HaveNoExclusionsForKind (pattern 7): IncludeExactlyNItemsWithKind(kind, 0) asserts the kind is absent from included; HaveNoExclusionsForKind(kind) asserts the kind is absent from excluded. An item of a given kind could be absent from both lists (never a candidate).
Edge cases:
Includedis empty andn = 0→ assertion passes.Includedis empty andn > 0→ assertion fails withactual = 0.Includedhas matching items but count does not equaln→ assertion fails showing both expected and actual counts.
Tie-breaking: Not applicable. Count comparison only.
Error message format:
IncludeExactlyNItemsWithKind({kind}, {n}) failed: expected {n} items with Kind={kind} in Included, but found {actual}. Included had {count} items total.
Where {actual} is the count of items in included with Kind == kind, and {count} is the total length of included.
Exclusion Group
The Exclusion group contains 4 patterns that assert over SelectionReport.excluded. Predicates in this group operate over ContextItem or ExcludedItem depending on the method — see each pattern spec for the exact predicate type. Reason matching uses variant discriminant comparison, not string equality.
ExcludeItemWithReason(ExclusionReason reason)
Assertion semantics: Asserts that Excluded.Any(e => e.Reason is <reason variant>) — at least one item in the excluded list carries an exclusion reason of the given variant.
Reason matching: Variant discriminant match. In Rust this is a pattern match on the enum variant (e.g., matches!(e.reason, ExclusionReason::BudgetExceeded { .. })). In .NET this is an equality comparison on the flat enum value (e.g., e.Reason == ExclusionReason.BudgetExceeded). This is not a string equality check on the reason name.
Reserved variants: All 8 ExclusionReason variants are valid arguments, including reserved variants (ScoredTooLow, QuotaCapExceeded, QuotaRequireDisplaced, Filtered). Reserved variants are valid arguments even if no built-in pipeline stage currently emits them — custom stage implementations may emit reserved variants, and test authors may assert against them.
Edge cases:
Excludedis empty → assertion fails withcount = 0.Excludedis non-empty but no item carries the given reason variant → assertion fails showing the actual reason variants present.
Tie-breaking: Not applicable. Existence check only.
Error message format:
ExcludeItemWithReason({reason}) failed: no excluded item had reason {reason}. Excluded had {count} items with reasons: [{reasonList}].
Where {reasonList} is a comma-separated list of distinct reason variant names present in excluded (e.g., BudgetExceeded, Deduplicated). If excluded is empty, {reasonList} is an empty string and {count} is 0.
ExcludeItemMatchingWithReason(Func<ContextItem, bool> predicate, ExclusionReason reason)
Assertion semantics: Asserts that Excluded.Any(e => predicate(e.Item) && e.Reason is <reason variant>) — at least one item in the excluded list satisfies both the content predicate and the reason discriminant match.
Predicate type: ContextItem (the predicate is over the raw item, not the full ExcludedItem). Callers use the predicate to filter by content or kind; the reason parameter provides the second filter. This split is deliberate: the predicate is the “which item?” filter, and the reason is the “why?” filter.
Reason matching: Variant discriminant match. Same rules as ExcludeItemWithReason (pattern 4).
Partial-match count: The error message distinguishes two failure modes: (a) the predicate matched no excluded items at all, and (b) the predicate matched one or more excluded items but none had the expected reason. This distinction is exposed via {predicateMatchCount} in the error message.
Edge cases:
Excludedis empty → assertion fails;predicateMatchCount = 0.- Predicate matches zero items in
excluded→ assertion fails;predicateMatchCount = 0. - Predicate matches one or more items but none have the given reason → assertion fails; error shows partial-match count and the actual reasons of the matched items.
Tie-breaking: Not applicable. Existence check only.
Error message format:
ExcludeItemMatchingWithReason(reason={reason}) failed: predicate matched {predicateMatchCount} excluded item(s) but none had reason {reason}. Matched items had reasons: [{actualReasons}].
Where {predicateMatchCount} is the count of excluded items for which predicate(e.item) returned true, and {actualReasons} is a comma-separated list of the distinct reason variant names of those matched items. If predicateMatchCount = 0, {actualReasons} is an empty string.
ExcludeItemWithBudgetDetails(Func<ContextItem, bool> predicate, int expectedItemTokens, int expectedAvailableTokens)
Assertion semantics: Asserts that there exists an excluded item e such that:
predicate(e.item)is truee.reasonisBudgetExceedede.reason.item_tokens == expectedItemTokens(exact integer equality)e.reason.available_tokens == expectedAvailableTokens(exact integer equality)
Predicate type: ContextItem. The predicate identifies which item is being asserted about; the reason and token field checks are fixed by the assertion signature.
Token equality: Exact integer equality for both item_tokens and available_tokens. No tolerance. available_tokens is defined as effective_target − sum(sliced_item.tokens) at the moment of exclusion (D025).
Reserved variant note: This assertion is specific to the BudgetExceeded variant. It does not apply to other reason variants.
Edge cases:
- No excluded item matches the predicate → assertion fails; report which predicate-matching items exist (if any) with their actual
item_tokensandavailable_tokens. - Predicate matches an item but its reason is not
BudgetExceeded→ assertion fails; error shows actual reason. - Predicate matches a
BudgetExceededitem but token values differ → assertion fails; error shows both expected and actual token values.
Tie-breaking: Not applicable. The first predicate-matching BudgetExceeded item is used for error reporting if multiple candidates exist.
Error message format:
ExcludeItemWithBudgetDetails failed: expected BudgetExceeded with item_tokens={eIT}, available_tokens={eAT}, but found item_tokens={aIT}, available_tokens={aAT}.
Where {eIT} = expectedItemTokens, {eAT} = expectedAvailableTokens, {aIT} = actual item_tokens on the matched item, {aAT} = actual available_tokens on the matched item. If no predicate-matching BudgetExceeded item exists, replace the but found clause with: but no matching item had reason BudgetExceeded (implementation-defined extended form permitted).
Language note: In .NET,
ExclusionReasonis a flat enum with no associated data. Theitem_tokensandavailable_tokensfields are not available onExcludedItem.Reason— the .NETExclusionReasonenum carries only the variant discriminant (BudgetExceeded,Deduplicated, etc.) and no per-variant fields. .NET implementations ofWollax.Cupel.TestingMAY omit this assertion entirely or surface it differently — for example, asHaveExcludedItemWithBudgetExceeded(Func<ContextItem, bool> predicate)without theexpectedItemTokensandexpectedAvailableTokensparameters. If a .NET implementation omits the two token-detail parameters, the assertion degenerates toExcludeItemMatchingWithReason(predicate, ExclusionReason.BudgetExceeded)(pattern 5). This difference is a consequence of the language-level representation choice and not a spec deviation.
HaveNoExclusionsForKind(ContextKind kind)
Assertion semantics: Asserts that Excluded.All(e => e.Item.Kind != kind) — no item in the excluded list has Kind equal to kind.
Predicate type: No predicate parameter. The kind value is matched via direct field equality: e.item.kind == kind.
Semantic distinction from IncludeExactlyNItemsWithKind(kind, 0): These two patterns test different lists:
HaveNoExclusionsForKind(kind)asserts the kind is absent fromexcluded.IncludeExactlyNItemsWithKind(kind, 0)asserts the kind is absent fromincluded. An item of a given kind can be absent from both lists (it was never a candidate in the pipeline run). The two assertions are not equivalent and are not interchangeable.
Edge cases:
Excludedis empty → assertion passes (vacuous truth:Allover an empty set is true).Excludedcontains items of the given kind → assertion fails; error shows count and details of the first matching item.
Tie-breaking: Not applicable. The All predicate is evaluated over the full excluded list; the first matching item is used in the error message for diagnosis.
Error message format:
HaveNoExclusionsForKind({kind}) failed: found {count} excluded item(s) with Kind={kind}. First: score={score}, reason={reason}.
Where {count} is the total count of excluded items with Kind == kind, and {score} and {reason} are the score and reason variant name of the first such item in the excluded list (i.e., the highest-scored excluded item of that kind, since excluded is sorted score-descending).
Aggregate Counts Group
The Aggregate Counts group contains 2 patterns that assert over aggregate properties of SelectionReport.excluded — total count and ordering invariants.
HaveAtLeastNExclusions(int n)
Assertion semantics: Asserts that Excluded.Count >= n — the excluded list contains at least n items.
Predicate type: No predicate parameter. The comparison is over the total count of the excluded list.
N=0 semantics: n = 0 is a valid, well-defined argument. HaveAtLeastNExclusions(0) always passes unless the pipeline threw an exception (i.e., unless SelectionReport was never produced). There is no separate HaveNoExclusionsRequired() pattern — use HaveAtLeastNExclusions(0) instead. This is intentionally a no-op assertion useful for establishing “the pipeline ran without error” in minimal smoke tests.
Edge cases:
Excludedis empty andn = 0→ assertion passes.Excludedis empty andn > 0→ assertion fails withactual = 0.Excluded.Count >= n→ assertion passes.
Tie-breaking: Not applicable. Count comparison only.
Error message format:
HaveAtLeastNExclusions({n}) failed: expected at least {n} excluded items, but Excluded had {actual}.
Where {actual} is Excluded.Count.
ExcludedItemsAreSortedByScoreDescending()
Assertion semantics: Asserts that for all 0 <= i < Excluded.Count - 1: Excluded[i].Score >= Excluded[i+1].Score. The excluded list must be sorted in non-increasing score order.
Conformance assertion: This assertion tests a behavioral invariant of a correct Cupel pipeline implementation. A pipeline that satisfies the specification (see SelectionReport, D019) will always produce an
excludedlist in score-descending order. This assertion exists to detect non-conforming implementations and to validate hand-constructed test fixtures.
Score comparison: >= (non-increasing). Equal scores are permitted between adjacent items.
Tiebreak caveat: The specification (D019) guarantees that among items with equal scores, insertion order is preserved (the item added earlier appears earlier in excluded). However, this insertion-order tiebreak is not assertable from the report alone — the report does not expose an insertion index or candidate sequence number. This assertion therefore checks score-descending order only. Callers who need to verify tiebreak behavior must control the complete input set and ensure all scores are distinct.
Edge cases:
Excludedis empty → assertion passes (vacuous truth: no adjacent pair to check).Excludedhas exactly one item → assertion passes.- Any adjacent pair violates non-increasing order → assertion fails at the first violating index.
Error message format:
ExcludedItemsAreSortedByScoreDescending failed: item at index {i+1} (score={si1}) is higher than item at index {i} (score={si}). Expected non-increasing scores.
Where {i} is the 0-based index of the first violating position (the item whose successor is higher-scored), {si} is Excluded[i].Score, and {si1} is Excluded[i+1].Score.
Budget Group
The Budget group contains 1 pattern that asserts the token utilization of the included set relative to the caller-supplied budget.
HaveBudgetUtilizationAbove(double threshold, ContextBudget budget)
Assertion semantics: Asserts that sum(Included[i].Item.Tokens) / budget.MaxTokens >= threshold.
Denominator: budget.MaxTokens — the hard token ceiling set at ContextBudget construction time. This is the single authoritative capacity ceiling for context-window safety.
budget.TargetTokens is explicitly NOT the denominator. TargetTokens is a Slice-stage-internal soft target; it is not a public capacity metric exposed by ContextBudget. Using TargetTokens as denominator would produce utilization values > 1.0 for reports where the pipeline filled the full max-token budget, which is misleading. (PD-2, D045, DI-3.)
Comparison operator: Exact >= with no epsilon (PD-3, D064). Floating-point edge cases are test authoring responsibility — callers must choose threshold values that avoid boundary drift.
budget.MaxTokens == 0: This is a pipeline error. ContextBudget validates MaxTokens > 0 at construction time; a ContextBudget with MaxTokens == 0 cannot be produced by a conforming implementation. This assertion does not need to handle division-by-zero.
Edge cases:
Includedis empty →sum(Included[i].Item.Tokens) = 0; utilization =0.0. Assertion fails unlessthreshold <= 0.0.threshold = 0.0→ assertion passes wheneverIncludedis non-empty (utilization > 0.0) and also whenIncludedis empty (0.0 >= 0.0 is true). Valid usage.threshold > 1.0→ valid argument; the assertion will fail unless the sum of included tokens exceedsbudget.MaxTokens(which is an over-budget condition indicating a pipeline defect).
Tie-breaking: Not applicable. Arithmetic comparison only.
Error message format:
HaveBudgetUtilizationAbove({threshold}) failed: computed utilization was {actual:.6f} (includedTokens={includedTokens}, budget.MaxTokens={maxTokens}).
Where {actual} is the computed utilization value formatted to 6 decimal places, {includedTokens} is sum(Included[i].Item.Tokens), and {maxTokens} is budget.MaxTokens.
Kind Coverage Group
The Kind Coverage group contains 1 pattern that asserts over the diversity of ContextKind values in the included set.
HaveKindCoverageCount(int n)
Assertion semantics: Asserts that Included.Select(i => i.Item.Kind).Distinct().Count() >= n — the included list contains at least n distinct ContextKind values.
Predicate type: No predicate parameter. The kind set is computed as { i.item.kind | i ∈ included }.
No ordering dependency: This assertion is purely over the set of distinct kinds in included; it does not depend on list order or score values.
N=0 semantics: n = 0 is a valid argument; the assertion always passes (an empty set has 0 distinct kinds, which satisfies >= 0). There is no distinct HaveAtLeastOneKind() pattern — use HaveKindCoverageCount(1).
Edge cases:
Includedis empty → distinct kind count = 0. Assertion fails unlessn = 0.- All included items have the same kind → distinct count = 1.
Tie-breaking: Not applicable. Set cardinality comparison only.
Error message format:
HaveKindCoverageCount({n}) failed: expected at least {n} distinct ContextKind values in Included, but found {actual}: [{actualKinds}].
Where {actual} is the count of distinct kind values, and {actualKinds} is a comma-separated list of those distinct values (e.g., Message, SystemPrompt). If Included is empty, {actual} is 0 and {actualKinds} is an empty string.
Conformance Assertions Group
Conformance assertions verify behavioral invariants that a correct Cupel pipeline implementation must always satisfy. They are distinct from functional assertions (which verify caller intent) — a conformance assertion failing indicates either a non-conforming pipeline or a hand-constructed test fixture with invalid ordering.
The ExcludedItemsAreSortedByScoreDescending() pattern (pattern 9 above) is the primary conformance assertion in this vocabulary. It is documented in the Aggregate Counts Group because its method name groups naturally with the other aggregate exclusion-count assertions.
Ordering Group
The Ordering group contains 2 patterns that assert over the positions of items in SelectionReport.included. Both patterns carry a Placer dependency caveat — the meaning of positions in included is determined by the Placer used to produce the report.
Placer dependency caveat: These assertions are only meaningful when the caller knows the Placer’s ordering contract. For
UShapedPlacer, position 0 holds the highest-scored item and positioncount−1holds the second-highest. For other Placers, consult the Placer spec. Do not use these assertions against output from an unknown or unspecified Placer.
PlaceItemAtEdge(Func<IncludedItem, bool> predicate)
Assertion semantics: Asserts that predicate(Included[0]) || predicate(Included[Included.Count - 1]) — at least one of the two edge positions (position 0 and position count−1) contains an item that satisfies the predicate. “Edge” means exactly position 0 (first) or position Included.Count − 1 (last). No other positions are “edge” positions.
Predicate type: IncludedItem (per PD-1). The predicate receives the full IncludedItem pair (item, score, reason).
“Edge” definition: Position 0 (first) OR position Included.Count − 1 (last). Nothing more. In particular: if Included has only 1 item, position 0 and position count−1 are the same item; a predicate matching that single item satisfies the assertion.
Tie-score clarification: If multiple items share the same score as an edge item, the assertion passes only if the item satisfying the predicate physically occupies position 0 or position count−1. The assertion does not pass merely because the item has the same score as an edge item while being at a non-edge position. Position is the criterion, not score.
Empty Included: Included being empty is a degenerate case. An empty list has no edge positions; the assertion fails because no item can satisfy the predicate. The “item not found” error message variant applies.
Edge cases:
Includedis empty → assertion fails; “no item in Included matched the predicate” error variant.- Predicate matches no item in
Included→ assertion fails; “no item in Included matched the predicate” error variant. - Predicate matches one or more items, but none are at position 0 or
count−1→ assertion fails; “item matching predicate was at index {actual}” error variant, reporting the index of the first matching item.
Placer dependency caveat: This assertion is only meaningful when the caller knows the Placer’s ordering contract. For
UShapedPlacer, position 0 holds the highest-scored item and positioncount−1holds the second-highest. For other Placers, consult the Placer spec. Do not use this assertion against output from an unknown or unspecified Placer.
Error message format (item found at wrong index):
PlaceItemAtEdge failed: item matching predicate was at index {actual} (not at edge). Edge positions: 0 and {last}. Included had {count} items.
Where {actual} is the 0-based index of the first item in Included satisfying the predicate, {last} is Included.Count − 1, and {count} is Included.Count.
Error message format (item not in Included):
PlaceItemAtEdge failed: no item in Included matched the predicate.
PlaceTopNScoredAtEdges(int n)
Assertion semantics: Asserts that the n items with the highest score values in Included occupy the n outermost positions (edges) of the list. The edge position mapping for n items is: position 0, position count−1, position 1, position count−2, … (alternating from the two ends inward).
Step-by-step evaluation:
- Sort
Includedbyscoredescending to identify the top-nitems. - Enumerate the first
nedge positions:0, count−1, 1, count−2, … - Verify that each of the top-
nitems (by score) occupies one of those edge positions.
Tie-score handling: If multiple items share the score of the N-th top item (i.e., there is a tie at the boundary between the top-N set and the remainder), then any item with that tied score is a valid occupant of the N-th edge position. The assertion does not require a specific tied item to be at that position; it only requires that the N-th edge position is occupied by an item with a score ≥ the N-th top score.
Predicate type: No predicate parameter. Score comparison is over IncludedItem.Score values.
n = 0: Valid argument; assertion always passes (zero items to check, zero edge positions to verify).
n > Included.Count: Assertion fails — there are not enough items in Included to fill n edge positions. The error message should note the mismatch between n and Included.Count.
Edge cases:
Includedis empty andn = 0→ assertion passes.Includedis empty andn > 0→ assertion fails; no items to place at edges.n = Included.Count→ all items must be at “edge” positions, meaning the assertion checks that the full included list is sorted by edge-in order, not necessarily that it is sorted by score directly.
Placer dependency caveat: This assertion is only meaningful when the caller knows the Placer’s ordering contract. For
UShapedPlacer, position 0 holds the highest-scored item and positioncount−1holds the second-highest, with subsequent positions filling inward. For other Placers, consult the Placer spec. Do not use this assertion against output from an unknown or unspecified Placer.
Error message format:
PlaceTopNScoredAtEdges({n}) failed: {failCount} of the top-{n} scored items were not at expected edge positions. Top-{n} items (by score): [{topItems}]. Expected edge positions: [{edgePositions}].
Where {failCount} is the count of top-n items not occupying an expected edge position, {topItems} is a list of the top-n items formatted as (kind={kind}, score={score}, idx={actualIndex}), and {edgePositions} is the ordered list of expected edge position indices (e.g., 0, 5, 1, 4 for n=4 with count=6).
Notes
D041: Snapshot assertions deferred
Snapshot-based assertions over SelectionReport are explicitly deferred. The excluded list is sorted score-descending with an insertion-order tiebreak (D019): among items with equal scores, the item added earlier to the candidate set appears earlier in excluded. This tiebreak is an implementation detail — it is not observable from the SelectionReport alone because the report does not expose a candidate insertion index or sequence number.
Snapshot outputs are therefore non-deterministic across test runs unless the test author controls the complete input set AND all scores are distinct. This precondition is not currently guaranteed by the spec. Until SelectionReport ordering stability for ties is formally guaranteed and surfaced, snapshot assertions are not part of the Cupel.Testing vocabulary.
This deferral applies to any assertion of the form “the report serializes to exactly this JSON/string”. Per-field assertions (the 13 named patterns) are unaffected.
TotalTokensConsidered is not a utilization metric
TotalTokensConsidered (if surfaced as a SelectionReport field) is a candidate-set volume metric:
TotalTokensConsidered = sum(included[i].item.tokens) + sum(excluded[i].item.tokens)
It measures the total token volume of all items that entered the pipeline’s selection decision — it is not a utilization metric. Using TotalTokensConsidered as a denominator for utilization assertions would produce values far below 1.0 in typical runs (where many candidates are excluded) and would misrepresent budget efficiency.
For utilization assertions, use HaveBudgetUtilizationAbove(threshold, budget) (pattern 10), which divides by budget.MaxTokens — the authoritative capacity ceiling (PD-2, D045).
SelectionReportAssertionException should be a dedicated type
SelectionReportAssertionException must be a dedicated exception type — not InvalidOperationException, not AssertionException from any third-party framework, and not a raw Exception. (PD-4.)
Rationale: test frameworks (xUnit, NUnit, MSTest) recognize well-known assertion exception types and format them distinctly in test output. Using a dedicated type allows test runners to display Cupel assertion failures with the structured message they carry, rather than reporting them as unexpected system exceptions. It also allows callers to catch (SelectionReportAssertionException) in test utilities without accidentally catching unrelated InvalidOperationException instances.
The type must carry at minimum: the structured failure message (as .Message). Implementations MAY add structured properties (e.g., AssertionName, Expected, Actual) for programmatic inspection, but these are implementation-defined and not part of the vocabulary spec.
Changelog
All notable changes to the Cupel Specification are documented in this file.
The format is based on Keep a Changelog.
[1.3.0] — 2026-03-23
Added
- Scorers: DecayScorer with Exponential, Step, and Window curve factories; MetadataTrustScorer for caller-provided
cupel:trustmetadata passthrough - Slicers: CountQuotaSlice decorator slicer enforcing absolute item-count require/cap per ContextKind, with ScarcityBehavior (Degrade/Throw) and CountRequirementShortfall reporting
- Analytics: Budget simulation extension methods on CupelPipeline (.NET only):
GetMarginalItems(items, budget, slackTokens)andFindMinBudgetFor(items, targetItem, searchCeiling)with reference-equality diff semantics and binary search - Analytics:
BudgetUtilization(budget),KindDiversity(), andTimestampCoverage()extension methods on SelectionReport (both .NET and Rust) - Testing:
Wollax.Cupel.TestingNuGet package withSelectionReport.Should()assertion chains (13 assertion patterns from the testing vocabulary spec) - Diagnostics:
Wollax.Cupel.Diagnostics.OpenTelemetrycompanion package bridging Cupel pipelines to OpenTelemetry ActivitySource with three verbosity tiers (StageOnly, StageAndExclusions, Full)
Changed
- GreedySlice: Deterministic tie-break contract explicitly specified — equal-density items are ordered by original index ascending. This was implicit behavior in 1.0.0 but is now a spec-committed contract that budget simulation depends on.
Specification Decisions
- DecayScorer requires caller-injected TimeProvider with no default — makes time dependency visible and enables deterministic testing (D042)
- DecayScorer is not Clone in Rust; stores
Box<dyn TimeProvider + Send + Sync>(D047) - MetadataTrustScorer in .NET accepts both
stringanddoubleforcupel:trustmetadata value (D059) - CountQuotaSlice rejects KnapsackSlice as inner slicer at construction time; defers CountConstrainedKnapsackSlice to a future release
- Budget simulation API scoped to .NET in v1.3; Rust parity deferred to a future milestone
SweepBudget(exhaustive budget sweep) assigned to Smelt project, not Cupel
[1.0.0] — 2026-03-14
Initial specification release.
Added
- Data Model: ContextItem, ContextBudget, and all enumerations (ContextKind, OverflowStrategy, ExclusionReason)
- Pipeline: 6-stage fixed pipeline (Classify, Score, Deduplicate, Sort, Slice, Place)
- Scorers: RecencyScorer, PriorityScorer, KindScorer, TagScorer, FrequencyScorer, ReflexiveScorer, CompositeScorer, ScaledScorer
- Slicers: GreedySlice, KnapsackSlice, QuotaSlice
- Placers: ChronologicalPlacer, UShapedPlacer
- Conformance suite: 37 TOML test vectors (28 required, 9 optional) covering all algorithms and edge cases
- 13 required scoring vectors (all 8 scorer types)
- 6 required slicing vectors (GreedySlice, KnapsackSlice, QuotaSlice)
- 4 required placing vectors (ChronologicalPlacer, UShapedPlacer)
- 5 required pipeline vectors (multiple slicer/placer combinations, pinned items)
- 4 optional scoring vectors (edge cases)
- 1 optional slicing vector (empty input)
- 4 optional pipeline vectors (empty input, all-pinned, deduplication, overflow)
Specification Decisions
- Epsilon tolerance (
1e-9) for all floating-point score comparisons (P2) - Byte-exact ordinal comparison for content deduplication (P3)
- Case-insensitive ASCII case folding for ContextKind comparison (P4)
- KnapsackSlice discretization parameters are implementation-defined; conformance tests use score differences >= 0.01 (P5)
- QuotaSlice budget distribution uses floor truncation for all percentage-to-token conversions (P6)
- Stable sort mandated for all sorting operations with index-based tiebreaking (P1)