Test Vector Format

All conformance test vectors are TOML files. Each file contains a single test case with inputs, configuration, and expected outputs. This chapter documents the schema for each stage type.

Common Structure

Every test vector contains a [test] table that identifies the test:

[test]
name = "Human-readable test name"
stage = "scoring"    # one of: scoring, slicing, placing, pipeline

The stage field determines which additional tables are expected.
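As a rough illustration of that dispatch, the sketch below maps each stage to the top-level tables that the schemas in this chapter describe for it. It assumes the vector has already been parsed into a plain dict; `STAGE_TABLES` and `tables_for` are hypothetical names for this example, not part of any reference harness.

```python
# Top-level tables described for each stage in this chapter.
# Hypothetical helper: a real harness may organize this differently.
STAGE_TABLES = {
    "scoring": ["test", "items", "expected", "tolerance", "config"],
    "slicing": ["test", "budget", "scored_items", "expected", "config"],
    "placing": ["test", "items", "expected"],
    "pipeline": ["test", "budget", "config", "items", "expected_output", "expected"],
}


def tables_for(vector: dict) -> list[str]:
    """Return the table names a harness might look for, based on [test].stage."""
    stage = vector["test"]["stage"]
    if stage not in STAGE_TABLES:
        raise ValueError(f"unknown stage: {stage!r}")
    return STAGE_TABLES[stage]


if __name__ == "__main__":
    vector = {"test": {"name": "Human-readable test name", "stage": "scoring"}}
    print(tables_for(vector))
```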

Scoring Vectors

Scoring vectors test individual scorer algorithms in isolation.

Schema

Table	Field	Type	Description
`[test]`	`name`	string	Test name
`[test]`	`stage`	string	`"scoring"`
`[test]`	`scorer`	string	Scorer type: `"recency"`, `"priority"`, `"kind"`, `"tag"`, `"frequency"`, `"reflexive"`, `"composite"`, `"scaled"`
`[[items]]`	`content`	string	Item content (used as identifier in assertions)
`[[items]]`	`tokens`	integer	Token count
`[[items]]`	`timestamp`	datetime (optional)	UTC timestamp
`[[items]]`	`priority`	integer (optional)	Numeric priority value
`[[items]]`	`kind`	string (optional)	ContextKind value
`[[items]]`	`tags`	array of string (optional)	Item tags
`[[items]]`	`futureRelevanceHint`	float (optional)	Caller-provided relevance hint
`[[expected]]`	`content`	string	Item content to match
`[[expected]]`	`score_approx`	float	Expected score (compared with epsilon tolerance)
`[tolerance]`	`score_epsilon`	float	Tolerance for the absolute difference between actual and expected score; see Score Comparison (default: `1e-9`)

Scorer-Specific Configuration

Some scorers require additional configuration:

Table	Field	Type	Applies To
`[config]`	`use_default_weights`	boolean	KindScorer — when `true`, use default weight map
`[[config.weights]]`	`kind`	string	KindScorer — custom weight kind
`[[config.weights]]`	`weight`	float	KindScorer — custom weight value
`[[config.tag_weights]]`	`tag`	string	TagScorer — configured tag name
`[[config.tag_weights]]`	`weight`	float	TagScorer — configured tag weight
`[[config.scorers]]`	`type`	string	CompositeScorer — child scorer type
`[[config.scorers]]`	`weight`	float	CompositeScorer — child weight
`[config]`	`inner_scorer`	string	ScaledScorer — inner scorer type

Score Comparison

Score assertions use epsilon tolerance comparison:

abs(actual_score - expected_score) < score_epsilon

The tolerance absorbs floating-point representation differences across languages and platforms. The default epsilon of 1e-9 is sufficient for all algorithms in this specification. A test vector that needs a different tolerance sets score_epsilon in its own [tolerance] table.
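A minimal sketch of the comparison follows, assuming the vector has been parsed into a dict. The helper name `scores_match` and the constant `DEFAULT_SCORE_EPSILON` are illustrative only.

```python
DEFAULT_SCORE_EPSILON = 1e-9  # default stated in the schema above


def scores_match(actual: float, expected: float, vector: dict) -> bool:
    """Apply abs(actual - expected) < score_epsilon, honoring [tolerance] if present."""
    epsilon = vector.get("tolerance", {}).get("score_epsilon", DEFAULT_SCORE_EPSILON)
    return abs(actual - expected) < epsilon


if __name__ == "__main__":
    # 0.1 + 0.2 is not exactly 0.3 in IEEE 754, but it is within the default epsilon.
    print(scores_match(0.1 + 0.2, 0.3, {}))
    # A vector-level override loosens the comparison.
    print(scores_match(0.50, 0.51, {"tolerance": {"score_epsilon": 0.1}}))
```

With the default epsilon, `0.1 + 0.2` and `0.3` compare as matching even though a direct `==` would not.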

Slicing Vectors

Slicing vectors test slicer algorithms with pre-scored input.

Schema

Table	Field	Type	Description
`[test]`	`name`	string	Test name
`[test]`	`stage`	string	`"slicing"`
`[test]`	`slicer`	string	Slicer type: `"greedy"`, `"knapsack"`, `"quota"`
`[budget]`	`target_tokens`	integer	Token budget for selection
`[[scored_items]]`	`content`	string	Item content
`[[scored_items]]`	`tokens`	integer	Token count
`[[scored_items]]`	`score`	float	Pre-computed relevance score
`[[scored_items]]`	`kind`	string (optional)	ContextKind (used by QuotaSlice)
`[expected]`	`selected_contents`	array of string	Content values of selected items (set comparison — order does not matter)

Slicer-Specific Configuration

Table	Field	Type	Applies To
`[config]`	`bucket_size`	integer	KnapsackSlice — discretization bucket size (default: 100)
`[config]`	`inner_slicer`	string	QuotaSlice — inner slicer type
`[[config.quotas]]`	`kind`	string	QuotaSlice — ContextKind
`[[config.quotas]]`	`require`	float	QuotaSlice — minimum percentage
`[[config.quotas]]`	`cap`	float	QuotaSlice — maximum percentage

Set Comparison

Slicer output is compared as a set: the order of items in selected_contents does not matter. An implementation passes if the set of selected item contents exactly matches the expected set. Slicers select items but do not determine presentation order, which is the placer’s responsibility. This holds for all slicers, including QuotaSlice.
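The set comparison could be sketched as follows. `slicer_output_matches` is a hypothetical helper that takes the content strings an implementation selected and a vector parsed into a dict; it assumes content values are unique within a vector, since they serve as identifiers.

```python
def slicer_output_matches(selected: list[str], vector: dict) -> bool:
    """Compare selected item contents to [expected].selected_contents as sets."""
    expected = vector["expected"]["selected_contents"]
    # Converting both sides to sets discards order, which slicers do not define.
    return set(selected) == set(expected)


if __name__ == "__main__":
    vector = {"expected": {"selected_contents": ["a", "b", "c"]}}
    print(slicer_output_matches(["c", "a", "b"], vector))  # same set, different order
    print(slicer_output_matches(["a", "b"], vector))       # missing "c"
```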

Placing Vectors

Placing vectors test placer algorithms with pre-scored input.

Schema

Table	Field	Type	Description
`[test]`	`name`	string	Test name
`[test]`	`stage`	string	`"placing"`
`[test]`	`placer`	string	Placer type: `"chronological"`, `"u-shaped"`
`[[items]]`	`content`	string	Item content
`[[items]]`	`tokens`	integer	Token count
`[[items]]`	`score`	float	Pre-computed relevance score
`[[items]]`	`timestamp`	datetime (optional)	UTC timestamp (used by ChronologicalPlacer)
`[expected]`	`ordered_contents`	array of string	Content values in expected output order (ordered comparison)

Ordered Comparison

Placer output is compared as an ordered list — the position of each item matters. An implementation passes if the output items, in order, match the expected ordered_contents exactly.
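In contrast to a set comparison, a placer check has to preserve position. A minimal sketch, with `placer_output_matches` as a hypothetical helper operating on a vector parsed into a dict:

```python
def placer_output_matches(placed: list[str], vector: dict) -> bool:
    """Compare placed item contents to [expected].ordered_contents, position by position."""
    expected = list(vector["expected"]["ordered_contents"])
    # List equality checks both length and element order.
    return placed == expected


if __name__ == "__main__":
    vector = {"expected": {"ordered_contents": ["first", "second", "third"]}}
    print(placer_output_matches(["first", "second", "third"], vector))
    print(placer_output_matches(["second", "first", "third"], vector))
```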

Pipeline Vectors

Pipeline vectors test the full 6-stage pipeline end-to-end.

Schema

Table	Field	Type	Description
`[test]`	`name`	string	Test name
`[test]`	`stage`	string	`"pipeline"`
`[budget]`	`max_tokens`	integer	Maximum token capacity
`[budget]`	`target_tokens`	integer	Target token budget for selection
`[budget]`	`output_reserve`	integer	Tokens reserved for output (default: 0)
`[config]`	`slicer`	string	Slicer type
`[config]`	`placer`	string	Placer type
`[config]`	`deduplication`	boolean	Whether deduplication is enabled
`[config]`	`overflow_strategy`	string (optional)	`"throw"`, `"truncate"`, or `"proceed"` (default: `"throw"`)
`[[config.scorers]]`	`type`	string	Scorer type
`[[config.scorers]]`	`weight`	float	Scorer weight (used for CompositeScorer weighting)
`[[items]]`	`content`	string	Item content
`[[items]]`	`tokens`	integer	Token count
`[[items]]`	`kind`	string (optional)	ContextKind
`[[items]]`	`timestamp`	datetime (optional)	UTC timestamp
`[[items]]`	`priority`	integer (optional)	Numeric priority
`[[items]]`	`tags`	array of string (optional)	Item tags
`[[items]]`	`futureRelevanceHint`	float (optional)	Relevance hint
`[[items]]`	`pinned`	boolean (optional)	Whether item is pinned (default: false)
`[[expected_output]]`	`content`	string	Item content; entries appear in expected output order (ordered comparison)

Ordered Comparison

Pipeline output is compared as an ordered list — both the selected items and their presentation order must match. An implementation passes if the output items, in order, match the expected_output entries exactly.
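Because [[expected_output]] is an array of tables rather than an array of strings, a harness first has to extract each entry's content. The sketch below assumes a vector parsed into a dict; `pipeline_output_matches` is an illustrative name.

```python
def pipeline_output_matches(output_contents: list[str], vector: dict) -> bool:
    """Compare pipeline output to the [[expected_output]] entries, in order."""
    # Each [[expected_output]] entry is a table with a single content field.
    expected = [entry["content"] for entry in vector.get("expected_output", [])]
    return output_contents == expected


if __name__ == "__main__":
    vector = {"expected_output": [{"content": "fits"}]}
    print(pipeline_output_matches(["fits"], vector))
    print(pipeline_output_matches(["fits", "too-big"], vector))
```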

Diagnostics Vectors

Diagnostics vectors extend pipeline vectors with an [expected.diagnostics] sub-table. They assert on which items were included or excluded, why, and aggregate counts. Diagnostics vectors are pipeline-level only (stage = "pipeline"). The [expected.diagnostics] table composes with [[expected_output]] — a single vector file can assert on both output order and diagnostic details simultaneously.

Schema

Table	Field	Type	Required	Description
`[[expected.diagnostics.included]]`	`content`	string	yes	Item content (matches placed order)
`[[expected.diagnostics.included]]`	`score_approx`	float	yes	Expected score (epsilon tolerance)
`[[expected.diagnostics.included]]`	`inclusion_reason`	string	yes	Reason: `"Scored"`, `"Pinned"`, `"ZeroToken"`
`[[expected.diagnostics.excluded]]`	`content`	string	yes	Item content (sorted by score desc)
`[[expected.diagnostics.excluded]]`	`score_approx`	float	yes	Expected score (epsilon tolerance)
`[[expected.diagnostics.excluded]]`	`exclusion_reason`	string	yes	Reason discriminator: `"BudgetExceeded"`, `"Deduplicated"`, `"QuotaCapExceeded"`
`[[expected.diagnostics.excluded]]`	`item_tokens`	integer	conditional	Token count of excluded item (required for `BudgetExceeded`)
`[[expected.diagnostics.excluded]]`	`available_tokens`	integer	conditional	Remaining budget at exclusion (required for `BudgetExceeded`)
`[[expected.diagnostics.excluded]]`	`deduplicated_against`	string	conditional	Content of duplicate kept (required for `Deduplicated`)
`[expected.diagnostics.summary]`	`total_candidates`	integer	no	Total items considered
`[expected.diagnostics.summary]`	`total_tokens_considered`	integer	no	Sum of all candidate token counts
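The conditional fields in the table above depend on the exclusion_reason discriminator. One way a vector linter might check them is sketched below; `REQUIRED_BY_REASON` and `missing_conditional_fields` are hypothetical names, and the mapping is taken only from the Required and Description columns of this table.

```python
# Conditional fields per exclusion_reason, as listed in the schema table.
REQUIRED_BY_REASON = {
    "BudgetExceeded": ("item_tokens", "available_tokens"),
    "Deduplicated": ("deduplicated_against",),
    "QuotaCapExceeded": (),
}


def missing_conditional_fields(entry: dict) -> list[str]:
    """Return the conditional fields an excluded entry lacks for its reason."""
    required = REQUIRED_BY_REASON.get(entry["exclusion_reason"], ())
    return [field for field in required if field not in entry]


if __name__ == "__main__":
    entry = {
        "content": "too-big",
        "score_approx": 0.0,
        "exclusion_reason": "BudgetExceeded",
        "item_tokens": 400,
    }
    print(missing_conditional_fields(entry))  # available_tokens is absent
```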

Ordering

The included entries appear in placed order, matching the order of the [[expected_output]] entries. The excluded entries are sorted by score, descending.

Optionality

All three sub-tables (included, excluded, summary) are independently optional. A vector can assert on any combination: included items only, excluded items only, summary counts only, or any mix.
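A harness can honor this by asserting only on the sub-tables that are present. The following is a sketch under that reading, with `diagnostics_assertions` as a hypothetical helper over a vector parsed into a dict:

```python
def diagnostics_assertions(vector: dict) -> dict:
    """Return only the diagnostics sub-tables the vector actually declares."""
    diagnostics = vector.get("expected", {}).get("diagnostics", {})
    return {
        name: diagnostics[name]
        for name in ("included", "excluded", "summary")
        if name in diagnostics
    }


if __name__ == "__main__":
    # A vector that asserts on summary counts only.
    vector = {"expected": {"diagnostics": {"summary": {"total_candidates": 2}}}}
    print(sorted(diagnostics_assertions(vector)))
```

A vector with no [expected.diagnostics] table yields an empty dict, so the same code path also handles plain pipeline vectors.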

Compatibility

[expected.diagnostics] is a sub-table of [expected], declared with a dotted key. It does not conflict with [[expected_output]] because expected and expected_output are different top-level keys. This is valid TOML 1.0. A single pipeline vector file may contain both [[expected_output]] (ordered output assertion) and [expected.diagnostics] (diagnostic assertion).

Example

[test]
name = "Pipeline diagnostics: BudgetExceeded exclusion reason"
stage = "pipeline"

# Expected values below are final. A live integration test against run_traced
# will be added in Phase 29.
#
# Budget: target=200, max=1000, reserve=0.
#
# Items:
#   "fits":    tokens=150, kind=Message, timestamp=Jun
#   "too-big": tokens=400, kind=Message, timestamp=Jan
#
# Score (RecencyScorer, 2 timestamped, denominator=1):
#   "too-big" (Jan): rank 0 → 0.0
#   "fits"    (Jun): rank 1 → 1.0
#
# Slice (Greedy, target=200):
#   Density sort: fits(1.0/150≈0.00667), too-big(0.0/400=0.0)
#   fits: 150 ≤ 200 → selected (remaining=50)
#   too-big: 400 > 50 → excluded (BudgetExceeded)
#
# Expected diagnostics:
#   included: fits (score=1.0, reason=Scored)
#   excluded: too-big (score=0.0, reason=BudgetExceeded, item_tokens=400, available=50)
#   summary: total_candidates=2, total_tokens_considered=550

[budget]
max_tokens = 1000
target_tokens = 200
output_reserve = 0

[config]
slicer = "greedy"
placer = "chronological"
deduplication = false

[[config.scorers]]
type = "recency"
weight = 1.0

[[items]]
content = "fits"
tokens = 150
kind = "Message"
timestamp = 2024-06-01T00:00:00Z

[[items]]
content = "too-big"
tokens = 400
kind = "Message"
timestamp = 2024-01-01T00:00:00Z

[[expected_output]]
content = "fits"

[expected.diagnostics.summary]
total_candidates = 2
total_tokens_considered = 550

[[expected.diagnostics.included]]
content = "fits"
score_approx = 1.0
inclusion_reason = "Scored"

[[expected.diagnostics.excluded]]
content = "too-big"
score_approx = 0.0
exclusion_reason = "BudgetExceeded"
item_tokens = 400
available_tokens = 50

Field Types

Type	TOML Representation	Notes
string	`"text"`	UTF-8 string
integer	`42`	Signed 64-bit integer
float	`0.5`	IEEE 754 double-precision
boolean	`true` / `false`
datetime	`2024-06-15T12:00:00Z`	RFC 3339, always UTC
array of string	`["a", "b"]`

Extensibility

Future versions of the conformance suite may add new fields to existing tables. Implementations SHOULD ignore unknown fields in test vector files rather than raising errors. This enables forward compatibility as the specification evolves.
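One way to follow this recommendation is to read only the fields a harness knows about and silently skip the rest. The sketch below does this for pipeline [[items]] entries; `KNOWN_ITEM_FIELDS` mirrors the pipeline schema in this chapter, and `read_item` is a hypothetical helper.

```python
# Fields documented for pipeline [[items]] entries in this chapter.
KNOWN_ITEM_FIELDS = {
    "content",
    "tokens",
    "kind",
    "timestamp",
    "priority",
    "tags",
    "futureRelevanceHint",
    "pinned",
}


def read_item(raw: dict) -> dict:
    """Keep known fields and ignore unknown ones instead of raising."""
    return {key: value for key, value in raw.items() if key in KNOWN_ITEM_FIELDS}


if __name__ == "__main__":
    # "added_in_future_version" stands in for a field this harness does not know.
    raw = {"content": "fits", "tokens": 150, "added_in_future_version": True}
    print(read_item(raw))
```

The trade-off is that a misspelled field name is also ignored silently, so a harness may want to log skipped keys rather than drop them without a trace.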

Cupel Specification