Test Vector Format

All conformance test vectors are TOML files. Each file contains a single test case with inputs, configuration, and expected outputs. This chapter documents the schema for each stage type.

Common Structure

Every test vector contains a [test] table that identifies the test:

[test]
name = "Human-readable test name"
stage = "scoring"    # one of: scoring, slicing, placing, pipeline

The stage field determines which additional tables are expected.
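
As a rough illustration of how a test runner might act on the stage field, the sketch below maps each stage to the input and expectation tables that the schema sections of this chapter list for it. The names STAGE_TABLES and missing_tables are hypothetical, the vector is assumed to be already parsed into a dict, and treating every listed table as mandatory is an assumption of the sketch rather than a rule stated here.

```python
# Tables each stage's schema section lists for inputs and expected outputs.
# Assumption: this sketch treats all of them as mandatory; the chapter
# does not say so explicitly.
STAGE_TABLES = {
    "scoring": ("items", "expected"),
    "slicing": ("budget", "scored_items", "expected"),
    "placing": ("items", "expected"),
    "pipeline": ("budget", "config", "items", "expected_output"),
}


def missing_tables(vector: dict) -> list[str]:
    """Return the stage tables absent from an already-parsed test vector."""
    stage = vector["test"]["stage"]
    if stage not in STAGE_TABLES:
        raise ValueError(f"unknown stage: {stage!r}")
    return [name for name in STAGE_TABLES[stage] if name not in vector]
```

A runner could call missing_tables right after parsing and report the result before executing the test case.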

Scoring Vectors

Scoring vectors test individual scorer algorithms in isolation.

Schema

| Table | Field | Type | Description |
| --- | --- | --- | --- |
| [test] | name | string | Test name |
| [test] | stage | string | "scoring" |
| [test] | scorer | string | Scorer type: "recency", "priority", "kind", "tag", "frequency", "reflexive", "composite", "scaled" |
| [[items]] | content | string | Item content (used as identifier in assertions) |
| [[items]] | tokens | integer | Token count |
| [[items]] | timestamp | datetime (optional) | UTC timestamp |
| [[items]] | priority | integer (optional) | Numeric priority value |
| [[items]] | kind | string (optional) | ContextKind value |
| [[items]] | tags | array of string (optional) | Item tags |
| [[items]] | futureRelevanceHint | float (optional) | Caller-provided relevance hint |
| [[expected]] | content | string | Item content to match |
| [[expected]] | score_approx | float | Expected score (compared with epsilon tolerance) |
| [tolerance] | score_epsilon | float | Maximum allowed absolute difference between actual and expected score (default: 1e-9) |

Scorer-Specific Configuration

Some scorers require additional configuration:

| Table | Field | Type | Applies To | Description |
| --- | --- | --- | --- | --- |
| [config] | use_default_weights | boolean | KindScorer | When true, use the default weight map |
| [[config.weights]] | kind | string | KindScorer | Custom weight kind |
| [[config.weights]] | weight | float | KindScorer | Custom weight value |
| [[config.tag_weights]] | tag | string | TagScorer | Configured tag name |
| [[config.tag_weights]] | weight | float | TagScorer | Configured tag weight |
| [[config.scorers]] | type | string | CompositeScorer | Child scorer type |
| [[config.scorers]] | weight | float | CompositeScorer | Child weight |
| [config] | inner_scorer | string | ScaledScorer | Inner scorer type |

Score Comparison

Score assertions use epsilon tolerance comparison:

abs(actual_score - expected_score) < score_epsilon

This addresses floating-point representation differences across languages and platforms. The default epsilon of 1e-9 is sufficient for all algorithms in this specification. Test vectors that require different tolerances specify a custom [tolerance] table.
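
The comparison rule can be sketched as a small helper. The function name scores_match and the dict-shaped vector argument are assumptions for illustration; the helper uses the strict less-than comparison written above and falls back to the default epsilon when the vector has no [tolerance] table.

```python
DEFAULT_SCORE_EPSILON = 1e-9  # default stated in the scoring schema


def scores_match(actual: float, expected: float, vector: dict) -> bool:
    """Apply abs(actual - expected) < score_epsilon for one assertion.

    `vector` is the already-parsed test vector (hypothetical layout: a dict
    keyed by table name). A missing [tolerance] table means the default.
    """
    epsilon = vector.get("tolerance", {}).get("score_epsilon", DEFAULT_SCORE_EPSILON)
    return abs(actual - expected) < epsilon
```

For instance, scores_match(0.1 + 0.2, 0.3, {}) passes even though the two floats are not bit-identical, which is the kind of representation difference the tolerance is meant to absorb.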

Slicing Vectors

Slicing vectors test slicer algorithms with pre-scored input.

Schema

| Table | Field | Type | Description |
| --- | --- | --- | --- |
| [test] | name | string | Test name |
| [test] | stage | string | "slicing" |
| [test] | slicer | string | Slicer type: "greedy", "knapsack", "quota" |
| [budget] | target_tokens | integer | Token budget for selection |
| [[scored_items]] | content | string | Item content |
| [[scored_items]] | tokens | integer | Token count |
| [[scored_items]] | score | float | Pre-computed relevance score |
| [[scored_items]] | kind | string (optional) | ContextKind (used by QuotaSlice) |
| [expected] | selected_contents | array of string | Content values of selected items (set comparison; order does not matter) |

Slicer-Specific Configuration

| Table | Field | Type | Applies To | Description |
| --- | --- | --- | --- | --- |
| [config] | bucket_size | integer | KnapsackSlice | Discretization bucket size (default: 100) |
| [config] | inner_slicer | string | QuotaSlice | Inner slicer type |
| [[config.quotas]] | kind | string | QuotaSlice | ContextKind |
| [[config.quotas]] | require | float | QuotaSlice | Minimum percentage |
| [[config.quotas]] | cap | float | QuotaSlice | Maximum percentage |

Set Comparison

Slicer output is compared as a set: the order of items in selected_contents does not matter. An implementation passes if the set of selected item contents exactly matches the expected set. Slicers select items but do not determine presentation order, which is the placer’s responsibility. This holds for all slicers, including QuotaSlice.
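
A minimal sketch of the set comparison follows. The function name slicer_output_matches is hypothetical, and the sketch assumes that content values are unique within a vector, since duplicates would collapse when converted to a set.

```python
def slicer_output_matches(selected: list[str], vector: dict) -> bool:
    """Compare selected item contents to [expected].selected_contents as sets.

    Assumption: content values are unique within a vector, so converting
    to a set loses no information.
    """
    return set(selected) == set(vector["expected"]["selected_contents"])
```

With this helper, an implementation that returns ["b", "a"] passes a vector whose selected_contents is ["a", "b"], while one that returns only ["a"] does not.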

Placing Vectors

Placing vectors test placer algorithms with pre-scored input.

Schema

| Table | Field | Type | Description |
| --- | --- | --- | --- |
| [test] | name | string | Test name |
| [test] | stage | string | "placing" |
| [test] | placer | string | Placer type: "chronological", "u-shaped" |
| [[items]] | content | string | Item content |
| [[items]] | tokens | integer | Token count |
| [[items]] | score | float | Pre-computed relevance score |
| [[items]] | timestamp | datetime (optional) | UTC timestamp (used by ChronologicalPlacer) |
| [expected] | ordered_contents | array of string | Content values in expected output order (ordered comparison) |

Ordered Comparison

Placer output is compared as an ordered list — the position of each item matters. An implementation passes if the output items, in order, match the expected ordered_contents exactly.
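
One way to implement the ordered comparison while still producing a useful failure message is to report the first position at which the two sequences differ. The helper name first_mismatch is an assumption for illustration; it returns None when the lists match exactly, including in length.

```python
from itertools import zip_longest


def first_mismatch(actual: list[str], expected: list[str]):
    """Return the index of the first differing position, or None if equal.

    zip_longest pads the shorter list with None, so a missing or extra
    item is reported as a mismatch at the position where the lists diverge.
    """
    for index, (got, want) in enumerate(zip_longest(actual, expected)):
        if got != want:
            return index
    return None
```

A runner would pass the content values of its placer output as actual and the vector's ordered_contents as expected, and treat any result other than None as a failure.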

Pipeline Vectors

Pipeline vectors test the full 6-stage pipeline end-to-end.

Schema

| Table | Field | Type | Description |
| --- | --- | --- | --- |
| [test] | name | string | Test name |
| [test] | stage | string | "pipeline" |
| [budget] | max_tokens | integer | Maximum token capacity |
| [budget] | target_tokens | integer | Target token budget for selection |
| [budget] | output_reserve | integer | Tokens reserved for output (default: 0) |
| [config] | slicer | string | Slicer type |
| [config] | placer | string | Placer type |
| [config] | deduplication | boolean | Whether deduplication is enabled |
| [config] | overflow_strategy | string (optional) | "throw", "truncate", or "proceed" (default: "throw") |
| [[config.scorers]] | type | string | Scorer type |
| [[config.scorers]] | weight | float | Scorer weight (used for CompositeScorer weighting) |
| [[items]] | content | string | Item content |
| [[items]] | tokens | integer | Token count |
| [[items]] | kind | string (optional) | ContextKind |
| [[items]] | timestamp | datetime (optional) | UTC timestamp |
| [[items]] | priority | integer (optional) | Numeric priority |
| [[items]] | tags | array of string (optional) | Item tags |
| [[items]] | futureRelevanceHint | float (optional) | Relevance hint |
| [[items]] | pinned | boolean (optional) | Whether item is pinned (default: false) |
| [[expected_output]] | content | string | Content values in expected output order (ordered comparison) |

Ordered Comparison

Pipeline output is compared as an ordered list — both the selected items and their presentation order must match. An implementation passes if the output items, in order, match the expected_output entries exactly.
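
Unlike slicing and placing vectors, pipeline expectations are an array of tables rather than an array of strings, so a runner first has to pull the content field out of each [[expected_output]] entry. The sketch below shows that step; the function name pipeline_output_matches is hypothetical and the vector is assumed to be an already-parsed dict that contains [[expected_output]].

```python
def pipeline_output_matches(output_contents: list[str], vector: dict) -> bool:
    """Compare pipeline output to the [[expected_output]] entries, in order.

    Each [[expected_output]] entry parses to a dict with a "content" key;
    list equality then checks both membership and position.
    """
    expected = [entry["content"] for entry in vector["expected_output"]]
    return list(output_contents) == expected
```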

Diagnostics Vectors

Diagnostics vectors extend pipeline vectors with an [expected.diagnostics] sub-table. They assert on which items were included or excluded, why, and aggregate counts. Diagnostics vectors are pipeline-level only (stage = "pipeline"). The [expected.diagnostics] table composes with [[expected_output]] — a single vector file can assert on both output order and diagnostic details simultaneously.

Schema

| Table | Field | Type | Required | Description |
| --- | --- | --- | --- | --- |
| [[expected.diagnostics.included]] | content | string | yes | Item content (matches placed order) |
| [[expected.diagnostics.included]] | score_approx | float | yes | Expected score (epsilon tolerance) |
| [[expected.diagnostics.included]] | inclusion_reason | string | yes | Reason: "Scored", "Pinned", "ZeroToken" |
| [[expected.diagnostics.excluded]] | content | string | yes | Item content (sorted by score desc) |
| [[expected.diagnostics.excluded]] | score_approx | float | yes | Expected score (epsilon tolerance) |
| [[expected.diagnostics.excluded]] | exclusion_reason | string | yes | Reason discriminator: "BudgetExceeded", "Deduplicated", "QuotaCapExceeded" |
| [[expected.diagnostics.excluded]] | item_tokens | integer | conditional | Token count of excluded item (required for BudgetExceeded) |
| [[expected.diagnostics.excluded]] | available_tokens | integer | conditional | Remaining budget at exclusion (required for BudgetExceeded) |
| [[expected.diagnostics.excluded]] | deduplicated_against | string | conditional | Content of duplicate kept (required for Deduplicated) |
| [expected.diagnostics.summary] | total_candidates | integer | no | Total items considered |
| [expected.diagnostics.summary] | total_tokens_considered | integer | no | Sum of all candidate token counts |
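
The conditional fields depend on the exclusion_reason value, which a vector loader could check as sketched below. The names CONDITIONAL_FIELDS and missing_conditional_fields are hypothetical; the mapping is read directly from the table above, which lists no conditional fields for QuotaCapExceeded.

```python
# Conditional fields per exclusion reason, as listed in the schema table.
CONDITIONAL_FIELDS = {
    "BudgetExceeded": ("item_tokens", "available_tokens"),
    "Deduplicated": ("deduplicated_against",),
    "QuotaCapExceeded": (),
}


def missing_conditional_fields(entry: dict) -> list[str]:
    """Return conditional fields that one excluded entry should carry but lacks.

    Unknown reasons yield an empty list here; rejecting them instead would
    be an equally reasonable choice for a stricter loader.
    """
    required = CONDITIONAL_FIELDS.get(entry["exclusion_reason"], ())
    return [field for field in required if field not in entry]
```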

Ordering

included entries appear in placed order, matching the order of [[expected_output]] entries. excluded entries appear sorted by score descending.

Optionality

All three sub-tables (included, excluded, summary) are independently optional. A vector can assert on any combination: included items only, excluded items only, summary counts only, or any mix.
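
Because each sub-table is optional, a runner should assert only on the ones a vector actually provides. The following sketch, with the hypothetical name diagnostics_sections_to_check and an already-parsed vector dict as input, returns the sub-tables that are present.

```python
def diagnostics_sections_to_check(vector: dict) -> list[str]:
    """List the diagnostics sub-tables present in a parsed pipeline vector.

    A vector without [expected.diagnostics] yields an empty list, meaning
    no diagnostic assertions are made for it.
    """
    diagnostics = vector.get("expected", {}).get("diagnostics", {})
    return [name for name in ("included", "excluded", "summary") if name in diagnostics]
```

A vector that supplies only [expected.diagnostics.summary] would produce ["summary"], so included and excluded items would go unchecked for that test case.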

Compatibility

[expected.diagnostics] is a sub-table of expected, declared with a dotted table name. It does not conflict with [[expected_output]], which is a separate top-level key, so the combination is valid TOML 1.0. A single pipeline vector file may therefore contain both [[expected_output]] (ordered output assertion) and [expected.diagnostics] (diagnostic assertion).

Example

[test]
name = "Pipeline diagnostics: BudgetExceeded exclusion reason"
stage = "pipeline"

# Expected values below are final. A live integration test against run_traced
# will be added in Phase 29.
#
# Budget: target=200, max=1000, reserve=0.
#
# Items:
#   "fits":    tokens=150, kind=Message, timestamp=Jun
#   "too-big": tokens=400, kind=Message, timestamp=Jan
#
# Score (RecencyScorer, 2 timestamped, denominator=1):
#   "too-big" (Jan): rank 0 → 0.0
#   "fits"    (Jun): rank 1 → 1.0
#
# Slice (Greedy, target=200):
#   Density sort: fits(1.0/150≈0.00667), too-big(0.0/400=0.0)
#   fits: 150 ≤ 200 → selected (remaining=50)
#   too-big: 400 > 50 → excluded (BudgetExceeded)
#
# Expected diagnostics:
#   included: fits (score=1.0, reason=Scored)
#   excluded: too-big (score=0.0, reason=BudgetExceeded, item_tokens=400, available=50)
#   summary: total_candidates=2, total_tokens_considered=550

[budget]
max_tokens = 1000
target_tokens = 200
output_reserve = 0

[config]
slicer = "greedy"
placer = "chronological"
deduplication = false

[[config.scorers]]
type = "recency"
weight = 1.0

[[items]]
content = "fits"
tokens = 150
kind = "Message"
timestamp = 2024-06-01T00:00:00Z

[[items]]
content = "too-big"
tokens = 400
kind = "Message"
timestamp = 2024-01-01T00:00:00Z

[[expected_output]]
content = "fits"

[expected.diagnostics.summary]
total_candidates = 2
total_tokens_considered = 550

[[expected.diagnostics.included]]
content = "fits"
score_approx = 1.0
inclusion_reason = "Scored"

[[expected.diagnostics.excluded]]
content = "too-big"
score_approx = 0.0
exclusion_reason = "BudgetExceeded"
item_tokens = 400
available_tokens = 50

Field Types

| Type | TOML Representation | Notes |
| --- | --- | --- |
| string | "text" | UTF-8 string |
| integer | 42 | Signed 64-bit integer |
| float | 0.5 | IEEE 754 double-precision |
| boolean | true / false | |
| datetime | 2024-06-15T12:00:00Z | RFC 3339, always UTC |
| array of string | ["a", "b"] | |

Extensibility

Future versions of the conformance suite may add new fields to existing tables. Implementations SHOULD ignore unknown fields in test vector files rather than raising errors. This enables forward compatibility as the specification evolves.
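
One way to follow this recommendation is to read only the fields a runner knows about and silently skip the rest. In the sketch below, read_known_fields and KNOWN_TEST_FIELDS are hypothetical names; the field set is taken from the [test] rows in the schema tables of this chapter.

```python
# [test] fields documented in this chapter's schema tables.
KNOWN_TEST_FIELDS = {"name", "stage", "scorer", "slicer", "placer"}


def read_known_fields(table: dict, known: set[str]) -> dict:
    """Return only the recognized keys of a parsed table.

    Unknown keys are dropped rather than rejected, so vectors written for a
    newer version of the suite still load.
    """
    return {key: value for key, value in table.items() if key in known}
```

For example, a [test] table that carries an unrecognized field such as a future "since_version" key would load with just its name and stage, and the runner would proceed as usual.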