FrequencyScorer

The FrequencyScorer scores an item based on the proportion of peer items that share at least one tag with it.

Overview

FrequencyScorer is a relative scorer: it compares the item’s tags against the tags of every other item in the candidate set to determine how “connected” the item is within that set. Items that share tags with many peers score higher, reflecting their thematic relevance within the context.

Fields Used

Field   Source        Purpose
tags    ContextItem   Tag overlap detection with peers

Algorithm

FREQUENCY-SCORE(item, allItems):
    if length(item.tags) = 0 or length(allItems) <= 1:
        return 0.0

    matchingItems <- 0

    for i <- 0 to length(allItems) - 1:
        // Skip self (by identity, not by value)
        if allItems[i] is item:
            continue
        if length(allItems[i].tags) = 0:
            continue
        if SHARES-ANY-TAG(item.tags, allItems[i].tags):
            matchingItems <- matchingItems + 1

    return matchingItems / (length(allItems) - 1)
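A minimal Python sketch of FREQUENCY-SCORE and its helper, under stated assumptions: ContextItem here is a hypothetical stand-in exposing only a tags list of strings, and the names frequency_score, shares_any_tag and case_insensitive_equal are illustrative rather than part of any published API. ASCII case folding is implemented with a translation table over A-Z.

```python
import string
from dataclasses import dataclass, field

# ASCII case folding: maps only 'A'-'Z' to 'a'-'z'; every other character is left as-is.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


@dataclass(eq=False)  # eq=False keeps identity-based == and hashing, like a plain object
class ContextItem:
    """Hypothetical stand-in: only the tags field matters to this scorer."""
    tags: list[str] = field(default_factory=list)


def case_insensitive_equal(x: str, y: str) -> bool:
    return x.translate(_ASCII_LOWER) == y.translate(_ASCII_LOWER)


def shares_any_tag(tags_a: list[str], tags_b: list[str]) -> bool:
    # Nested scan that stops at the first matching pair (SHARES-ANY-TAG).
    return any(case_insensitive_equal(x, y) for x in tags_a for y in tags_b)


def frequency_score(item: ContextItem, all_items: list[ContextItem]) -> float:
    if not item.tags or len(all_items) <= 1:
        return 0.0
    matching = 0
    for peer in all_items:
        if peer is item:  # skip self by identity, not by value
            continue
        if not peer.tags:  # tagless peers can never match
            continue
        if shares_any_tag(item.tags, peer.tags):
            matching += 1
    # Fixed denominator: every entry except the scored item itself.
    return matching / (len(all_items) - 1)


db_item = ContextItem(["Database", "perf"])
db_peer = ContextItem(["database"])
ui_peer = ContextItem(["ui"])
untagged = ContextItem()
items = [db_item, db_peer, ui_peer, untagged]

# db_peer matches ("Database" vs "database"); ui_peer and untagged do not: 1 of 3 peers.
print(frequency_score(db_item, items))
```

The loop mirrors the pseudocode line for line so it can be checked against it; the explicit tagless-peer skip is redundant with shares_any_tag returning false for an empty list, but keeps the correspondence obvious.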

Tag Overlap Detection

SHARES-ANY-TAG(tagsA, tagsB):
    for i <- 0 to length(tagsA) - 1:
        for j <- 0 to length(tagsB) - 1:
            if CASE-INSENSITIVE-EQUAL(tagsA[i], tagsB[j]):
                return true
    return false
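A Python sketch of SHARES-ANY-TAG with the ASCII case folding that the conformance notes require. The helper names are hypothetical. The translation table lowers only A-Z; Python's str.lower() and str.casefold() are Unicode-aware and would also fold letters such as Ä, which strict ASCII folding leaves untouched.

```python
import string

# Translation table that lowers only the 26 ASCII letters A-Z.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def case_insensitive_equal(x: str, y: str) -> bool:
    """CASE-INSENSITIVE-EQUAL with ASCII folding; non-ASCII characters compare exactly."""
    return x.translate(_ASCII_LOWER) == y.translate(_ASCII_LOWER)


def shares_any_tag(tags_a: list[str], tags_b: list[str]) -> bool:
    """SHARES-ANY-TAG: nested scan that returns at the first matching pair."""
    for x in tags_a:
        for y in tags_b:
            if case_insensitive_equal(x, y):
                return True
    return False


print(shares_any_tag(["Important", "draft"], ["important"]))  # True
print(shares_any_tag(["Äpfel"], ["äpfel"]))  # False: Ä is not an ASCII letter
print("Äpfel".lower() == "äpfel")  # True: str.lower() folds beyond ASCII
```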

Score Interpretation

  • A score of 1.0 means every other item in the list shares at least one tag with this item.
  • A score of 0.0 means no other item shares any tag with this item (or the item has no tags, or it is the only item).
  • Intermediate values represent the fraction of peers with overlapping tags.
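In formula form, writing N for length(allItems) and m for the value of matchingItems after the loop, the score for a tagged item with N > 1 is:

```latex
\mathrm{score} = \frac{m}{N - 1}
\qquad\text{e.g. } N = 5,\ m = 3:\quad \mathrm{score} = \frac{3}{4} = 0.75
```

With five items, three matching peers out of four give 0.75; if all four peers match, the score is 4/4 = 1.0.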

Edge Cases

Condition                          Result
Item has no tags                   0.0
Only one item in allItems          0.0
No peer shares any tag             0.0
All peers share at least one tag   1.0
Peer has no tags                   That peer does not count as matching
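The edge-case table can be checked mechanically. This sketch assumes a hypothetical Item type with a tags list, and its compact frequency_score looks up ASCII-folded tags in a set instead of running the nested SHARES-ANY-TAG loop. The result is the same, because two tags match exactly when their folded forms are equal.

```python
import string
from dataclasses import dataclass, field

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


@dataclass(eq=False)  # hypothetical stand-in for ContextItem; identity-based equality
class Item:
    tags: list[str] = field(default_factory=list)


def frequency_score(item: Item, all_items: list[Item]) -> float:
    if not item.tags or len(all_items) <= 1:
        return 0.0
    wanted = {t.translate(_ASCII_LOWER) for t in item.tags}
    matching = sum(
        1
        for peer in all_items
        if peer is not item  # self-exclusion by identity
        and any(t.translate(_ASCII_LOWER) in wanted for t in peer.tags)  # empty tags -> False
    )
    return matching / (len(all_items) - 1)


x = Item(["api"])
empty = Item([])
results = {
    "item has no tags": frequency_score(empty, [empty, Item(["api"])]),
    "only one item": frequency_score(x, [x]),
    "no peer shares a tag": frequency_score(x, [x, Item(["ui"]), Item(["db"])]),
    "all peers share a tag": frequency_score(x, [x, Item(["API"]), Item(["api", "ui"])]),
    # The tagless peer never matches but still counts in the denominator: 1 / 2.
    "one tagless peer": frequency_score(x, [x, Item(["api"]), Item([])]),
}
for name, score in results.items():
    print(f"{name}: {score}")
```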

Self-Exclusion

The item being scored is excluded from the peer count using reference identity (object identity), not value equality. This means:

  • Only entries that are the very same instance as the item are skipped. A distinct ContextItem instance with identical content is a separate object, so it is counted as a peer (and matches if it has tags).
  • If the same instance appears more than once in allItems, every occurrence is skipped. The denominator is nevertheless always length(allItems) - 1, regardless of how many entries are skipped.
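The identity rule is easiest to see with a value-comparable type. This sketch assumes a hypothetical frozen dataclass Item, whose == compares field values, so a structurally equal copy is == to the original while still being a different object. The scorer uses is, so the copy counts as a peer.

```python
import string
from dataclasses import dataclass

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


@dataclass(frozen=True)  # hypothetical immutable stand-in; == compares field values
class Item:
    tags: tuple[str, ...]


def frequency_score(item: Item, all_items: list[Item]) -> float:
    if not item.tags or len(all_items) <= 1:
        return 0.0
    wanted = {t.translate(_ASCII_LOWER) for t in item.tags}
    matching = sum(
        1
        for peer in all_items
        if peer is not item  # identity, never ==
        and any(t.translate(_ASCII_LOWER) in wanted for t in peer.tags)
    )
    return matching / (len(all_items) - 1)


a = Item(("api",))
copy = Item(("api",))  # equal to a by value, but a distinct instance

# The copy is a matching peer: 1 of 1 peers -> 1.0 (value-based exclusion would give 0.0).
score_copy = frequency_score(a, [a, copy])
# a is listed twice: both entries are skipped, copy matches, but the
# denominator stays len(all_items) - 1 = 2 -> 1 / 2 = 0.5.
score_repeated = frequency_score(a, [a, a, copy])
print(a == copy, a is copy, score_copy, score_repeated)
```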

Complexity

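  • Time: O(N * T_a * T_b) per item in the worst case, where N is the number of items in allItems and T_a and T_b are the tag-list lengths of the scored item and a peer. Scoring every item costs O(N^2 * T^2) in total, where T is the maximum tag-list length.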
  • Space: O(1) auxiliary per invocation.
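As a rough worked count of the time bound, take N = 100 items that each carry T = 5 tags (illustrative numbers, not measurements). In the worst case, with no early exit from SHARES-ANY-TAG, each call performs T^2 tag comparisons against each of its N - 1 peers:

```latex
\text{per item: } (N-1)\,T^2 = 99 \cdot 25 = 2475
\qquad
\text{all items: } N(N-1)\,T^2 = 100 \cdot 2475 = 247500 \approx N^2 T^2 = 250000
```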

Conformance Notes

  • Tag comparison in SHARES-ANY-TAG MUST be case-insensitive using ASCII case folding. "Important" and "important" are considered matching tags.
  • Self-exclusion MUST use reference identity (the "is" check in FREQUENCY-SCORE), not structural equality. This matches the ContextItem immutability contract: items are compared by identity throughout the pipeline.
  • The denominator is length(allItems) - 1, which accounts for the self-exclusion. It is not reduced further for tagless peers.