Cross-Encoder
Plain Explanation
A Cross-Encoder is a reranking model that reads the query and one candidate document together. For BERT-style models the input often looks like [CLS] query [SEP] document [SEP]; the Transformer then lets query tokens and document tokens attend to each other directly. This makes it good at catching cases where two texts use different words but mean the same thing, or use similar words but mean different things. The tradeoff is cost: every candidate needs its own model pass, so Cross-Encoders usually rerank a small top-K candidate pool from a faster first-stage retriever.
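The pairing and the per-candidate cost can be made concrete with a small sketch. Everything here is illustrative: `format_pair` and `CountingScorer` are hypothetical names, the scorer is a word-overlap placeholder rather than a Transformer, and real tokenizers insert the special tokens themselves instead of using string formatting.

```python
from typing import List, Tuple


def format_pair(query: str, document: str) -> str:
    """Show the BERT-style layout of one query-document pair (illustration only)."""
    return f"[CLS] {query} [SEP] {document} [SEP]"


class CountingScorer:
    """Hypothetical stand-in for a Cross-Encoder that counts its 'model passes'."""

    def __init__(self) -> None:
        self.passes = 0

    def __call__(self, query: str, document: str) -> float:
        self.passes += 1  # one pass per (query, document) pair
        q_terms = set(query.lower().split())
        d_terms = set(document.lower().split())
        # Placeholder relevance: fraction of query terms found in the document.
        return len(q_terms & d_terms) / max(len(q_terms), 1)


def score_candidates(
    query: str, candidates: List[str], scorer: CountingScorer
) -> List[Tuple[str, float]]:
    """Score every candidate jointly with the query; cost grows linearly with K."""
    return [(doc, scorer(query, doc)) for doc in candidates]
```

Scoring two candidates increments `scorer.passes` twice, which is the cost pattern that pushes Cross-Encoders into the second stage.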
Examples & Analogies
- Exam re-checking: a fast grader selects candidate answers, then a careful reviewer reads the question and each answer together to decide the final order.
- Support search: for “password reset email not arriving,” the model can recognize that a document about “account verification email troubleshooting” is relevant even if the wording differs.
- RAG document selection: vector search retrieves 100 candidate passages, then a Cross-Encoder chooses the best 5-10 evidence passages for the final prompt.
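The RAG selection example above could be sketched as a rerank step over an already retrieved candidate list. This is a minimal sketch under stated assumptions: `rerank`, `toy_score_pairs`, and the `top_n` parameter are hypothetical names, and the toy scorer is a word-overlap placeholder that only stands in for a real Cross-Encoder.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
BatchScorer = Callable[[List[Pair]], Sequence[float]]


def toy_score_pairs(pairs: List[Pair]) -> List[float]:
    """Placeholder batch scorer: fraction of query terms present in each document."""
    scores = []
    for query, doc in pairs:
        q_terms = set(query.lower().split())
        d_terms = set(doc.lower().split())
        scores.append(len(q_terms & d_terms) / max(len(q_terms), 1))
    return scores


def rerank(
    query: str,
    candidates: List[str],
    score_pairs: BatchScorer,
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Score each first-stage candidate against the query and keep the best top_n."""
    pairs = [(query, doc) for doc in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

With the Sentence Transformers library, the `predict` method of `CrossEncoder`, which scores a list of (query, document) pairs, could plausibly be passed as `score_pairs` in place of the toy function.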
At a Glance
| | Cross-Encoder | Bi-Encoder | BM25 |
|---|---|---|---|
| Main role | Rerank candidates | Fast semantic retrieval | Fast lexical retrieval |
| Input style | Query + document together | Query and document separately | Tokens and statistics |
| Strength | Precise contextual judgment | Large-scale retrieval | Cheap and explainable |
| Weakness | One inference per candidate | Weaker fine interaction | Weak on synonyms/meaning |
Where and Why It Matters
- RAG quality: helps choose better evidence before generation.
- Search UX: top-10 order often matters more than returning many loosely relevant documents.
- Recall boundary: it cannot fix candidates the first-stage retriever never returned.
- Cost management: K, input length, batch size, and fallback model must be planned together.
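The cost-management point can be explored with back-of-the-envelope arithmetic. The sketch below assumes batches run sequentially and that each batch takes a fixed, separately measured time (`ms_per_batch`); both function names and that simple latency model are assumptions for illustration, not a benchmark method.

```python
import math
from typing import Dict


def estimate_rerank_cost(k: int, batch_size: int, ms_per_batch: float) -> Dict[str, float]:
    """Estimate reranking work for K candidates under a fixed per-batch latency."""
    batches = math.ceil(k / batch_size)
    return {
        "pairs": k,  # one model input per candidate
        "batches": batches,
        "latency_ms": batches * ms_per_batch,
    }


def largest_k_within_budget(budget_ms: float, batch_size: int, ms_per_batch: float) -> int:
    """Largest K whose estimated latency stays within the budget."""
    full_batches = int(budget_ms // ms_per_batch)
    return full_batches * batch_size
```

For example, with a batch size of 32 and an assumed 40 ms per batch, K=100 needs 4 batches (about 160 ms) while K=50 needs 2 (about 80 ms). Longer inputs would raise `ms_per_batch`, which is why K, input length, and batch size need to be tuned together.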
Common Misconceptions
- ❌ Myth: use a Cross-Encoder to search the whole corpus → ✅ Reality: it is usually a second-stage reranker after retrieval.
- ❌ Myth: a better reranker fixes all retrieval problems → ✅ Reality: if first-stage recall is low, the reranker never sees the right document.
- ❌ Myth: larger K is always better → ✅ Reality: raising K improves quality only up to a point; beyond it, the added latency and cost outweigh the shrinking gains.
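The second myth can be checked numerically: if the first stage misses a relevant document, even a perfect reranker cannot reach a perfect ranking score. The sketch below uses first-stage recall and NDCG with linear gain (some implementations use an exponential gain instead); the function names and document ids are illustrative.

```python
import math
from typing import Dict, List


def recall_at_k(retrieved_ids: List[str], relevant_ids: List[str], k: int) -> float:
    """Fraction of relevant documents that appear in the first k retrieved ids."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """NDCG@k with linear gain; the ideal ranking uses every judged document."""
    dcg = sum(
        relevance.get(doc_id, 0.0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Two relevant documents exist, but the first stage only returned one of them.
relevance = {"a": 1.0, "b": 1.0}
first_stage = ["c", "a", "d"]
best_possible_rerank = ["a", "c", "d"]  # the reranker can reorder, not add "b"
```

Here first-stage recall is 0.5, and the best achievable NDCG@10 stays below 1.0 because document "b" never reaches the reranker.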
How It Sounds in Conversation
- "Retrieve top-100 with BM25+vector, then rerank them with the Cross-Encoder and keep the top-10."
- "p95 latency means the time by which 95% of requests finish; ours is too high, so compare K=100 vs K=50."
- "NDCG@10 measures top-10 ranking quality, but we should report first-stage recall beside it."
- "For long documents, rerank chunks first and define a document-level aggregation rule."
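The chunk-then-aggregate idea from the last quote could look like the sketch below. The aggregation rules shown (max and mean of chunk scores) are two common choices picked for illustration; the function name, the `rule` parameter, and the scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_chunk_scores(
    chunk_scores: List[Tuple[str, float]], rule: str = "max"
) -> List[Tuple[str, float]]:
    """Turn (doc_id, chunk_score) pairs into a document-level ranking."""
    by_doc: Dict[str, List[float]] = defaultdict(list)
    for doc_id, score in chunk_scores:
        by_doc[doc_id].append(score)

    if rule == "max":
        # A document is as relevant as its best chunk.
        doc_scores = {doc_id: max(scores) for doc_id, scores in by_doc.items()}
    elif rule == "mean":
        # A document is as relevant as its average chunk.
        doc_scores = {doc_id: sum(scores) / len(scores) for doc_id, scores in by_doc.items()}
    else:
        raise ValueError(f"unknown aggregation rule: {rule}")

    return sorted(doc_scores.items(), key=lambda item: item[1], reverse=True)
```

The rule matters: a document with one excellent chunk and one irrelevant chunk ranks first under max but can fall behind a uniformly decent document under mean, so the choice should be stated explicitly rather than left implicit.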
References
- Shallow Cross-Encoders for Low-Latency Retrieval
Paper about latency-aware Cross-Encoder retrieval variants.
- Cross-Encoder Rediscovers a Semantic Variant of BM25
Paper analyzing Cross-Encoder relevance scoring and BM25-style signals.
- Sentence Transformers CrossEncoder
Official CrossEncoder class documentation and pair-scoring API.
- Cross-Encoder Training Overview
Official training overview for Cross-Encoder reranking and scoring tasks.
- CrossEncoderRerankingEvaluator
Official reranking evaluator interface and at_k evaluation flow.