A new paper with source code: the Paragraph Aggregation Retrieval Model (PARM).
It is particularly useful for document-to-document retrieval with long texts, such as legal cases, contracts, and patents, that exceed the maximum input sequence length of the encoder model.
The authors build the index at the paragraph level and run a separate query for each paragraph of the query document.
Because each query paragraph returns its own ranked list of paragraphs, the same document can appear several times, both within a list and across lists. The authors therefore aggregate the lists into a single ranked list of documents with what they call Vector-based Reciprocal Rank Fusion (VRRF).
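To make the per-paragraph querying concrete, here is a minimal sketch. It assumes paragraph embeddings have already been computed by some encoder (not shown) and uses brute-force dot-product search in place of a real vector index. The function names are hypothetical and are not taken from the PARM repository.

```python
from typing import Dict, List, Tuple

Vector = List[float]
# A paragraph hit: (document id, paragraph index, similarity score)
Hit = Tuple[str, int, float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_paragraph_index(docs: Dict[str, List[Vector]]) -> List[Tuple[str, int, Vector]]:
    """Flatten documents into one index entry per paragraph embedding."""
    return [(doc_id, i, vec) for doc_id, paras in docs.items() for i, vec in enumerate(paras)]


def query_per_paragraph(query_paras: List[Vector],
                        index: List[Tuple[str, int, Vector]],
                        top_k: int = 10) -> List[List[Hit]]:
    """Run one brute-force dot-product search per query paragraph."""
    results = []
    for q in query_paras:
        hits = [(doc_id, i, dot(q, vec)) for doc_id, i, vec in index]
        hits.sort(key=lambda h: h[2], reverse=True)
        results.append(hits[:top_k])
    return results


# Toy example with hand-made 2-d "embeddings"
docs = {"A": [[1.0, 0.0], [0.0, 1.0]], "B": [[0.7, 0.7]]}
index = build_paragraph_index(docs)
lists = query_per_paragraph([[1.0, 0.0], [0.0, 1.0]], index, top_k=2)
# lists[0] -> [("A", 0, 1.0), ("B", 0, 0.7)]
# lists[1] -> [("A", 1, 1.0), ("B", 0, 0.7)]
```

Here document A is reached through a different paragraph in each list, which is exactly the duplication the aggregation step has to resolve.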
Standard reciprocal rank fusion (RRF) merges several ranked lists, usually produced by different search systems, into a single list using only rank positions: each document's fused score is the sum of 1/(k + rank) over the lists in which it appears, where k is a smoothing constant.
VRRF combines this rank information with the dense paragraph vectors, and in the authors' experiments it outperforms the other aggregation methods they evaluated.
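For reference, standard RRF fits in a few lines. This sketch uses the commonly used default k = 60. The rule that a document repeated within a single list only counts at its best rank is an assumption added here, so that the function can also accept paragraph-level lists that have been mapped to document IDs.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def reciprocal_rank_fusion(ranked_lists: Sequence[Sequence[Hashable]],
                           k: int = 60) -> List[Tuple[Hashable, float]]:
    """Fuse ranked lists with RRF: score(d) = sum over lists of 1 / (k + rank).

    Ranks are 1-based. If an item occurs more than once in the same list
    (assumption: e.g. two paragraphs of one document), only its best rank counts.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in ranked_lists:
        seen = set()
        for rank, item in enumerate(ranking, start=1):
            if item in seen:
                continue  # skip repeats of the same document within this list
            seen.add(item)
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


fused = reciprocal_rank_fusion([["A", "B", "C"], ["B", "A"], ["B", "C"]])
# B is ranked first, then A, then C
```

Only the rank positions enter the formula; the retrieval scores of the individual systems are ignored, which is what makes RRF easy to apply to lists whose scores are not comparable.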
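The post does not give the exact VRRF formula, so the following is only a rough illustration of combining RRF weights with dense vectors, not the paper's definition. Each candidate document is represented as the sum of its retrieved paragraph embeddings weighted by 1/(k + rank), the query document as the mean of its paragraph embeddings, and candidates are ranked by dot product. All of these choices, and the function names, are assumptions; the linked repository holds the actual method.

```python
from typing import Dict, List, Tuple

Vector = List[float]
# One ranked list for one query paragraph: (document id, paragraph embedding) in rank order
RankedParas = List[Tuple[str, Vector]]


def rrf_weighted_doc_vectors(ranked_lists: List[RankedParas],
                             k: int = 60) -> Dict[str, Vector]:
    """Sum each retrieved paragraph's embedding into its document's vector,
    weighted by the reciprocal-rank term 1 / (k + rank) of that hit."""
    doc_vecs: Dict[str, Vector] = {}
    for ranking in ranked_lists:
        for rank, (doc_id, vec) in enumerate(ranking, start=1):
            w = 1.0 / (k + rank)
            acc = doc_vecs.setdefault(doc_id, [0.0] * len(vec))
            for i, x in enumerate(vec):
                acc[i] += w * x
    return doc_vecs


def rank_documents(query_paras: List[Vector],
                   ranked_lists: List[RankedParas],
                   k: int = 60) -> List[Tuple[str, float]]:
    """Score candidates by the dot product between the mean query-paragraph
    vector (an assumption of this sketch) and the RRF-weighted document vectors."""
    dim = len(query_paras[0])
    q = [sum(p[i] for p in query_paras) / len(query_paras) for i in range(dim)]
    doc_vecs = rrf_weighted_doc_vectors(ranked_lists, k)
    scores = {d: sum(a * b for a, b in zip(q, v)) for d, v in doc_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_documents(
    [[1.0, 0.0], [0.0, 1.0]],
    [[("A", [1.0, 0.0]), ("B", [0.6, 0.0])],
     [("A", [0.0, 1.0]), ("B", [0.0, 0.6])]],
)
# "A" is ranked above "B"
```

Unlike plain RRF, a document's final score here depends on what its retrieved paragraphs are about (their vectors), not only on where they were ranked.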
https://github.com/sophiaalthammer/parm
https://arxiv.org/abs/2201.01614