The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes

In this second episode of the Neural Information Retrieval Talks podcast, Andrew Yates and Sergi Castella discuss the paper "The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes".

Paper: This paper analyzes, both theoretically and empirically, how dense low-dimensional retrieval models accumulate false positives as the index grows, contrasting their behavior with sparse high-dimensional retrieval and running experiments on MS MARCO with increasing index sizes. The results have important implications for how well benchmark performance on a fixed-size index transfers to much larger real-world collections.

Contact: castella@

Timestamps:

00:00 Co-host introduction
00:26 Paper introduction
02:18 Dense vs. sparse retrieval
05:46 Theoretical analysis of false positives (1)
08:17 What are low- vs. high-dimensional representations?
11:49 Theoretical analysis of false positives (2)
20:10 First results: growing the MS MARCO index
28:35 Adding r