Generative AI has recently witnessed an exciting development: using language to understand images, video, audio, molecules, time-series, and other "modalities." Multimodal retrieval exemplifies this advancement, allowing us to search one modality using another.