Cross-modal information retrieval is the process of linking and querying data across distinct modalities, such as images, text, audio, and video. This field addresses the inherent semantic gap ...