Topic: Multimodal Machine Learning Applications

This cluster of papers focuses on the development and improvement of visual question answering systems, image captioning techniques, and neural networks for understanding and generating descriptions of images and videos. The research involves semantic reasoning, multimodal fusion, scene graph generation, attention mechanisms, and deep learning approaches to bridge the gap between vision and language.
Latest publications
Noise-Aware Image Captioning with Progressively Exploring Mismatched Words

article

CognitiveDog: Large Multimodal Model Based System to Translate Vision and Language into Action of Quadruped Robot

article

Shooting condition insensitive unmanned aerial vehicle object detection

article

Multi-Source Domain Adaptation with Mixture of Joint Distributions

article

Comparing Traditional and LLM-based Search for Image Geolocation

preprint

Multi-Source and Multi-Target Domain Adaptation Based on Dynamic Generator with Attention

article

Multi-level cross-modal contrastive learning for review-aware recommendation

article

Cooperative Connection Transformer for Remote Sensing Image Captioning

article

A Multi-Level Alignment and Cross-Modal Unified Semantic Graph Refinement Network for Conversational Emotion Recognition

article

Deep Learning Approaches for Image Captioning: Opportunities, Challenges and Future Potential

article

Highly cited publications from the past 5 years
MizAR 60 for Mizar 50

preprint, 70225 citations, FWCI 12462.80304531

Towards Learning Terminological Concept Systems from Multilingual Natural Language Text

preprint, 16995 citations, FWCI 1888.2480427

Attention Is All You Need

preprint, 6466 citations, FWCI 62.6566869

Learning Transferable Visual Models From Natural Language Supervision

preprint, 5296 citations, FWCI 0

UCF-101: A dataset of 101 human actions classes from videos in the wild

preprint, 4432 citations, FWCI 0

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

article, 3257 citations, FWCI 276.19703219

Exploring Simple Siamese Representation Learning

article, 3039 citations, FWCI 347.67920918

Contextual Personal Intelligence: A New Paradigm for AI That Evolves With You

preprint, 2873 citations, FWCI 0

Survey of Hallucination in Natural Language Generation

review, 2392 citations, FWCI 468.35077814

Hierarchical Text-Conditional Image Generation with CLIP Latents

preprint, 2251 citations, FWCI 0