Topic: Multimodal Machine Learning Applications

This cluster of papers covers visual question answering, image captioning, and neural networks that understand and generate descriptions of images and videos. Recurring themes include semantic reasoning, multimodal fusion, scene graph generation, attention mechanisms, and other deep learning approaches for bridging vision and language.
Latest Publications
Adaptive Saliency based Contextual Metric learning for Few-shot Open-set Recognition

article

Personal Context as a Temporal Reference Layer (Time Layer) in Artificial Intelligence — The Last Piece of the Temporal Architecture Puzzle

article

A Novel Image Captioning Technique Using Deep Learning Methodology

article

In-Depth Analysis of Graph-Based RAG in a Unified Framework

article

Lifelong Learning of Large Language Model based Agents: A Roadmap

article

Retrieval-Augmented Generation for AI-Generated Content: A Survey

article

On-device explainable artificial intelligence for the semantic web of everything

article

CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation

article

SLENet: A Guidance-Enhanced Network for Underwater Camouflaged Object Detection

book-chapter

Semantic Phase Transitions in Transformer Observation Geometries

article

Highly Cited Publications (Past 5 Years)
MizAR 60 for Mizar 50

preprint · Citations: 71508 · FWCI: 12790.02538438

Neural machine translation by jointly learning to align and translate

preprint · Citations: 14565 · FWCI: 0

Attention Is All You Need

preprint · Citations: 6471 · FWCI: 96.39490293

Image--EVI on Metric Quotients for Gradient Flows

preprint · Citations: 2899 · FWCI: 0

Survey of Hallucination in Natural Language Generation

review · Citations: 2578 · FWCI: 495.37101534

Hierarchical Text-Conditional Image Generation with CLIP Latents

preprint · Citations: 2257 · FWCI: 0

Learning to Prompt for Vision-Language Models

article · Citations: 2176 · FWCI: 264.91445446

PaLM: Scaling Language Modeling with Pathways

preprint · Citations: 2119 · FWCI: 0

RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection

preprint · Citations: 2103 · FWCI: 0

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

preprint · Citations: 2096 · FWCI: 0