Customer-obsessed science
- July 22, 2024: “Best-fit packing” adapts bin packing to avoid unnecessary truncation of training documents, improving LLM performance across a wide range of tasks and reducing hallucination (see the sketch after this list).
- July 19, 2024: In his keynote address at CVPR, Swami Sivasubramanian considers the many ways that Amazon incorporates computer vision technology into its products and makes it directly available to Amazon Web Services’ customers.
- July 17, 2024: Learning algorithms and reinforcement learning are areas of focus, while LLM-related research, on topics such as continual learning, hallucination mitigation, and privacy, remains well represented.
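As a rough illustration of the idea behind best-fit packing, the sketch below packs documents (represented only by their token counts) into fixed-length training sequences with a best-fit-decreasing heuristic, so a document that already fits within the sequence length is never split across sequences. This is a minimal sketch of the general heuristic, not the paper's released implementation; the names best_fit_pack, doc_lengths, and max_len are illustrative, and documents longer than max_len are assumed to have been split into chunks of at most max_len beforehand.

```python
# Minimal, illustrative sketch of best-fit-decreasing packing of documents
# (given as token counts) into fixed-length training sequences.
# Assumption: every entry in doc_lengths is already <= max_len.

def best_fit_pack(doc_lengths, max_len):
    """Assign each document to the sequence whose remaining capacity is the
    smallest that still fits it (best fit); open a new sequence otherwise."""
    # Best-fit *decreasing*: consider the longest documents first.
    order = sorted(range(len(doc_lengths)), key=lambda i: -doc_lengths[i])
    bins = []  # each bin: [remaining_space, list_of_doc_indices]
    for i in order:
        need = doc_lengths[i]
        best = None
        for b, (space, _) in enumerate(bins):
            if space >= need and (best is None or space < bins[best][0]):
                best = b
        if best is None:
            bins.append([max_len - need, [i]])  # open a new training sequence
        else:
            bins[best][0] -= need
            bins[best][1].append(i)
    return [docs for _, docs in bins]

# Example: documents of 900, 600, 500, and 120 tokens, 1,024-token sequences.
print(best_fit_pack([900, 600, 500, 120], 1024))
# [[0, 3], [1], [2]] -- each document stays whole within a single sequence.
```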
- July 21 - 27, 2024
- August 11 - 16, 2024
- August 25 - 29, 2024
- July 08, 2024: University students will compete for cash prizes in a competition to advance the security of LLMs that code. Applications are due by September 1, 2024.
- ECCV 2024: In this work, we propose an efficient Video-Language Alignment (ViLA) network. Our ViLA model addresses both efficient frame sampling and effective cross-modal alignment in a unified way. In our ViLA network, we design a new learnable text-guided Frame-Prompter together with a cross-modal distillation (QFormer-Distiller) module. Pretrained large image-language models have shown promising results on problems…
- Sixth Symposium on Advances in Approximate Bayesian Inference, 2024: With advances in computational power, there has been rapid development of complex systems to predict certain outputs for industrial problems. Attributing outputs to input features, or output changes to input or system changes, has been a critical and challenging problem in many real-world applications. In industrial settings, a system could be a chain of large-scale models or simulators, or a combination…
- 2024: Cross-language transfer learning from English to a target language has shown effectiveness in low-resourced audiovisual speech recognition (AV-ASR). We first investigate a two-stage protocol, which performs fine-tuning of the English pre-trained AV encoder on a large audio corpus in the target language (first stage) and then carries out cross-modality transfer learning from audio to AV in the target language…
- ECCV 2024: Recent advancements in Multimodal Large Language Models (MLLMs) have revolutionized the field of vision-language understanding by integrating visual perception capabilities into Large Language Models (LLMs). The prevailing trend in this field involves the utilization of a vision encoder derived from vision-language contrastive learning (CL), showing expertise in capturing overall representations while facing…
- 2024: Convolution-augmented Transformer architectures have dominated the field of automatic speech recognition by showing better WER results when the models are trained on relatively smaller training data. In this work, we revisit the necessity of convolution modules in the ASR encoder architecture, given that the inductive bias brought by the convolution modules may only boost performance in a low training data…
July 24, 2024: The program empowers exceptional scholars from backgrounds historically underrepresented in STEM to become industry leaders through scholarship, research, and career opportunities.
Resources
- We look for talent from around the world: applied scientists, data scientists, economists, research scientists, scholars, academics, PhDs, and interns.
- We collaborate with leading academic organizations to drive innovation and to ensure that research is creating solutions whose benefits are shared broadly.
- Learn more about the awards and recognitions that Amazon researchers from around the world have been honored with during their tenure.