Davis Liang

Director of Applied Science at Abridge AI (Formerly Meta AI, Amazon AI)


I lead the Machine Learning Team at Abridge AI and apply my research in multilinguality, automatic speech recognition (ASR), and large language models (LLMs) to reinvent healthcare one conversation at a time. Previously, I was a Senior Research Scientist at Meta AI working on large-scale pretraining of multilingual language models.

Prior to Meta, I focused on question answering, information retrieval, machine translation, and speech recognition as an Applied Scientist at Amazon (AWS) AI. I was also a Software Engineer at Yahoo and obtained my MS degree in Computer Science from UC San Diego, where I was advised by Prof. Gary Cottrell.

Research Interests

I am interested in:

  • ML for Social Good, particularly addressing challenges in underserved sectors like healthcare and education and supporting underserved communities through improved capabilities for low-resource languages.
  • Safe ML, through rigorous evaluation methodologies and well-designed guardrails.
  • ML Beyond LLMs, exploring world models, diffusion architectures, and other emerging approaches that push beyond current paradigms.

Contact

Please send all research and work-related inquiries to davisblaine.liang(at)gmail.com.

News

Aug 19, 2025 Excited to share our new paper, “The Science of Confabulation Elimination,” on building systems that detect and eliminate hallucinations in AI-generated clinical documentation. [Article]
Jun 24, 2025 Proud to be a part of Abridge’s $300M Series E led by Andreessen Horowitz, fueling our next phase of building agentic AI for healthcare conversations. [Article]
Sep 11, 2024 I had the opportunity to talk about the past, present, and future of AI in Healthcare with Out-of-Pocket Health. [Article]
Sep 2, 2023 We are releasing the Belebele dataset, a first-of-its-kind multilingual reading comprehension dataset spanning 122 language variants, 27 language families, and 29 scripts. [Paper] [Github] [Twitter]
Aug 28, 2023 I had a great time chatting with the New York Times about generative AI and the role of ML talent in supercharging the field of healthcare.
Apr 2, 2023 I’m excited to announce that I’m joining Abridge AI to work on reinventing healthcare for doctors and patients alike!
Jan 28, 2023 We are releasing XLM-V, a multilingual model with a 1 million token vocabulary [Link]. The model is also available in Hugging Face Transformers.
Feb 22, 2022 After four years at Amazon, I’ll be moving on to a new role. I’ll officially be joining Meta AI (formerly Facebook AI) as a Senior Research Scientist in March!
Feb 20, 2018 Officially joining Amazon AI in East Palo Alto as an Applied Scientist, where I’ll be working on speech recognition, information retrieval, and question answering.

Selected Publications

Please refer to my Google Scholar for a full list of publications.

(*=equal contribution)

  1. EMNLP
    XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models
    Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, and Madian Khabsa
    EMNLP 2023
  2. Arxiv
    The BELEBELE Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
    Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, and Madian Khabsa
    arXiv preprint arXiv:2308.16884 2023
  3. Arxiv
    Attention-guided generative models for extractive question answering
    Peng Xu*, Davis Liang*, Zhiheng Huang, and Bing Xiang
    arXiv preprint arXiv:2110.06393 2021
  4. Arxiv
    Embedding-based Zero-shot Retrieval through Query Generation
    Davis Liang*, Peng Xu*, Siamak Shakeri, Cicero Nogueira dos Santos, Ramesh Nallapati, Zhiheng Huang, and Bing Xiang
    arXiv preprint arXiv:2009.10270 2020
  5. ACL
    Masked language model scoring
    Julian Salazar, Davis Liang, Toan Q. Nguyen, and Katrin Kirchhoff
    ACL 2020
  6. EMNLP Findings
    Improve transformer models with better relative position embeddings
    Zhiheng Huang, Davis Liang, Peng Xu, and Bing Xiang
    EMNLP Findings 2020
  7. Resistance AI
    Decoding and Diversity in Machine Translation
    Nicholas Roberts, Davis Liang, Graham Neubig, and Zachary C. Lipton
    NeurIPS Resistance AI Workshop 2020
  8. SLT
    Learning noise-invariant representations for robust speech recognition
    Davis Liang, Zhiheng Huang, and Zachary C. Lipton
    IEEE SLT 2018
  9. IJCNLP
    Deep automated multi-task learning
    Davis Liang and Yan Shu
    IJCNLP 2017