Accepted Papers


Links: [OpenReview Portal] [NeurIPS Site]
Note: Authors are encouraged to contact us to add links to posters, videos, and other materials related to their paper.


Improved Baselines with Visual Instruction Tuning
Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee

Can LLM-Generated Misinformation Be Detected?
Canyu Chen, Kai Shu

Prometheus: Inducing Evaluation Capability in Language Models
Seungone Kim, Jamin Shin, Yejin Cho, Joel Jang, Shayne Longpre, Hwaran Lee, Sangdoo Yun, Seongjin Shin, Sungdong Kim, James Thorne, Minjoon Seo

Instruction-tuned LLMs with World Knowledge are More Aligned to the Human Brain
Khai Loong Aw, Syrielle Montariol, Badr AlKhamissi, Martin Schrimpf, Antoine Bosselut
[Poster] [Video] [arXiv]

Ring Attention with Blockwise Transformers for Near-Infinite Context
Hao Liu, Matei Zaharia, Pieter Abbeel

Reflection-Tuning: Recycling Data for Better Instruction-Tuning
Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Tianyi Zhou

Supervised Fine-Tuning of Large Language Models on Human Demonstrations Through the Lens of Memorization
Yubin Ge, Devamanyu Hazarika, Yang Liu, Mahdi Namazifar

Grounding Code Generation with Input-Output Specifications
Yeming Wen, Pengcheng Yin, Kensen Shi, Henryk Michalewski, Swarat Chaudhuri, Alex Polozov

#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models
Keming Lu, Hongyi Yuan, Zheng Yuan, Runji Lin, Junyang Lin, Chuanqi Tan, Chang Zhou, Jingren Zhou

Training Speech Recognition Models to Follow Instructions
Cheng-I Lai, Zhiyun Lu, Liangliang Cao, Ruoming Pang

Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun

Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning
Sagar Sakhinana, Venkataramana Runkana

Use Your INSTINCT: INSTruction optimization usIng Neural bandits Coupled with Transformers
Xiaoqiang Lin, Zhaoxuan Wu, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng, Patrick Jaillet, Bryan Kian Hsiang Low
[Video] [Project Page]

Learning to Generate Instructions to Adapt Language Models to New Tasks
Nihal Nayak, Yiyang Nan, Avi Trost, Stephen Bach

An Emulator for Fine-tuning Large Language Models using Small Language Models
Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher Manning

Evaluating Large Language Models at Evaluating Instruction Following
Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen

Instruction-following Evaluation through Verbalizer Manipulation
Shiyang Li, Jun Yan, Hai Wang, Zheng Tang, Xiang Ren, Vijay Srinivasan, Hongxia Jin

Delve into PPO: Implementation Matters for Stable RLHF
Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Haoran Huang, Tao Gui, Qi Zhang, Xuanjing Huang

NLPBench: Evaluating Large Language Models on Solving NLP Problems
Linxin Song, Jieyu Zhang, Lechao Cheng, Pengyuan Zhou, Tianyi Zhou, Irene Li

Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing
Xinyu Hu, Pengfei Tang, Simiao Zuo, Zihan Wang, Bowen Song, Qiang Lou, Jian Jiao, Denis Charles

URIAL: Tuning-Free Instruction Learning and Alignment for Untuned LLMs
Bill Yuchen Lin, Abhilasha Ravichander, Ximing Lu, Nouha Dziri, Melanie Sclar, Khyathi Chandu, Chandra Bhagavatula, Yejin Choi

Verbosity Bias in Preference Labeling by Large Language Models
Keita Saito, Akifumi Wachi, Koki Wataoka, Youhei Akimoto

Automatic Construction of a Korean Toxic Instruction Dataset for Ethical Tuning of Large Language Models
SungJoo Byun, Dongjun Jang, Hyemi Jo, Hyopil Shin

Fine-tuning Language Models for Factuality
Katherine Tian, Eric Mitchell, Huaxiu Yao, Christopher Manning, Chelsea Finn

Self-RAG: Self-reflective Retrieval Augmented Generation
Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi

Hierarchical Network Fusion for Multi-Modal Electron Micrograph Representation Learning with Foundational Large Language Models
Sagar Sakhinana, Sannidhi Geethan, Venkataramana Runkana

Exploring and Improving the Spatial Reasoning Abilities of Large Language Models
Manasi Sharma

Investigating the Effects of Zero-Shot Chain-of-Thought on Empathetic Dialogue Generation
Young-Jun Lee, Dokyong Lee, Jihui Im, Joo Won Sung, Ho-Jin Choi
[Poster]

Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
Yiyang Zhou, Chenhang Cui, Jaehong Yoon, Linjun Zhang, Zhun Deng, Chelsea Finn, Mohit Bansal, Huaxiu Yao

Chain of Natural Language Inference for Reducing Large Language Model Hallucinations
Deren Lei, Yaxi Li, Mengya Hu, Mingyu Wang, Xi Yun

Chain-of-Thought Reasoning is a Policy Improvement Operator
Hugh Zhang, David Parkes

Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
Chaoqi Wang, Yibo Jiang, Chenghao Yang, Han Liu, Yuxin Chen

Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks
Lingfeng Sun, Devesh Jha, Chiori Hori, Siddarth Jain, Radu Corcodel, Xinghao Zhu, Masayoshi Tomizuka, Diego Romeres

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao

Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Boxin Wang, Jinyuan Jia, Bo Li, Radha Poovendran

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer, Olivia Watkins, Ethan Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell

A Case Study of Instruction Tuning with Mixture of Parameter-Efficient Experts
Oleksiy Ostapenko, Lucas Caccia, Zhan Su, Nicolas Le Roux, Laurent Charlin, Alessandro Sordoni

Investigating the Catastrophic Forgetting in Multimodal Large Language Models
Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma

LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms
Aditi Jha, Sam Havens, Jeremy Dohmann, Alexander Trott, Jacob Portes

Let’s Reinforce Step by Step
Sarah Pan, Vladislav Lialin, Sherin Muckatira, Anna Rumshisky

DialogCC: An Automated Pipeline for Creating High-Quality Multi-modal Dialogue Datasets
Young-Jun Lee, Byungsoo Ko, Han-Gyu Kim, Jonghwan Hyeon, Ho-Jin Choi
[Poster]

Knowledge Augmented Instruction Tuning for Zero-shot Animal Species Recognition
Zalan Fabian, Zhongqi Miao, Chunyuan Li, Yuanhan Zhang, Ziwei Liu, Andres Hernandez, Pablo Arbelaez, Andrés Link, Andrés Montes-Rojas, Rafael Escucha, Laura Siabatto, Rahul Dodhia, Juan Lavista Ferres

Reward Model Ensembles Help Mitigate Overoptimization
Thomas Coste, Usman Anwar, Robert Kirk, David Krueger

NexusRaven: a commercially-permissive Language Model for function calling
Venkat Krishna Srinivasan, Zhen Dong, Banghua Zhu, Brian Yu, Hanzi Mao, Damon Mosk-Aoyama, Kurt Keutzer, Jiantao Jiao, Jian Zhang

How Long Can Context Length of Open-Source LLMs truly Promise?
Dacheng Li, Rulin Shao, Anze Xie, Ying Sheng, Lianmin Zheng, Joseph Gonzalez, Ion Stoica, Xuezhe Ma, Hao Zhang

Learning to Generate Better Than Your LLM
Jonathan Chang, Kianté Brantley, Rajkumar Ramamurthy, Dipendra Misra, Wen Sun

From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL
Xiaoqian Li, Ercong Nie, Sheng Liang

Reward Model Aggregation
Zihao Wang, Chirag Nagpal, Alexander D’Amour, Victor Veitch, Sanmi Koyejo

Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions
Taehyeon Kim, Joonkee Kim, Gihun Lee, Se-Young Yun

Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language
Di Jin, Shikib Mehri, Devamanyu Hazarika, Aishwarya Padmakumar, Sungjin Lee, Yang Liu, Mahdi Namazifar

Releasing the CRaQAn (Coreference Resolution in Question-Answering): An open-source dataset and dataset creation methodology using instruction-following models
Rob Grzywinski, Joshua D’Arcy, Robert Naidoff, Ashish Shukla, Alex Browne, Ren Gibbons, Brinnae Bent

CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning
Hongyu Hu, Jiyuan Zhang, Minyi Zhao, Zhenbang Sun

Understanding Hidden Context in Preference Learning: Consequences for RLHF
Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell

Past as a Guide: Leveraging Retrospective Learning for Python Code Completion
Seungyoun Shin, Seunggyu Chang, Sungjoon Choi

FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets
Neng Wang, Hongyang Yang, Christina Wang

Large Language Models are Zero Shot Hypothesis Proposers
Biqing Qi, Kaiyan Zhang, Haoxiang Li, Kai Tian, Sihang Zeng, Zhang-Ren Chen, Bowen Zhou

OctoPack: Instruction Tuning Code Large Language Models
Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro Von Werra, Shayne Longpre

Approximate Clustering for Extracting Task Relationships in Multi-Instruction Tuning
Dongyue Li, Jinhong Yu, Hongyang Zhang

Understanding the Effects of RLHF on LLM Generalisation and Diversity
Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu

Group Preference Optimization: Few-Shot Alignment of Large Language Models
Siyan Zhao, John Dang, Aditya Grover

Platypus: Quick, Cheap, and Powerful Refinement of LLMs
Ariel Lee, Cole Hunter, Nataniel Ruiz

FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Seonghyeon Ye, Doyoung Kim, Sungdong Kim, Hyeonbin Hwang, Seungone Kim, Yongrae Jo, James Thorne, Juho Kim, Minjoon Seo

Learning Interactive Real-World Simulators
Sherry Yang, Yilun Du, Seyed Kamyar Seyed Ghasemipour, Jonathan Tompson, Dale Schuurmans, Pieter Abbeel

FinGPT: Democratizing Internet-scale Data for Financial Large Language Models
Xiao-Yang Liu, Guoxuan Wang, Hongyang Yang, Daochen Zha

Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics
Haoqin Tu, Bingchen Zhao, Chen Wei, Cihang Xie

A Monte Carlo Language Model Pipeline for Zero-Shot Sociopolitical Event Extraction
Erica Cai, Brendan O’Connor

An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
Yadong Lu, Chunyuan Li, Haotian Liu, Jianwei Yang, Jianfeng Gao, Yelong Shen

For Distillation, Tokens Are Not All You Need
Mrigank Raman, Pranav Mani, Davis Liang, Zachary Lipton

Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following
Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang, Hyeongu Yun, Yireun Kim, Minjoon Seo

Simulating Iterative Human-AI Interaction in Programming with LLMs
Hussein Mozannar, Valerie Chen, Dennis Wei, Prasanna Sattigeri, Manish Nagireddy, Subhro Das, Ameet Talwalkar, David Sontag

Balancing Multiple Objectives for Efficient Metaprompts for Data Labeling Tasks with Extensive Guidelines
Tobias Schnabel, Jennifer Neville