On the Use of Large Language Models for Table Tasks

CIKM 2024 tutorial

[slides]

The proliferation of large language models (LLMs) has catalyzed a diverse array of applications. This tutorial delves into the use of LLMs for tabular data, targeting a variety of table-related tasks such as table understanding, text-to-SQL conversion, and tabular data preprocessing.
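To make the text-to-SQL task concrete, here is a minimal sketch of how a table schema and a natural-language question can be serialized into an LLM prompt. It assumes the OpenAI Python SDK (v1+) with an API key in the environment; the schema, question, and model name are illustrative and not taken from the tutorial materials.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative schema and question; any serialization the model can
# parse (DDL, markdown, JSON) works in practice.
schema = "CREATE TABLE employees (id INT, name TEXT, dept TEXT, salary INT);"
question = "What is the average salary per department?"

prompt = (
    "Given the following table schema, write a SQL query that answers "
    f"the question.\n\nSchema:\n{schema}\n\nQuestion: {question}\n\nSQL:"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```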

It surveys LLM solutions to these tasks in five classes, categorized by their underpinning techniques: prompting, fine-tuning, retrieval-augmented generation (RAG), agents, and multimodal methods. It discusses how LLMs offer innovative ways to interpret, augment, query, and cleanse tabular data, featuring both academic contributions and their practical use in industry.
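As a taste of the RAG class, the sketch below retrieves the most relevant table for a question before building the prompt. It uses TF-IDF retrieval from scikit-learn purely for illustration; real systems typically use dense embeddings, and the toy tables and question are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A toy corpus of serialized tables (name | header | rows).
tables = [
    "orders | order_id, customer, total | 1, Alice, 30 | 2, Bob, 45",
    "products | product_id, name, price | 10, pen, 2 | 11, notebook, 5",
    "employees | id, name, dept | 7, Carol, sales",
]
question = "Which customer placed the largest order?"

# Retrieve the table most similar to the question.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(tables + [question])
scores = cosine_similarity(matrix[-1], matrix[:-1])
best_table = tables[scores.argmax()]

# The retrieved table is placed in the prompt; the LLM call itself is
# omitted here (see the text-to-SQL sketch above).
prompt = f"Table:\n{best_table}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```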

It emphasizes the versatility and effectiveness of LLMs in handling complex table tasks, showcasing their ability to improve data quality, enhance analytical capabilities, and facilitate more intuitive data interactions. By surveying different approaches, this tutorial highlights the strengths of LLMs in making table tasks more accurate and usable, laying a foundation for future research and applications in data science and AI-driven analytics.

Outline

  1. Introduction
  2. Prompting
  3. Fine-tuning
  4. Retrieval-augmented generation (RAG)
  5. LLM agents
  6. Vision-language models (VLMs)

Presenters

Yuyang Dong is a Principal Researcher at NEC. He earned his Ph.D. from the University of Tsukuba in 2019. He specializes in tabular data search and NLP, with expertise in tabular data processing evidenced by publications in prestigious venues. Dong leads the Jellyfish project, a leading-edge LLM for tabular data processing that has garnered thousands of monthly downloads on Hugging Face. He is a key contributor to NEC cotomi-core, a suite of robust, self-developed LLMs in both Japanese and English that underpins NEC's Generative AI Service.

Masafumi Oyamada is the Chief Scientist and Director of Generative AI Foundations Research at NEC. He received his Ph.D. from the University of Tsukuba in 2018. At NEC, he spearheads research and development on LLMs, including the creation of cotomi-core, a series of high-performance, efficient LLMs. His interdisciplinary work bridges tabular data and machine learning.

Chuan Xiao is an Associate Professor at Osaka University and a Guest Associate Professor at Nagoya University. He completed his Ph.D. at the University of New South Wales in 2010. His research interests span data preprocessing, computational social science, and NLP. With over 15 years of research experience in similarity search, Xiao has published 20+ related papers at top-tier conferences (ICML, SIGMOD, VLDB, WWW, etc.).

Haochen Zhang is pursuing his Master's degree at Osaka University. He obtained his Bachelor's degree from Osaka City University in 2023. His research interests include data mining, data preprocessing, and NLP. During his internship at NEC, he explored prompting and fine-tuning LLMs for data preprocessing, contributing to the development of Jellyfish, a leading-edge LLM specifically designed for data preprocessing.