Profile

Me and My Wife

This picture was taken on a hill in Tsukuba with my wife (Ph.D. program in Linguistics, University of Tsukuba).
We were tired at the time. (She is cute, right? ^.^)
We met in high school and married in 2016.
E-mail: dongyuyang@nec.com

Research Fields

  • 📊 Spatial index and vector search (PhD)
  • 🤖 LLM/ML/NLP for DB/tabular data, DB for LLM/ML/NLP
  • 📈 Multimodal RAG, reasoning, LLMs/VLMs

News

Latest Updates

  • 🎉 We released Jellyfish-7B, Jellyfish-8B, and Jellyfish-13B on Hugging Face: our large language models designed for data preprocessing. Because of their small parameter counts, Jellyfish models can run cost-effectively on local hardware without compromising data security, and they perform on par with GPT-4 on many data preprocessing tasks such as entity matching, data imputation, and error detection.

Publications

See also my DBLP page.

2024
Jellyfish: A Large Language Model for Data Preprocessing [Paper] [HF model]
Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada
The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024 Main Long)
On the Use of Large Language Models for Table Tasks (Tutorial) [GitHub page]
Yuyang Dong, Masafumi Oyamada, Chuan Xiao, Haochen Zhang
33rd ACM International Conference on Information and Knowledge Management (CIKM 2024)
Large Language Models as Data Preprocessors [Paper]
Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada
2nd International Workshop on Tabular Data Analysis, International Conference on Very Large Data Bases (TaDA workshop@VLDB 2024)
2023
QA-Matcher: Unsupervised Entity Matching Using A Question Answering Model [Slide]
Shogo Hayashi, Yuyang Dong, Masafumi Oyamada
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2023)
DeepJoin: Joinable Table Discovery with Pre-trained Language Models [Slide]
Yuyang Dong, Chuan Xiao, Takuma Nozawa, Masafumi Enomoto, Masafumi Oyamada
International Conference on Very Large Data Bases (VLDB 2023)
2022
Table Enrichment System for Machine Learning [Paper] [Demo YouTube]
Yuyang Dong, Masafumi Oyamada
Demo paper, International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2022)
2021
Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach [Paper] [Extended Version]
Yuyang Dong, Kunihiro Takeoka, Chuan Xiao, Masafumi Oyamada
International Conference on Data Engineering (ICDE 2021)
Quality Control for Hierarchical Classification with Incomplete Annotations
Masafumi Enomoto, Kunihiro Takeoka, Yuyang Dong, Masafumi Oyamada, Takeshi Okadome
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2021)
Entity Matching with String Transformation and Similarity-Based Features
Kazunori Sakai, Yuyang Dong, Masafumi Oyamada, Kunihiro Takeoka, Takeshi Okadome
Workshop on Software Foundations for Data Interoperability (SFDI 2021@VLDB 2021 Workshop)
2020
Learning from Unsure Responses
Kunihiro Takeoka, Yuyang Dong, Masafumi Oyamada
AAAI Conference on Artificial Intelligence (AAAI 2020)
NGNC: A Flexible and Efficient Framework for Error-Tolerant Query Autocompletion
Yukai Miao, Jianbin Qin, Sheng Hu, Yuyang Dong, Yoshiharu Ishikawa, Makoto Onizuka
Workshop on Software Foundations for Data Interoperability (SFDI 2020@VLDB 2020 Workshop)
Continuous Top-k Spatial-Keyword Search on Dynamic Objects [Paper]
Yuyang Dong, Chuan Xiao, Hanxiong Chen, Jeffrey Xu Yu, Kunihiro Takeoka, Masafumi Oyamada, Hiroyuki Kitagawa
The VLDB Journal (VLDBJ), Springer
2019
Continuous Search on Dynamic Spatial Keyword Objects [Paper]
Yuyang Dong, Hanxiong Chen, Hiroyuki Kitagawa
Short paper, International Conference on Data Engineering (ICDE 2019)
Balanced Nearest Neighborhood Query in Spatial Database
Sang Le, Yuyang Dong, Hanxiong Chen, Kazutaka Furuse
Short paper, International Conference on Big Data and Smart Computing (BigComp 2019)
2018
Weighted Aggregate Reverse Rank Queries [Paper]
Yuyang Dong, Hanxiong Chen, Jeffrey Xu Yu, Kazutaka Furuse, Hiroyuki Kitagawa
ACM Transactions on Spatial Algorithms and Systems (TSAS)
Bound-and-filter Framework for Aggregate Reverse Rank Queries
Yuyang Dong, Hanxiong Chen, Kazutaka Furuse, Hiroyuki Kitagawa
Transactions on Large-Scale Data and Knowledge-Centered Systems (TLDKS)
Efficient Methods for Aggregate Reverse Rank Queries
Yuyang Dong, Hanxiong Chen, Kazutaka Furuse, Hiroyuki Kitagawa
IEICE Transactions on Information and Systems
2017
Grid-Index Algorithm for Reverse Rank Queries [Paper]
Yuyang Dong, Hanxiong Chen, Jeffrey Xu Yu, Kazutaka Furuse, Hiroyuki Kitagawa
International Conference on Extending Database Technology (EDBT 2017)
Efficient Processing of Aggregate Reverse Rank Queries
Yuyang Dong, Hanxiong Chen, Hiroyuki Kitagawa
Short paper, International Conference on Database and Expert Systems Applications (DEXA 2017)
2016
Aggregate Reverse Rank Queries [Paper]
Yuyang Dong, Hanxiong Chen, Kazutaka Furuse, Hiroyuki Kitagawa
International Conference on Database and Expert Systems Applications (DEXA 2016) (Best Paper Award)

Awards & Activities

🏆 Academic Awards
  • πŸŽ“ DBSJ Kambayashi Young Researcher Award, 2022
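  • 🎓 University of Tsukuba, President's Award, 2019
  • 🎓 University of Tsukuba, Graduate School of Systems and Information Engineering, Doctoral Program, Representative of Graduating Students, 2019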
💼 Professional Experience
  • 🏢 Internship, Cloud & Solution Group Company, TOSHIBA JAPAN Co., Ltd. 2014.8
  • 🏢 Internship, Smart Center, NTT DATA Co., Ltd. 2014.10
📝 Academic Service
  • 📚 Reviewer: TKDE Journal, KAIS Journal, IEICE Journal, IEEE Access Journal, IPSJ Journal
  • 🎯 Conference Reviews: ICMR'18, ICML'19,20, ICSC'19, NeurIPS'19,20, AAAI'20,21, DASFAA'20
  • 👥 PC/OC Member: DASFAA'20, DEIM'20, MIPR'21