Text-to-SQL, which translates a natural language question into an SQL query, has advanced with the in-context learning of Large Language Models (LLMs). However, existing methods show little performance improvement over randomly chosen demonstrations, and suffer significant performance drops when smaller LLMs (e.g., Llama 3.1-8B) are used. This indicates that these methods rely heavily on the intrinsic capabilities of hyper-scaled LLMs rather than effectively retrieving useful demonstrations. In this paper, we propose a novel approach for effectively retrieving demonstrations and generating SQL queries. We construct a Deep Contextual Schema Link Graph, which contains key information and the semantic relationships between a question and its database schema items. This graph-based structure enables effective representation of Text-to-SQL samples and retrieval of useful demonstrations for in-context learning. Experimental results on the Spider benchmark demonstrate the effectiveness of our approach, showing consistent improvements in SQL generation performance and efficiency across both hyper-scaled LLMs and small LLMs.
We introduce a Text-to-SQL method with Deep Contextual Schema Link Graph-based Retrieval, DCG-SQL. Our method consists of three main processes: Deep Contextual Schema Link Graph Construction, Graph-based Demonstration Retrieval, and SQL Generation. First, we construct a deep contextual schema link graph, which captures the contextual relationships between a question and its relevant schema items. Then, to retrieve demonstrations using our graph representation, we train a graph encoder with self-supervised learning. This ensures that samples closer to the anchor are more useful for generating the target SQL, enabling the effective retrieval of useful demonstrations. Finally, with the selected demonstrations, our method can fully leverage the in-context learning ability of LLMs and generate the target SQL more accurately.
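The retrieval step above can be illustrated with a minimal sketch: given an embedding of the anchor question produced by a graph encoder, rank the training pool by embedding similarity and keep the top-k samples as demonstrations. The `Sample` structure, the 2-D embeddings, and the cosine-similarity ranking are simplifying assumptions for illustration, not the paper's actual encoder or training objective.

```python
# Hypothetical sketch of graph-embedding-based demonstration retrieval.
# The embeddings here are toy 2-D vectors standing in for graph-encoder outputs.
from dataclasses import dataclass
import math


@dataclass
class Sample:
    question: str
    sql: str
    embedding: list  # assumed output of a trained graph encoder


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_demonstrations(anchor_emb, pool, k=3):
    """Rank the training pool by similarity to the anchor; return top-k demos."""
    ranked = sorted(pool, key=lambda s: cosine(anchor_emb, s.embedding),
                    reverse=True)
    return ranked[:k]


# Toy pool: questions paired with gold SQL and stand-in embeddings.
pool = [
    Sample("list all singers", "SELECT name FROM singer", [0.9, 0.1]),
    Sample("count concerts", "SELECT COUNT(*) FROM concert", [0.1, 0.9]),
    Sample("singers older than 30",
           "SELECT name FROM singer WHERE age > 30", [0.8, 0.2]),
]
demos = retrieve_demonstrations([1.0, 0.0], pool, k=2)  # two nearest samples
```

In the actual method, the self-supervised training of the graph encoder is what makes this nearest-neighbor step meaningful: it shapes the embedding space so that proximity to the anchor correlates with usefulness for generating the target SQL.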