Papers
arxiv:2507.04701

XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL

Published on Jul 7
Authors:
,
,
,
,
,
,
,
,
,
,

Abstract

XiYan-SQL, a novel framework, enhances Text-to-SQL performance by generating and selecting multiple SQL candidates through schema filtering, multi-generator ensemble, and candidate reorganization.

AI-generated summary

To leverage the advantages of LLM in addressing challenges in the Text-to-SQL task, we present XiYan-SQL, an innovative framework effectively generating and utilizing multiple SQL candidates. It consists of three components: 1) a Schema Filter module filtering and obtaining multiple relevant schemas; 2) a multi-generator ensemble approach generating multiple highquality and diverse SQL queries; 3) a selection model with a candidate reorganization strategy implemented to obtain the optimal SQL query. Specifically, for the multi-generator ensemble, we employ a multi-task fine-tuning strategy to enhance the capabilities of SQL generation models for the intrinsic alignment between SQL and text, and construct multiple generation models with distinct generation styles by fine-tuning across different SQL formats. The experimental results and comprehensive analysis demonstrate the effectiveness and robustness of our framework. Overall, XiYan-SQL achieves a new SOTA performance of 75.63% on the notable BIRD benchmark, surpassing all previous methods. It also attains SOTA performance on the Spider test set with an accuracy of 89.65%.

Community

Sign up or log in to comment

Models citing this paper 8

Browse 8 models citing this paper

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2507.04701 in a dataset README.md to link it from this page.

Spaces citing this paper 1

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.