Legal general intelligence (GI) refers to artificial intelligence (AI) that encompasses legal understanding, reasoning, and decision-making, emulating the expertise of legal professionals across domains. However, existing benchmarks are result-oriented and do not systematically evaluate the legal intelligence of large language models (LLMs), which hinders the development of legal GI.
To address this, we propose LexGenius, an expert-level Chinese legal benchmark for evaluating legal GI in LLMs. It follows a Dimension–Task–Ability framework, covering seven dimensions, eleven tasks, and twenty abilities. We construct multiple-choice questions from recent legal cases and exam questions, combining manual and LLM review to reduce the risk of data leakage, and ensure accuracy and reliability through multiple rounds of verification.
We evaluate twelve state-of-the-art LLMs on LexGenius and conduct an in-depth analysis. Our findings reveal significant disparities across legal intelligence abilities, with even the strongest LLMs still lagging behind human legal professionals. We believe LexGenius can serve as a comprehensive benchmark for assessing legal intelligence abilities in LLMs and contribute to advancing legal GI development.
Our project is available at https://github.com/QwenQKing/LexGenius.
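For readers who want to see how a Dimension–Task–Ability breakdown might be scored, the following is a minimal sketch and not the LexGenius evaluation code. It assumes each multiple-choice item carries hypothetical `dimension`, `task`, and `ability` labels plus a single gold option, and it scores by exact-match accuracy; the record fields, the label values, and the `accuracy_by` helper are all illustrative.

```python
from collections import defaultdict

# Hypothetical records: field names and labels are illustrative assumptions,
# not the actual LexGenius data schema.
records = [
    {"dimension": "D1", "task": "T1", "ability": "A1", "gold": "A", "pred": "A"},
    {"dimension": "D1", "task": "T1", "ability": "A1", "gold": "C", "pred": "B"},
    {"dimension": "D2", "task": "T2", "ability": "A2", "gold": "D", "pred": "D"},
]


def accuracy_by(records, level):
    """Return {label: accuracy} for one level of the hierarchy.

    `level` is one of "dimension", "task", or "ability" (assumed field names).
    A prediction counts as correct only if it exactly matches the gold option.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        key = r[level]
        total[key] += 1
        correct[key] += int(r["pred"] == r["gold"])
    return {k: correct[k] / total[k] for k in total}


ability_scores = accuracy_by(records, "ability")
dimension_scores = accuracy_by(records, "dimension")
```

Reporting the same predictions at each level of the hierarchy is what lets per-ability disparities show up instead of being averaged into a single overall score.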
@misc{liu2025lexgeniusbenchmark,
  title = {LexGenius: An Expert-Level Benchmark for Large Language Models in Chinese Legal General Intelligence},
  author = {Wenjin Liu and Haoran Luo and Xin Feng and Xiang Ji and Lijuan Zhou and Rui Mao and Jiapu Wang and Shirui Pan and Erik Cambria},
  year = {2025},
  eprint = {2512.04578},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL},
  url = {https://arxiv.org/abs/2512.04578}
}