Indian Legal Corpus (ILC): A Dataset for Summarizing Indian Legal Proceedings Using Natural Language

Pawan Trivedi1

Digha Jain2

Shilpa Gite2,3

Ketan Kotecha3

Anant Bhatt4

Nithesh Naik5

1PES University, Bengaluru, 560085, Karnataka, India.
2Symbiosis Institute of Technology, Symbiosis International (Deemed University), 412115, Pune, India.
3Symbiosis Centre for Applied AI (SCAAI), Symbiosis International (Deemed University), 412115, Pune, India.
4Devang Patel Institute of Advance Technology and Research, Charotar University of Science and Technology, Nadiad Petlad Road, Changa, 388421, Gujarat, India.
5Department of Mechanical and Industrial Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, Karnataka, India.

Abstract

Several large countries, including India, face a significant backlog of legal proceedings. Recent technical advances have produced intelligent systems that can process and summarize legal documents. However, developing such data-driven systems is hindered by a scarcity of high-quality corpora. Legal AI applies artificial intelligence technology, particularly Natural Language Processing (NLP), to assist with legal tasks. Legal professionals frequently approach problems with rule- and symbol-based methods, whereas NLP researchers are more interested in data-driven and embedding-based methods. In this paper, we present the Indian Legal Corpus (ILC), a dataset for Indian legal document summarization. Our dataset differs from existing summarization datasets in that its summaries are highly abstractive, which opens new research opportunities for abstractive summarization of legal documents. Human and intrinsic evaluation indicate that ILC is highly abstractive, concise, and of high quality. We are releasing our dataset and models to encourage future research on legal abstractive summarization.
