Privacy policies govern how firms collect, use, share, and secure consumers’ personal information. These rich and complex legal documents reflect contractual terms as well as mandated disclosures and compliance with data protection regimes such as the European Union’s GDPR and California’s CCPA. Privacy policies tend to be detailed and lengthy, making it difficult for lay consumers to understand their terms and for regulators to police firm behavior. Our project joins recent efforts to classify the terms in privacy policies and automate their analysis using machine learning.
Machine learning relies on human-coded examples to train, adjust, and test the capabilities of AIs. Until very recently, AIs’ ability to process large, unstructured texts was very limited; datasets designed for legal tech reflect this by focusing on short phrases and simple legal concepts. In the past year, AI technology has made breathtaking strides in its ability to process text, largely through systems built on large language models (LLMs). To date, contract interpretation using LLMs has relied on untested AIs trained on generic, non-legal datasets.
Our paper makes three contributions. First, it introduces an approach and toolset for labeling privacy policies that can generate datasets tailored to training and testing the ability of this new class of more capable AIs to process legal documents. We employ a granular coding approach that aims to capture the nuances inherent in contracts and to map coded terms against relevant legal benchmarks in the U.S. and the E.U. The resulting hand-coded dataset can serve as a benchmark against which to measure machine-generated coding. Second, it demonstrates how a dataset generated using our approach can be used to test and modify LLMs. We offer preliminary results for privacy policies, where we “tune” LLMs to label key aspects of the policies and automate our coding process in a way that is more consistent with legal practice. Third, we make our data and tools publicly available for others to use and extend.
This is a hybrid event. The seminar will take place on the Roeterseiland campus (REC), building A, room A3.01 (Research Seminar Room), and will also be streamed online via Zoom. The Zoom link will be included in the confirmation email sent upon registration.
Florencia Marotta-Wurgler is the Boxer Family Professor of Law at NYU Law School and co-director of the Jacobson Program on Law and Business and the Center for Law, Economics, and Organization. Her research focuses on contracts, consumer privacy, consumer law, electronic commerce, empirical legal studies, and law and economics. She has studied online consumer contracts and privacy policies, examining questions such as whether disclosure and information privacy regimes are effective and whether people read the fine print.
The Amsterdam Center for Law and Economics (ACLE) is a joint initiative of the Faculty of Economics and Business and the Faculty of Law at the University of Amsterdam. The objective of the ACLE is to promote high-quality interdisciplinary research at the intersection between law and economics.