Extracting data from PACER dockets

At SCALES, we believe that opening up court records involves not just accessing data but also understanding it. That’s why we’re adding a new package to our public software repository that parses docket pages from PACER and converts them to standardized, easy-to-use data trees (formatted in JSON, or JavaScript Object Notation). We’ve been using this parser for much of our research work, and with this public release we hope to add to the available utilities that empower members of the broader legal community to conduct their own research and ask big questions about the federal judiciary.

The new parser tool is designed to work in tandem with our existing public codebase, but it can operate on any downloaded PACER docket, whether acquired from the SCALES scraper tool or saved directly from your search results. After loading the raw HTML, the parser breaks the docket into its component parts and pulls out names, dates, and other useful data points. We hope that by automating many of the rote tasks involved in cleaning PACER data, we will help researchers improve the usefulness and uniformity of their datasets while cutting down on the time required to compile them. (You can read a full outline of the fields generated by the parser here.)
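To give a rough sense of what "breaking the docket into its component parts" can look like, here is a minimal sketch that uses only Python's standard library. It is not the SCALES parser and does not reflect its API or output schema: the class `DocketTableParser`, the function `parse_docket_entries`, the three-column table layout in `SAMPLE_HTML`, and the output keys (`date_filed`, `entry_number`, `text`) are all assumptions made up for illustration.

```python
import json
from datetime import datetime
from html.parser import HTMLParser


class DocketTableParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_docket_entries(html):
    """Turn a (hypothetical) three-column docket table into a list of dicts."""
    parser = DocketTableParser()
    parser.feed(html)
    parser.close()

    entries = []
    for row in parser.rows:
        if len(row) != 3:
            continue
        date_text, number, text = row
        try:
            # Normalize MM/DD/YYYY dates to ISO 8601.
            date = datetime.strptime(date_text, "%m/%d/%Y").date().isoformat()
        except ValueError:
            continue  # skip header rows or rows without a parseable date
        entries.append({"date_filed": date, "entry_number": number, "text": text})
    return entries


# Invented sample input; real PACER docket HTML is considerably messier.
SAMPLE_HTML = """
<table>
<tr><td>Date Filed</td><td>#</td><td>Docket Text</td></tr>
<tr><td>01/15/2016</td><td>1</td><td>COMPLAINT filed by Jane Doe.</td></tr>
<tr><td>02/01/2016</td><td>2</td><td>ANSWER to Complaint by Acme Corp.</td></tr>
</table>
"""

if __name__ == "__main__":
    print(json.dumps(parse_docket_entries(SAMPLE_HTML), indent=2))
```

Even this toy version shows why automation helps: skipping header rows, collapsing stray whitespace, and normalizing dates are exactly the kind of rote cleanup that is tedious to repeat by hand across thousands of dockets.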

The parsed dockets can also be integrated into any data pipeline and put to work for statistical analyses, data visualizations, large-scale aggregation, and countless other use cases. To get started with the SCALES software and explore how it can improve your workflow, you can check out the tutorial included at the top level of the `PACER-tools` repository and the more detailed documentation in each subfolder. And if you’re curious about the work we do at SCALES, or have ideas for improving these tools, drop us a line at scales-okn@northwestern.edu or open an issue on the git repository.
