
Anatomy of a Docket: how to put PACER data to work

The name of the US federal courts’ electronic filing system, “Public Access to Court Electronic Records” (PACER), is a bit of a misnomer: in reality, no member of the general public could be expected to make head or tail of this archaic system. Built primarily for a user base of lawyers and clerks who need to upload and download documents in the course of active litigation, the PACER platform is clunky and unintuitive in most other contexts. Search functionality is limited, key information can be hard to locate, and data aggregation is often prohibitively expensive.

Under the hood, however, case data in PACER is formatted quite straightforwardly. Indeed, a wealth of useful material is accessible even at the most basic level of the PACER document hierarchy – namely, the docket sheet, which chronicles all the actions taken in a given federal court case. Since a great deal of our work at SCALES depends on automated analysis of these documents, we’ve developed a machine-readable “gold standard” ontology describing all the information that can appear in a docket sheet. In addition to providing the groundwork for SCALES’s analysis software, this docket schema also neatly outlines the conceptual space open to be explored by any researcher, journalist, or lay user working with federal court data. The sections below will delineate the three major pieces of the SCALES schema, which are also the building blocks of a typical PACER docket sheet: (1) a set of high-level header data, (2) a list of parties involved in the litigation, and (3) a table of docket events unfolding over time.

Technical note: how we parse dockets

The SCALES docket parser takes the HTML text of docket sheets as input and produces JavaScript Object Notation (JSON) files as output. JSON is a versatile file format that represents data as a series of key-value pairs, where each key is the name of a data attribute (city, filing_date, etc.) and each value is the piece of data corresponding to that named attribute (“Chicago,” “10/30/2008,” etc.). In subsequent sections, references to specific keys in the SCALES schema will be printed in bold, as above.
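To make the key-value idea concrete, here is a minimal sketch in Python. The two keys and their values are the examples mentioned above; the flat, two-field layout is an assumption made for illustration and is far smaller than a real parsed docket.

```python
import json

# Hypothetical fragment: the keys and values are the article's own examples,
# but this tiny flat layout is assumed for illustration only.
raw = '{"city": "Chicago", "filing_date": "10/30/2008"}'

record = json.loads(raw)  # a JSON object becomes a Python dict

# Each key names an attribute; each value is the data for that attribute.
for key, value in record.items():
    print(f"{key} -> {value}")
```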

The value of the JSON format derives from its simple, standardized nature. Whereas raw docket sheets are hard for a computer to navigate in any consistent fashion, a few lines of code are sufficient to read docket JSONs and use them as fodder for all manner of sophisticated analyses. The JSON approach also requires very little infrastructure; in fact, using our PACER-tools GitHub repository, you can easily replicate the SCALES docket-parsing pipeline and generate JSONs from your own docket sheets. (Additionally, the repository includes extended documentation of our docket schema, both in plain prose and as a formal Schema.org-style specification.)
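As a rough illustration of the “few lines of code” claim, the sketch below reads a folder of docket JSONs and tallies cases by one header attribute. It uses only the Python standard library and is not taken from the PACER-tools repository; the helper names and the assumption that each file holds one docket with top-level keys such as city are ours.

```python
import json
from collections import Counter
from pathlib import Path


def load_dockets(directory):
    """Read every *.json file in `directory` into a list of dicts."""
    dockets = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            dockets.append(json.load(handle))
    return dockets


def count_by(dockets, key):
    """Tally dockets by one top-level attribute (assumed layout), e.g. 'city'."""
    return Counter(docket.get(key) for docket in dockets)
```

With a hypothetical folder named parsed_dockets, a call such as count_by(load_dockets("parsed_dockets"), "city").most_common(5) would list the five most frequent filing cities.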

Headers

 

Despite the incredible heterogeneity of litigation in the federal courts, a few basic properties characterize almost every case in the PACER system, and these properties are grouped together in the header section of the docket sheet. Some attributes in this section are fairly self-explanatory, such as those that SCALES has named case_id, city, filing_date, and terminating_date. Others are not so simple; for instance, the PACER keywords captured in the case_flags field, which the court clerks draw from a list of options that varies greatly between districts, can indicate anything from the presence of a certain magistrate judge on a case to the invocation of a certain sentencing provision. And still other attributes point to separate cases from elsewhere in the court’s caseload (related_cases, member_case_key) or in different venues entirely (other_courts).

There is one notable exception to the rule that PACER’s docket headers adhere to a single predictable format: specifically, the headers of civil and criminal cases differ slightly in several places. For example, judge assignments in criminal cases are listed on a per-defendant basis while civil headers specify a single judge assignment for the whole case, and certain fields (nature_suit, monetary_demand, etc.) appear in civil headers but not criminal headers. However, the SCALES docket schema has been designed for maximum consistency between cases, so all these case-type-specific fields appear in the JSON representations of criminal and civil headers alike, even when (as with judge in criminal headers) they are guaranteed to contain a null value.
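The practical payoff of this design choice can be sketched as follows. The field names are the ones mentioned in this article (the real schema has more), the sample values are invented, and uniform_header is a hypothetical helper that imitates the principle rather than the parser's actual code.

```python
# Field names come from the article; the real schema contains more of them.
HEADER_FIELDS = (
    "case_id", "city", "filing_date", "terminating_date",
    "case_flags", "judge", "nature_suit", "monetary_demand",
)


def uniform_header(raw_header):
    """Return a header with every expected key, using None where data is absent."""
    return {field: raw_header.get(field) for field in HEADER_FIELDS}


# Invented sample values, for illustration only.
civil = uniform_header({"case_id": "civil-example", "judge": "Judge Example",
                        "nature_suit": "Example suit type"})
criminal = uniform_header({"case_id": "criminal-example"})

# Both headers expose the same keys, so downstream code needs no branching.
print(civil.keys() == criminal.keys())  # True
print(criminal["judge"])                # None
print(criminal["nature_suit"])          # None
```

Because absent fields are present but null, a consumer can read header["nature_suit"] on any case without first checking whether the case is civil or criminal.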

Parties

 

The second piece of a typical docket sheet, i.e. the list of parties to the case, entails a marked increase in complexity from the header section. For one thing, PACER employs a wide array of terms for the roles that parties can play in litigation, from plaintiffs/defendants to claimants/counter-claimants to creditors, debtors, and trustees. Accordingly, the SCALES docket schema includes both a role field, which stores the specific term that the docket sheet uses to refer to a given party, and a party_type field specifying a generic category in which the party’s role belongs, either “plaintiff,” “defendant,” “bk_party” (bankruptcy-specific parties), “other_party,” or “misc.” This schematic convention permits sensible handling of similar role values like “Plaintiff” and “Petitioner” – since both terms refer to parties seeking a legal remedy in a court, they each receive a party_type value of “plaintiff.”
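The relationship between role and party_type can be sketched as a simple lookup. Only the pairing of “Plaintiff” and “Petitioner” with “plaintiff” is stated above; every other row of the table, and the fallback to “misc,” is an illustrative guess rather than the actual SCALES mapping.

```python
from collections import defaultdict

# Only "Plaintiff"/"Petitioner" -> "plaintiff" is stated in the article;
# all other rows are illustrative guesses, not the real SCALES mapping.
ROLE_TO_PARTY_TYPE = {
    "plaintiff": "plaintiff",
    "petitioner": "plaintiff",
    "defendant": "defendant",
    "respondent": "defendant",
    "debtor": "bk_party",
    "creditor": "bk_party",
    "trustee": "bk_party",
}


def party_type_for(role):
    """Map a docket-sheet role string to a generic category (assumed fallback: 'misc')."""
    return ROLE_TO_PARTY_TYPE.get(role.strip().lower(), "misc")


def group_by_party_type(parties):
    """Group party dicts by the category derived from their 'role' value."""
    groups = defaultdict(list)
    for party in parties:
        groups[party_type_for(party["role"])].append(party)
    return dict(groups)
```

Keeping the original role alongside the derived party_type means nothing from the docket sheet is lost, while analyses that only care about “who is suing whom” can work with the coarser category.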

Often, the data in PACER’s party lists is not only complex but also voluminous, as in sprawling corporate litigation with dozens of cross-claims or trademark-infringement suits in which hundreds of vendors are named in the complaint. Further contributing to this abundance of party details is the fact that parties are usually listed alongside the names, designations, and contact information of their counsel, as well as the nature and disposition of any crimes with which they have been charged. Fortunately, the SCALES schema, by defining parties and attorneys as objects containing numerous sub-fields (like terminating_date, pending_counts, and is_pro_se for parties and office_name, address, and is_lead_attorney for attorneys), provides ample room for this proliferation of data.
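A nested layout of this kind might be queried as in the sketch below. The sub-field names are those listed above, but the name and counsel keys, the exact nesting, and all sample values are assumptions made for the example.

```python
# Invented party list. Sub-field names come from the article; "name" and
# "counsel" are hypothetical keys chosen for this example.
parties = [
    {
        "name": "Jane Roe",
        "is_pro_se": True,
        "terminating_date": None,
        "pending_counts": [],
        "counsel": [],
    },
    {
        "name": "Acme Corp.",
        "is_pro_se": False,
        "terminating_date": "03/15/2009",
        "pending_counts": [],
        "counsel": [
            {"name": "A. Lawyer", "office_name": "Example LLP",
             "address": "1 Main St", "is_lead_attorney": True},
            {"name": "B. Lawyer", "office_name": "Example LLP",
             "address": "1 Main St", "is_lead_attorney": False},
        ],
    },
]


def pro_se_parties(parties):
    """Names of parties flagged as representing themselves."""
    return [p["name"] for p in parties if p.get("is_pro_se")]


def lead_attorneys(parties):
    """Return (party name, attorney name) pairs for attorneys flagged as lead."""
    return [
        (party["name"], attorney["name"])
        for party in parties
        for attorney in party.get("counsel") or []
        if attorney.get("is_lead_attorney")
    ]
```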

It should be noted that while the SCALES parser does plenty of work to chop up and categorize pieces of the raw docket sheet, it stops short of making inferences about the content of the docket sheet. For instance, inconsistency in spelling and wording is a major contributor to the intricacy of SCALES party lists, as when the phrasing of a criminal statute varies from case to case or the spelling of a party’s name differs across multiple occurrences on a docket sheet. Resolving these idiosyncrasies would involve making deductions about the entities and concepts referenced in the docket sheet, which is not within the purview of the parser. Thus, even in the cleaned and regimented JSONs generated through the parsing process, some complexities inevitably remain.

Docket Events

 

From a human perspective, the third and final component of the PACER docket sheet – the table of docket events – is even more variegated than the list of parties, with its wide-ranging descriptions of all the complaints, answers, motions, orders, judgments, and other case events that fill the lifecycle of a case. However, from a computational perspective, the docket table is perhaps the simplest piece of the schema. For each row of the table, the SCALES docket array has an entry with three subfields: a date_filed, an ind (the numerical index of the entry in PACER), and an unedited docket_text.
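The simplicity of this three-field structure shows in how little code is needed to work with it. The sketch below uses invented entries, stores ind as an integer, and assumes that date_filed follows the MM/DD/YYYY pattern of the date example given earlier; none of these details is confirmed by the schema documentation.

```python
from datetime import datetime

# Invented entries. The MM/DD/YYYY date format and integer "ind" are assumptions.
docket = [
    {"date_filed": "10/30/2008", "ind": 1,
     "docket_text": "COMPLAINT filed by Jane Roe."},
    {"date_filed": "12/01/2008", "ind": 7,
     "docket_text": "MOTION to Dismiss by Acme Corp."},
    {"date_filed": "03/15/2009", "ind": 12,
     "docket_text": "ORDER granting 7 Motion to Dismiss."},
]


def parse_date(text):
    """Convert an MM/DD/YYYY string into a date object."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def entries_mentioning(docket, keyword):
    """Case-insensitive keyword search over the unedited docket text."""
    keyword = keyword.lower()
    return [e for e in docket if keyword in e["docket_text"].lower()]


def span_in_days(docket):
    """Days between the earliest and latest filing dates on the docket."""
    dates = [parse_date(e["date_filed"]) for e in docket]
    return (max(dates) - min(dates)).days
```

A plain keyword search like this matches both the motion itself and the order that rules on it, which hints at why the free-text docket_text field is the hardest part of the docket to analyze despite its simple structure.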

Beyond these basic data, the SCALES schema also accounts for the actual PDF documents that are filed in the course of the case and thereafter linked from the docket sheet, as well as instances in which one docket entry references another via a hyperlink, which the schema refers to as edges between docket entries. Yet the network of entries and documents is inherently incomplete in the parsed final product, because the SCALES parser is only designed to handle the plain text of PACER docket sheets, whereas the infrastructure required to read and interpret scanned PDF documents is still a work in progress.
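One way to picture these edges is as a small directed graph. In the sketch below, each edge is a (source, target) pair of entry indices, meaning that the source entry's text links to the target entry; this representation is hypothetical and may differ from how the SCALES JSONs actually store edges.

```python
from collections import defaultdict

# Hypothetical representation: (source_ind, target_ind) pairs, where the
# source entry's text contains a hyperlink to the target entry.
edges = [(12, 7), (15, 7), (15, 12)]


def references_to(edges):
    """Map each target entry to the sorted list of entries that cite it."""
    incoming = defaultdict(list)
    for source, target in edges:
        incoming[target].append(source)
    return {target: sorted(sources) for target, sources in incoming.items()}


print(references_to(edges))  # {7: [12, 15], 12: [15]}
```

Inverting the edges this way makes it easy to ask, for example, which later orders or responses point back to a given motion.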

However, the largest barrier to meaningful research into docket events is not technological, but financial; the steep costs of retrieving docket sheets apply tenfold to the prospect of downloading case documents, since each individual PACER document is billed by the page just like the docket sheet itself. In the future, judicial fee waivers, increased funding support, or changes to federal billing policy may clear the way for even more meaningful research into the structure and content of the information-rich PACER docket sheet database. Until then, though, even the limited data available as an input to the SCALES docket parser can continue to provide some much-needed clarity in the vast, obscure world of federal court events.

