Using a combination of PACER and CourtListener’s RECAP, we acquired the roughly 350,000 docket sheets for cases filed in 2016 across every federal district court. From these docket sheets, we parsed and identified roughly 11 million attached documents. We then used the metadata from documents on the Northern District of Illinois (NDIL) docket sheets as a sample for nationwide document-level metadata. In effect, NDIL documents served as a bootstrapping sample for the 93 other courts.
Importantly, we made a general uniformity assumption regarding nature-of-suit codes across the 94 district courts: we assumed that the dockets and document samples from the Northern District of Illinois would broadly represent the range of case lengths and arguments found in the other 93 courts. We recognize that far more nuanced patent cases may be argued in the Eastern District of Texas, or more complex maritime suits in the Southern District of Florida near Miami, but we consider the assumption a good first approximation given the difficulty of data collection and the strain it places on PACER. The Northern District of Illinois was the third most document-rich district, with nearly 514,000 documents filed in 2016 cases, which gives us confidence that it captures the bulk of the variation in docket and document types across nature-of-suit codes.
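To make the uniformity assumption concrete, the following is a minimal sketch of one way NDIL metadata could be resampled for dockets in other courts. It is an illustration under stated assumptions, not the procedure actually used: the function name `bootstrap_document_counts`, the choice of per-docket document count as the metadata field, the input tuple layouts, and the fallback for codes that never appear in NDIL are all hypothetical.

```python
import random
from collections import defaultdict


def bootstrap_document_counts(ndil_dockets, other_dockets, seed=0):
    """Draw a document count for each non-NDIL docket from NDIL dockets
    that share its nature-of-suit code (hypothetical helper).

    ndil_dockets:  iterable of (nature_of_suit_code, document_count)
    other_dockets: iterable of (docket_id, nature_of_suit_code)
    """
    rng = random.Random(seed)

    # Group observed NDIL document counts by nature-of-suit code.
    pool = defaultdict(list)
    for code, n_docs in ndil_dockets:
        pool[code].append(n_docs)

    # Assumption: a code never seen in NDIL falls back to the pooled sample.
    all_counts = [n_docs for counts in pool.values() for n_docs in counts]

    estimates = {}
    for docket_id, code in other_dockets:
        candidates = pool.get(code) or all_counts
        # Sampling with replacement: each docket draws independently.
        estimates[docket_id] = rng.choice(candidates)
    return estimates
```

Under this sketch, a docket in another court is treated as interchangeable with an NDIL docket carrying the same nature-of-suit code, which is exactly where the uniformity assumption enters. Summing the drawn counts per court would give a rough estimate of that court's document volume.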