A brief review of our cost estimation dataset
Using a combination of PACER and CourtListener’s RECAP, we acquired the roughly 350,000 docket sheets filed in 2016 in every district court. From these docket sheets, we parsed and identified roughly 11 million attached documents. We then used the metadata from documents relating to the Northern District of Illinois (NDIL) docket sheets as a sample for nationwide document-level metadata. Effectively, we used NDIL documents as a bootstrapping sample for the 93 other courts.
Importantly, we made a general uniformity assumption regarding nature of suit codes across the 94 district courts. We assumed that the various dockets and document samples we had from the Northern District of Illinois would broadly represent different case lengths and nuanced arguments across the other 93 courts. While we know there may be far more nuanced patent cases argued in the Eastern District of Texas, or more complex maritime suits argued near Miami in the Florida Southern District, we feel our assumption is a good first approximation given the difficulty of collecting data from PACER and the strain that collection puts on it. The Northern District of Illinois was the third most document-rich district, with nearly 514,000 documents filed in 2016 cases, giving us confidence that it would capture the bulk of possible nuances in suit docket and document types.
Exploring the data
The document metadata from the NDIL dockets enabled us to take a critical look at the relationship between dockets and documents, particularly by nature of suit codes. We stratified our analysis by the PACER nature of suit subtypes (found here). Additionally, due to their relative infrequency in NDIL, we grouped Civil Detainee, Federal Tax Suits, and Forfeiture/Penalty cases, along with Unknown natures of suit, into a “Miscellaneous” category.
We found some interesting trends in our NDIL samples. Personal Injury cases were twice as common as any other case type in the district, and their dockets come with a hefty price tag: nearly $9,000. When we investigate document costs below, however, we see their corresponding documents are among the cheapest by suit. There is some disparity in average document cost per case, with Bankruptcy and Property Rights documents being the most expensive. Bankruptcy cases take the gold for pages per case, with nearly 1,000 pages on average in a Bankruptcy suit. It seems the PACER $3.00 maximum charge is beneficial to users in search of these records.
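To make the effect of that cap concrete, here is a minimal sketch of the per-document fee. The $3.00 cap is the one mentioned above; the $0.10 per-page rate comes from PACER's published fee schedule rather than from our data, and the function names and the ten-document split are purely illustrative assumptions.

```python
# Illustrative sketch only: names and the example case are hypothetical.
PER_PAGE_CENTS = 10   # assumed per-page rate from PACER's fee schedule
CAP_CENTS = 300       # the $3.00 per-document maximum discussed above


def document_cost_cents(pages: int) -> int:
    """Fee for one document: per-page rate, capped at $3.00."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return min(pages * PER_PAGE_CENTS, CAP_CENTS)


def case_cost_cents(page_counts: list[int]) -> int:
    """Total document fees for a case, given each document's page count."""
    return sum(document_cost_cents(p) for p in page_counts)


# Hypothetical 1,000-page case split into ten 100-page documents.
capped = case_cost_cents([100] * 10)
uncapped = sum([100] * 10) * PER_PAGE_CENTS
print(capped / 100, uncapped / 100)
```

Under these assumptions, the cap only helps when pages are concentrated in long documents; a case made up of many short filings pays the full per-page rate.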
We also noted that Immigration and Social Security cases were very inaccessible to us. We were only able to make it to the landing page for 1 out of every 5 or 10 documents in cases filed under these suits. This is a feature, not a bug, to protect potentially sensitive individual information. Nevertheless, we can still parse the total number of documents in these cases from their docket entries even if we cannot access metadata about the documents.
Finally, we see that the average Personal Injury case is over 3 years in length, which is likely a function of class actions and multi-district litigation stretching out these cases. Of note, however, is that this extended case duration does not noticeably increase document costs for the case.
Using the docket sheets and NDIL document data, we built a simulation based on stratified random sampling to estimate nationwide PACER costs. We’ll detail the exact methodology in a forthcoming blog post for those who want to dig deeper. While our simulation returned a consistent range around $5.4 million to “download PACER documents”, we wanted to learn more about the cost differences both by district and nature of suit.
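Until that post is out, the general idea can be sketched roughly as follows. This is not our exact methodology, only one plausible reading of it: for each nature of suit (the stratum), draw per-case costs with replacement from a sample pool, once for every case a court filed under that suit type, and repeat the whole draw many times to get a range. The function name, the inputs, and the toy numbers are all hypothetical.

```python
import random
from statistics import mean


def simulate_total_cost(case_counts, sample_costs, runs=100, seed=0):
    """Return (low, average, high) simulated total cost across runs.

    case_counts:  {suit_type: number of cases to price}
    sample_costs: {suit_type: list of observed per-case costs (the sample)}
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    totals = []
    for _ in range(runs):
        total = 0.0
        for suit, n_cases in case_counts.items():
            # Draw one sampled cost per case, with replacement,
            # from the pool for this stratum only.
            total += sum(rng.choices(sample_costs[suit], k=n_cases))
        totals.append(total)
    return min(totals), mean(totals), max(totals)


# Toy inputs, not real figures.
counts = {"Civil Rights": 40, "Bankruptcy": 5}
pools = {"Civil Rights": [8.0, 15.0, 31.0], "Bankruptcy": [22.0, 60.0]}
low, avg, high = simulate_total_cost(counts, pools)
print(low, avg, high)
```

Sampling within each stratum keeps a court's estimate tied to its own mix of suit types, which is why the uniformity assumption described earlier matters so much.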
By suit type
We sliced our simulation results into individual districts and nature of suit subtypes to better understand our findings. What we see below is the average estimated cost for each of the subtypes. We estimate it would cost over $1 million to aggregate all Civil Rights document information just for cases filed in 2016 in all 94 courts.
From a data storage perspective, if you could get enough coins from your couch cushions to download those nearly 38,000 Civil Rights cases, you would still need to find 700 GiB of storage somewhere to keep them.
Based on the mixture of suit types and number of identified document links, we were able to build cost estimates for each district court. In 17 of the 94 courts, we estimate it would require at least $100,000 to download court documents for a single filing year; five of those courts would require $200,000 or more to download their data.
We can also use our docket sheets to map out and confirm common beliefs about certain case types. For example, it is well known that there are two high-volume “patent districts”, one for Silicon Valley and one in East Texas. Mapping the case counts clearly shows these two as the top patent processors, but we also see that the Northern District of Illinois and the Southern District of New York handle a high volume of patent cases compared to the rest of the country.
Given the various nature of suit mixes across different courts, we also projected what the simulated average case cost would be per court. Nevada, Colorado, Southern Indiana, and Eastern Texas all have average case costs over $20. On the flip side we found that Southern West Virginia and Eastern Louisiana enjoy relatively low average case costs.
While it would be great if this map were entirely blue, that is not the world we live in. In many districts, one case costs an average of $20. A researcher, scholar, lawyer, or advocate compiling more than one sample case to study in those districts would immediately exceed the $30 per quarter waiver that PACER has instituted. Compiling 50 cases would take $1,000 and compiling 500 cases would take $10,000. In many other districts it takes just three cases to exceed the waiver. Extracting trends from the courts beyond single case studies takes a lot of information, and it seems that regardless of which district you choose, the costs add up.
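The arithmetic behind those figures is simple enough to sketch. The $30 waiver and the $20 average come from the discussion above; the $12 average is a made-up stand-in for a cheaper district, and the function names are ours for illustration only.

```python
import math


def cases_to_exceed_waiver(avg_case_cost: float, waiver: float = 30.0) -> int:
    """Smallest number of cases whose combined cost is strictly above the waiver."""
    if avg_case_cost <= 0:
        raise ValueError("avg_case_cost must be positive")
    return math.floor(waiver / avg_case_cost) + 1


def bulk_cost(n_cases: int, avg_case_cost: float) -> float:
    """Expected cost of compiling n_cases at a given average cost per case."""
    return n_cases * avg_case_cost


print(cases_to_exceed_waiver(20.0))  # a $20-per-case district
print(cases_to_exceed_waiver(12.0))  # hypothetical $12-per-case district
print(bulk_cost(50, 20.0), bulk_cost(500, 20.0))
```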