Ask any ordinary user how much the contents of PACER are worth and the answers range from tens to hundreds of millions of dollars. Ask how much it costs to store those documents and the answers range from thousands to millions. With the recent congressional push surrounding the “freedom of the courts” and the subsequent judicial pushback, the public needs a better idea of PACER’s inner workings now more than ever. In the ongoing battle between Congress and the judiciary, many one-liners have bubbled to the top – notably the claim that any sort of replacement system for PACER would easily cost billions of dollars. While some have mocked those estimates, here we attempt to add some grounded, empirical estimates to the discussion.
Some details of the costs of the Public Access to Court Electronic Records system (a.k.a. PACER) are opaque to the public. While we know how much revenue PACER brings in ($146.4 million in 2016) and how much it costs to operate (~$3 million in 2016), it’s less clear how much it would cost to buy the complete contents of PACER under its current fee structure, or how much an alternative system to host them would cost.
This uncertainty is largely caused by the fee structure of PACER itself. The flat rate charge of $0.10 per page (with a cap of $3.00 per document), an antiquated pricing structure with origins in the era of photocopied court documents, sounds benign. However, to calculate access costs it’s not enough to know the number of cases filed: we need to know the number of documents filed, the types of documents, and every page length to get an accurate estimate. Nobody other than PACER itself can feasibly produce the exact number.
“Slapping a price tag” on PACER is not a simple fixed rate calculation, even though PACER practices a level of pricing transparency with its public cost guidelines. There are variable costs for pages returned, document types, and even for generic queries. With these variable costs in mind, we set out to estimate the cost of downloading a year of civil and criminal litigation on PACER. When we say the cost to download PACER, we mean the PACER charges we would incur if we queried and downloaded every publicly accessible docket sheet and attached document for the civil and criminal cases filed in a given year. We built a methodology to estimate the cost of all cases filed in 2016 and of all documents filed in those cases in the years since.
What exactly is 'cost'?
There are two distinct components in the cost of a case:
- (A) The cost of the docket sheet
- (B) The cost of the documents attached to the docket sheet
A docket sheet contains the participants, judge, court information, and docket entries. This report serves as the chronological index card of a case: tracking motions, logging petitions, recording judicial actions, and updating document filings throughout the lifetime of the case. Docket sheets make up a trivial portion of case costs – even if there are hundreds of docket entries, PACER’s $3.00 maximum charge kicks in once the report exceeds 30 pages.
The bulk of the case costs come from documents. Documents include party filings, judicial opinions, transcripts, and other types of court records. They are nearly always text-based PDFs, indexed by docket entries, sometimes with multiple documents per docket entry. Any readable document attached to a docket is charged at $0.10 per page (for up to 30 pages), except for judicial opinions, which are supposed to be free. A case with a lot of documents filed into it quickly becomes a money pit if you want to view the full story.
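To make the two components concrete, here is a minimal sketch of the per-page rule as described above. It assumes the $3.00 cap applies uniformly to every docket sheet and document and that opinions are always free; the function names and the shape of the `documents` list are hypothetical, chosen only for illustration.

```python
from decimal import Decimal

PER_PAGE = Decimal("0.10")  # per-page charge stated in the text
CAP = Decimal("3.00")       # maximum charge per docket sheet or document


def item_charge(pages, is_opinion=False):
    """Charge for one docket sheet or document under the per-page rule."""
    if is_opinion or pages <= 0:
        return Decimal("0.00")  # opinions are supposed to be free
    return min(PER_PAGE * pages, CAP)


def case_cost(docket_pages, documents):
    """Component (A), the docket sheet, plus component (B), the documents.

    `documents` is a list of (pages, is_opinion) tuples (hypothetical shape).
    """
    docket = item_charge(docket_pages)
    docs = sum((item_charge(p, op) for p, op in documents), Decimal("0.00"))
    return docket + docs


# A made-up case: 45-page docket sheet, two filings, one 25-page opinion.
example = case_cost(45, [(12, False), (80, False), (25, True)])
print(example)  # 7.20
```

The docket sheet and the 80-page filing both hit the cap, so neither costs more than $3.00, while the opinion adds nothing.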
Modelled cost: One Million Lincolns
We built a dataset of all docket sheets filed in federal district courts in 2016 by combining direct purchases from PACER with CourtListener’s RECAP archive. We modelled the two components of cost listed above using our docket data from 2016, combining ensemble regression methods with stratified random sampling by simulation. Assuming you know the case ID for every civil and criminal case filed in 2016 in the 94 district courts, we estimate that you could download all their docket sheets from PACER for somewhere between $228,000 and $245,000. For roughly 350,000 cases, that comes in at less than a dollar per docket – not bad. Before you reach for your wallet, though, you might want to consider the document cost.
We parsed each of the ~350,000 dockets and identified just over 11 million documents attached to them. We did not attempt to gather metadata for all 11 million documents, but instead settled on a close-to-home sample and used the Northern District of Illinois as our sample court. The Northern District of Illinois was the third most document-rich district for 2016 filings, with over half a million documents attached to its docket sheets. Using metadata about document length and cost from document landing pages on PACER, we ran hundreds of simulations modelling what those 11 million documents might look like across the 94 courts. The result of those simulations is an estimated average cost somewhere between $5,200,000 and $5,500,000 for all documents in all cases opened in 2016.
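The general idea of extrapolating from a sample court can be sketched with a plain bootstrap. This is not the model used for the estimates above (which combined ensemble regression with stratified sampling); it is a simplified stand-in, and the page counts below are invented placeholders, so the printed interval is illustrative only and will not match the figures in this post.

```python
import random


def doc_charge(pages):
    """Per-document charge: $0.10 per page, capped at $3.00."""
    return min(0.10 * pages, 3.00)


def simulate_total(sample_pages, n_documents, n_runs=500, seed=0):
    """Bootstrap a rough 95% interval for the total cost of n_documents.

    Each run resamples the page counts with replacement, computes the
    mean charge per document, and scales it up to n_documents.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        resample = rng.choices(sample_pages, k=len(sample_pages))
        mean_charge = sum(doc_charge(p) for p in resample) / len(resample)
        totals.append(mean_charge * n_documents)
    totals.sort()
    low = totals[int(0.025 * n_runs)]
    high = totals[int(0.975 * n_runs) - 1]
    return low, high


# Placeholder page counts, NOT real PACER metadata.
hypothetical_sample = [1, 2, 2, 3, 4, 5, 8, 12, 20, 35, 60, 1, 3, 6, 15]
low, high = simulate_total(hypothetical_sample, n_documents=11_000_000)
print(f"${low:,.0f} to ${high:,.0f}")
```

A real version would resample within strata (for example by court or case type) so that document-heavy districts do not dominate the extrapolation.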
Our hypothetical net total PACER bill for one year (2016) comes out somewhere between $5.5 million and $5.75 million. Far from cheap, but also not quite the “hundreds of millions” that one might think about the value of PACER’s contents given its revenue.
You might ask, “where is the other money?” Importantly, our estimate is the cost to access everything related to a single filing year once. PACER has been around for decades, and the documents and dockets it hosts are undoubtedly accessed by multiple parties, more than once and at different points in time. Given that revenue reflects access to the data rather than its core value, it’s not so difficult to understand how yearly revenue is significantly higher. It also provides important context – even though most of the PACER revenue comes from many downloads of ‘popular’ dockets and documents, its revenue in 2016 is roughly equivalent to downloading every single docket and document filed in that year 25 times.
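The "25 times" figure and the per-docket price follow from simple division on the numbers quoted above. The short sketch below uses only figures from this post; the variable names are ours.

```python
REVENUE_2016 = 146.4e6                       # PACER revenue in 2016
BILL_LOW, BILL_HIGH = 5.5e6, 5.75e6          # modelled bill for 2016 filings
DOCKET_LOW, DOCKET_HIGH = 228_000, 245_000   # modelled docket sheet cost
CASES = 350_000                              # approximate 2016 case count

# How many full downloads of the 2016 filings would match 2016 revenue?
multiple_low = REVENUE_2016 / BILL_HIGH
multiple_high = REVENUE_2016 / BILL_LOW

# Average cost of a single docket sheet.
per_docket_low = DOCKET_LOW / CASES
per_docket_high = DOCKET_HIGH / CASES

print(f"{multiple_low:.1f} to {multiple_high:.1f} full downloads")
print(f"${per_docket_low:.2f} to ${per_docket_high:.2f} per docket sheet")
```

Both ends of the range land a little above 25 downloads, and the docket sheet average comes out between roughly 65 and 70 cents.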
Modern cost: Twelve Thousand Washingtons
Knowing how much a year’s worth of PACER content costs is half the battle. We are also curious what it would cost to serve the current demand with modern web infrastructure. To estimate this we use the pricing of Amazon’s S3 storage service, a common choice for web services today. To calculate the cost of S3 storage, we need to estimate how much disk space the 2016 filings would require.
Using simulations, we estimate that the uncompressed contents of PACER’s 2016 filings come to roughly 4.7 TiB, or less than half of Wikipedia’s uncompressed contents in 2015. That size would earn us a monthly storage charge of about $120, or $1,442 annually. If every docket and document were downloaded 25 times in a year, the request and data transfer costs would be roughly $11,000 annually given our average file sizes. The combined total is about $12,268 in annual hosting costs for a year’s worth of filings. This is significantly less than what is charged now to access the data, and it should be a high estimate since we are not accounting for any attempt to optimize the storage or transfer of data. In any case, if we were paying the fair cost of the data storage and bandwidth to access an average docket report, it would cost approximately 0.00001 cents – a far cry from the roughly 67 cents it costs us now.
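The storage figure can be roughly reproduced from the size estimate. The sketch below assumes a flat rate of $0.023 per GB-month (our assumption for S3 standard storage, not a figure from this post) and decimal gigabytes; under those assumptions it lands slightly below the $120 per month quoted above. The request and transfer share is simply backed out of the totals in the text.

```python
TIB = 2**40                  # bytes in one tebibyte
SIZE_TIB = 4.7               # estimated size of the 2016 filings
PRICE_PER_GB_MONTH = 0.023   # ASSUMED flat S3 storage rate, USD per GB-month

size_gb = SIZE_TIB * TIB / 1e9            # convert TiB to decimal GB
monthly_storage = size_gb * PRICE_PER_GB_MONTH
annual_storage = monthly_storage * 12

# Figures quoted in the text: combined hosting cost and annual storage.
COMBINED_ANNUAL = 12_268
QUOTED_ANNUAL_STORAGE = 1_442
implied_transfer = COMBINED_ANNUAL - QUOTED_ANNUAL_STORAGE

print(f"{size_gb:,.0f} GB -> ${monthly_storage:,.2f}/month, "
      f"${annual_storage:,.2f}/year")
print(f"implied request and transfer cost: ${implied_transfer:,}/year")
```

Storage is the small part of the bill; nearly nine tenths of the hosting estimate comes from serving the 25 rounds of downloads.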
Granted, the three million dollars that it costs to run PACER per year is not all spent on hard drives and internet bandwidth – a large portion of it presumably covers the salaries of the IT professionals who maintain and improve the system. But no other department of the government charges the public to access its data in order to pay for its IT professionals, nor should they. In 2020 these individuals are vital to every department of the government and its mission. These jobs should be funded through the department’s budget from Congress, and there should be no worry about a ‘down’ year in revenue in which the funding for the jobs could simply evaporate.