The Public Utility Data Liberation (PUDL) Project

The Public Utility Data Liberation (PUDL) project is an open source Python ETL pipeline that takes publicly available information about the US electricity sector and makes it publicly usable.

Electric utilities report a huge amount of information to the US government and other public agencies. This includes yearly, monthly, and even hourly data about fuel burned, electricity generated, operating expenses, power plant usage patterns, and emissions. Unfortunately, much of this data is not released in well-documented, ready-to-use, machine-readable formats. Data from different agencies tends not to be standardized or easily used in tandem. Several commercial data services clean, package, and resell this data, but at prices too high for many smaller stakeholders.

PUDL cleans, links, and standardizes this data all for free!

Thus far our primary focus has been integrating data pertaining to power plant fuel use, generation, operating costs, emissions, and operation history.

As of November 2024, PUDL integrates data from:

  • EIA Forms 923, 860, 860M, 930, AEO
  • FERC Form 1, 714
  • EPA CEMS, Power Sector Crosswalk
  • PHMSA
  • Gridpath Resource Adequacy Dataset
  • VCE RARE Power Dataset
  • NREL ATB
  • Census DP1

We also publish raw SQL databases containing consolidated but untransformed data from:

  • FERC 2, 6, 60

See a high-level review of these datasets and how they’re integrated into PUDL here.

The information from these sources allows users to explore the operating costs of individual power plants and see how fuel costs impact the viability of different types of generation. It can highlight the competitiveness of renewable electricity in the market today; it can show how the generation mix of different utilities has evolved over time; and it can indicate how fuel prices and more renewable generation have changed the usage of individual power plants.
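As a rough illustration of the kind of operating-cost question described above, the sketch below computes a generation-weighted fuel cost per MWh for each plant. The record layout, field names (`fuel_cost_usd`, `net_generation_mwh`), and all values are made up for this example; they are not PUDL table or column names, and real analyses would read from the published PUDL outputs instead.

```python
from collections import defaultdict

# Hypothetical, made-up records for illustration only.
# These are NOT real PUDL data, tables, or column names.
records = [
    {"plant": "Plant A", "fuel_cost_usd": 120_000.0, "net_generation_mwh": 4_000.0},
    {"plant": "Plant A", "fuel_cost_usd": 90_000.0, "net_generation_mwh": 3_000.0},
    {"plant": "Plant B", "fuel_cost_usd": 200_000.0, "net_generation_mwh": 8_000.0},
]


def fuel_cost_per_mwh(rows):
    """Return the generation-weighted fuel cost (USD/MWh) for each plant."""
    cost = defaultdict(float)
    generation = defaultdict(float)
    for row in rows:
        cost[row["plant"]] += row["fuel_cost_usd"]
        generation[row["plant"]] += row["net_generation_mwh"]
    # Skip plants with no generation to avoid dividing by zero.
    return {p: cost[p] / generation[p] for p in cost if generation[p] > 0}


costs = fuel_cost_per_mwh(records)
for plant, usd_per_mwh in sorted(costs.items()):
    print(f"{plant}: {usd_per_mwh:.2f} USD/MWh")
```

Summing cost and generation before dividing weights each reporting period by how much the plant actually ran, which is usually more meaningful than averaging per-period ratios.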

By making this database and associated software available under liberal open data and open source licenses, we hope to enable a broader variety of stakeholders to participate quantitatively in electricity regulation and climate policy discussions at the local, state, and federal levels. We want to see it used by data journalists, grassroots renewable energy activists, climate change activists, small renewable energy and demand-side management companies, and non-profit organizations. You can help us keep this resource free and open for anyone to use by making a contribution to the PUDL project. Any amount is appreciated!

Check out some of the projects we’ve been supporting!

If you or your organization have other data you would like to see integrated into the database, or suggestions for how it could be made to serve your purposes more effectively, please get in touch with us. If you have questions you’d like answered, but don’t have the skills or time required to use the data yourself, we are interested in performing those analyses for other organizations at very reasonable rates. You can reach us at: hello@catalyst.coop.

Read more about PUDL on our documentation page.

PUDL is made possible by the support of our Sustainers