You Don’t Have to Install PUDL Anymore

We’re excited to announce that you no longer have to install the PUDL Python library to access electric generation data linked across FERC and EIA such as capacity factor, heat rate, and fuel cost. These, and many others, are now available directly in the PUDL database, which you can download from Zenodo here. You can find more details on how to access the data here.

We were able to complete this large infrastructural overhaul with the help of generous funding from the Sloan foundation.

Now that you can use any tools you want to analyze the data, here are some ideas:

  • Use the same type of Python code you have been using, but freed from our tangled web of dependencies!
  • Use another language you like better: R, Rust, Ruby, or even other languages that don’t start with R (Julia?)
  • Use Kaggle to check out our data without installing any programming environments at all!
  • Hook up a BI tool to quickly generate low/no-code dashboards and visualizations!

Since we’re moving away from downstream use of the library, we are also deprecating the PudlTabl class. It will still work, for now, but it’s now just a shell around accessing the database tables and will be removed in a future release.

One further change we made during all of this was to rename a bunch of tables to make them a little easier to find and understand. Tables now have standardized prefixes, the nuances of which are explained in the docs. The short version is:

  • When in doubt, start with tables with the out_* prefix. These have been cleaned and connected into wide tables with lots of metadata and are designed to be easy to use for downstream analysis.
  • When you need to dig deeper, look at the core_* tables. These are the cleaned up building blocks of the out_* tables. You may need to join several core_* tables to get the metadata you want.
  • The tables starting with an underscore are intermediate assets. They’re not stable, so please don’t rely on the data in them.

We hope these changes make it easier for a wider variety of users to use our data! Now that we’ve wrapped up this infrastructural work, we’ll shift our focus back to integrating new datasets like PHMSA and EIA 176.

If you want help getting started with our data, or have any datasets you’d like us to integrate, we’d love to talk: drop by our office hours and we’ll walk you through any questions you might have.


Weeknotes 2021-03-19

What We’re Doing

The Census DP1 GeoDatabase has been integrated into PUDL as a standalone SQLite DB for use with the EIA 861 to compile historical utility and balancing authority service territories, and with FERC 714 data in estimating state level historical hourly electricity demand. Previously it was had an ad-hoc non-standard ETL process.

Our documentation now has an index all of the PUDL DB tables, including the names of the columns, their data types, and descriptions of the contents, thanks to some work with Jinja templates by Austen. This is just one small part of a bigger docs overhaul as we try and get PUDL 0.4.0 out by the end of March.

PUDL is finally compatible with Python 3.9, using both pip and conda. The last dependency to make the transition was Numba, which as of v0.53.0 works with Python 3.9. Our CI is now running tests on both Python 3.8 and 3.9.