Weeknotes for 2021-03-12

What We’re Doing

We created a new EIA 861 archive on Zenodo. It includes data from 1990-2019, to support work Karl Dunkle Werner is doing (he's also the one who updated our scrapers, and is working on integrating the older years). We noticed that there were changes to the 2012 and 2019 EIA 861 data too. But who knows what those changes entail! There aren't even any formulae in these spreadsheets. Imagine if they published them as CSVs directly to GitHub, and we could see the diffs, or use tools like daff to understand how the data is changing over time.
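To make the idea of "seeing the diffs" concrete, here is a minimal sketch of a row-level comparison between two versions of a CSV table, using only the Python standard library. It is not how daff works internally and it does not use daff's API. The function names, the `utility_id` key column, and the sample values are all made up for illustration and are not real EIA 861 data.

```python
import csv
import io


def load_rows(csv_text, key):
    """Index the rows of a CSV document by the value in the `key` column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def diff_csv(old_text, new_text, key):
    """Return (added, removed, changed) between two versions of a CSV table.

    `added` and `removed` are sorted lists of key values; `changed` maps a
    key value to {column: (old_value, new_value)} for each differing cell.
    """
    old = load_rows(old_text, key)
    new = load_rows(new_text, key)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for k in sorted(old.keys() & new.keys()):
        cells = {
            col: (old[k].get(col), new[k].get(col))
            for col in new[k]
            if old[k].get(col) != new[k].get(col)
        }
        if cells:
            changed[k] = cells
    return added, removed, changed


# Hypothetical sample data (NOT real EIA 861 values).
OLD = "utility_id,state,sales_mwh\n1,CO,100\n2,WY,250\n3,NM,75\n"
NEW = "utility_id,state,sales_mwh\n1,CO,100\n2,WY,260\n4,UT,40\n"

added, removed, changed = diff_csv(OLD, NEW, key="utility_id")
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

With data published as plain text under version control, a comparison like this could run on every new release, and a silent revision to an old reporting year would show up as a short list of changed cells rather than a mystery.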

Christina and Zane gave a keynote presentation to the 2021 Research Data Access & Preservation (RDAP) Association Summit entitled Distributing Power with Open Data. You can see our slides here.

Zane nominated himself to speak at FERC’s April 16th workshop about the scope and mission of the new Office of Public Participation, focusing on issues of public data accessibility.

Zane finished overhauling our Tox / pytest setup to make it more intuitive and flexible (in the process of documenting the development setup). All the tests that we typically run can now be run without any command line arguments, and the tests have been split into distinct unit tests, software integration tests, and data validations. The software dependencies have also been simplified, with only those packages requiring or benefiting from precompiled binaries being specified in both and environment.yml specifications. This was inspired by our documentation update for the v0.4.0 release, since it turns out documenting an overly complicated thing is kind of a lot of work, and that work is probably better put toward simplifying the thing instead.

What We’re Reading

  • How to build a community: starting with why? The beginning of what will hopefully be a series of posts from Claire Carroll, the community manager for dbt, on how to cultivate real, supportive online communities of practice around a given project or technology. This is something we’d really like Catalyst to learn how to do in the context of PUDL, and open energy data and modeling more generally.
  • $1.3M in grants were just awarded to open source projects, with support from the Ford Foundation, Sloan Foundation, Open Society Foundations, and others, facilitated by the Open Collective Foundation. They’re trying to develop a sustainable funding / support system for open source infrastructure. More information on the program over on the Open Collective and also at the Ford Foundation. Some of their research findings so far.
  • RS21 looks like an interesting data consultancy, doing a lot of work with the public sector and NGOs.
  • VSCode Atom is Dead! Long live Atom! With Microsoft’s purchase of GitHub, development on the Atom editor has waned, and it’s started to feel a bit like abandonware. And given that Microsoft also maintains VS Code, it seems like it’s only a matter of time before we have no choice but to switch… to something. So we started playing around with VS Code, and it’s great!
  • A curated list of open technology projects to sustain a stable climate, energy supply, and vital natural resources. Gotta get PUDL in their dataset section!
  • Cloud native repositories for big scientific data, a paper by Ryan Abernathey et al., talking about the benefits of “Analysis Ready Cloud Optimized” (ARCO) datasets and developments in the field. Lots of parallels with where we’re going with PUDL, but with several orders of magnitude difference in the scale of the data.
  • Eliminating Toil: a definition of Toil from Google, and some musings on why less of it is better. Definitely resonated with the work we’re doing in PUDL.
  • Another look at utility political spending from the Energy & Policy Institute. This time they’re looking at FERC Account 426.4, again based on data we’ve liberated from the FERC Form 1, in combination with their own mapping between utilities and their parent holding companies. Here’s the original query from our data.