Weeknotes for 2021-03-26

Our Sloan Foundation proposal was funded. We continue pushing toward a PUDL software/data release with data beta prep and documentation updates.

What We’re Doing

Our proposal to the Sloan Foundation was funded! This means we’ve got 2 years of support to work on integrating a bunch of new data, automating our ETL pipeline, and improving access to the results, some of which we outlined in our 2021 infrastructure roadmap. Combined with the client work we’ve been engaged in, this means we’ll need to recruit a new member to the co-op. We’re trying to figure out which skills and experience we are actually short on, relative to what we need to get done.

We continue to update our documentation as we prepare for a new software and data release. Much of it covers the development environment, contributing, and how to run the ETL as it is now. The user-facing side of the docs will hopefully be much simpler, directing folks to use preprocessed data via a Docker image that’s available on Zenodo, via our JupyterHub, or via Datasette. We’ve set a release deadline of April 19th, which is also the date of our next Advisory Board meeting.

We decided to simplify our branch management on GitHub. Instead of having separate ephemeral sprint branches, we’re just branching off of dev, making PRs against it, and merging dev back into main after each two-week sprint. This seems to be far more common among open source projects, and it’s a lot less to keep track of. It should also mean we don’t have to redirect lingering PRs to different base branches.

Data Wrangling Galore: Austen has been cleaning up the FERC Form 1 fuel table to capture previously discarded records that need information from other rows to make sense. Christina continues to work on estimating plant operating costs based on the FERC+EIA linkages we’ve developed for RMI, which we still need to merge into the main PUDL repo. Zane is trying to make our unit and prime-mover level heat rate estimates more robust and complete.

What We’re Reading

Utilities are missing out on profits by delaying decarbonization. A summary from the Energy & Policy Institute of analysis done by Morgan Stanley. Turns out ditching old, largely depreciated fossil infrastructure and replacing it with capital-intensive renewables and storage is a money maker for utilities and a money saver for rate payers. Who would have thought?!

A long, high-level discussion of what energy models are, and how they can be used to support policy decisions, between Jesse Jenkins and David Roberts.

The Ghosts in the Data, a post from Vicki Boykis (filled with links to other good stuff too!) about the implicit knowledge that people accumulate after working in data for a while: the things that don’t show up in textbooks and documentation, but that everyone who has done this work for 5-10 years has learned along the way. The post was inspired by responses to a tweet of hers from a few weeks earlier.

The Influence of Hidden Researcher Decisions in Applied Microeconomics, by Nick Huntington-Klein et al. (May 2020). A study looking at how much variation is introduced into scientific conclusions by different choices made during data preparation. The conclusions are honestly horrifying for the kind of work we (and others!) do with energy data. From the abstract:

“Two published causal empirical results are replicated by seven replicators each. We find large differences in data preparation and analysis decisions, many of which would not likely be reported in a publication. No two replicators reported the same sample size. Statistical significance varied across replications, and for one of the studies the effect’s sign varied as well. The standard deviation of estimates across replications was 3-4 times the typical reported standard error.”

And this mess is before you even get into the software issues, where much research code is essentially untested, and may not even be entirely available…

5 Principles to write SOLID code in Python. A reminder of some enduring principles of software engineering:

  1. Single Responsibility Principle
  2. Open/Closed Principle
  3. Liskov Substitution Principle
  4. Interface Segregation Principle
  5. Dependency Inversion Principle

It’s easy to remember the acronym, and less easy to force yourself to actually apply the principles day to day, until you discover that you are tangled up in spaghetti of your own making.
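As a small illustration, here is a minimal sketch of two of the five principles (Single Responsibility and Dependency Inversion) applied to a toy extract-and-clean step. It is not taken from the linked article or from PUDL: the names Extractor, InMemoryExtractor, drop_incomplete, and run_pipeline, along with the sample records, are all hypothetical and exist only for this example.

```python
from __future__ import annotations

from typing import Iterable, Protocol


class Extractor(Protocol):
    """Hypothetical interface: anything that can yield raw records."""

    def extract(self) -> Iterable[dict]: ...


class InMemoryExtractor:
    """Single responsibility: hand back the records it was given."""

    def __init__(self, records: list[dict]) -> None:
        self._records = records

    def extract(self) -> Iterable[dict]:
        # Return a copy so callers cannot mutate the stored list.
        return list(self._records)


def drop_incomplete(records: Iterable[dict], required: tuple[str, ...]) -> list[dict]:
    """Single responsibility: keep only records with every required field set."""
    return [r for r in records if all(r.get(key) is not None for key in required)]


def run_pipeline(extractor: Extractor, required: tuple[str, ...]) -> list[dict]:
    """Dependency inversion: relies on the Extractor abstraction,
    not on any particular data source."""
    return drop_incomplete(extractor.extract(), required)


# Made-up sample records, purely for illustration.
raw = [
    {"plant_id": 1, "fuel_mmbtu": 120.0},
    {"plant_id": 2, "fuel_mmbtu": None},
]
cleaned = run_pipeline(InMemoryExtractor(raw), required=("plant_id", "fuel_mmbtu"))
```

Because run_pipeline only needs an object with an extract method, a file-backed or database-backed extractor could be swapped in later without touching the cleaning logic, and each piece can be tested on its own.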

By Zane Selvans

A former space explorer, now marooned on a beautiful, dying world.
