Categories
updates

We hired a technical writer for PUDL!

Catalyst is very excited to announce that we have hired Nancy Amandi as a technical writer for PUDL’s Google Season of Docs project (full proposal here). The project will run from June through October, during which time Nancy will work on improving our documentation to make it easier for PUDL users to navigate and find the data they need.

Currently, it’s difficult for new (and long-time) PUDL users and contributors to quickly jump in and start using PUDL because our documentation is extensive and spread out across multiple repositories and websites. We have a data dictionary page in our docs, a Datasette deployment for exploring the data, and a set of example notebooks hosted on Kaggle, but none of them does a particularly good job of shepherding users to the data they want. The goal of this project is to create a better, more nested system of table and column documentation so users aren’t overwhelmed by PUDL and know where to find the latest versions of the tables that are most relevant to them!

If you’ve ever struggled to navigate the PUDL docs and have feedback, please send an email to hello@catalyst.coop, and we will incorporate suggestions into our plan for the project.

About Nancy

Nancy is a data engineer and technical writer living in Nigeria. She’s passionate about helping data-driven businesses write clear, concise documentation to convey complex technical concepts to a diverse range of audiences. In addition to her writing, Nancy has extensive experience in creating scalable data pipelines, exhaustive data mining, explanatory datasets, analytical models, and business reporting solutions with structured, semi-structured, and unstructured data. In 2023, Nancy and her team members won the Nigeria Energy Forum Tertiary Institutions Energy Pitch Challenge for their work on OneGrid Energies, a clean tech startup working towards closing the energy affordability gap in Nigeria.

Learn more about Nancy: LinkedIn, X, GitHub

Categories
weeknotes

Weeknotes for 2021-03-26

What We’re Doing

Our proposal to the Sloan Foundation was funded! This means we’ve got two years of support to work on integrating a bunch of new data, automating our ETL pipeline, and improving access to the results, some of which we outlined in our 2021 infrastructure roadmap. In combination with the client work we’ve been engaged in, this means we’ll need to recruit a new member to the co-op. We’re trying to figure out what skills and experience we are actually short on, relative to what we need to get done.

We continue to update our documentation as we prepare for a new software and data release. A lot of it is related to the development environment, contributing, and how to run the ETL as it is now, while the user-facing side of the docs will hopefully be much simpler, directing folks to use preprocessed data with a Docker image that’s available on Zenodo, or our JupyterHub, or Datasette. We’ve set April 19th as the release deadline, which is also the date of our next Advisory Board meeting.

We decided to simplify our branch management on GitHub. Instead of having separate ephemeral sprint branches, we’re just branching off of dev, making PRs against it, and merging dev back into main after each two-week sprint. This seems to be far more common among open source projects, and it’s a lot less to keep track of. It should also mean we don’t have to redirect lingering PRs to different base branches.

Data Wrangling Galore: Austen has been cleaning up the FERC Form 1 fuel table to capture previously discarded records that need information from other rows to make sense. Christina continues to work on estimating plant operating costs based on the FERC+EIA linkages we’ve developed for RMI, which we still need to merge into the main PUDL repo. Zane is trying to make our unit and prime-mover level heat rate estimates more robust and complete.

Categories
weeknotes

Weeknotes 2021-03-19

What We’re Doing

The Census DP1 GeoDatabase has been integrated into PUDL as a standalone SQLite DB for use with the EIA 861 to compile historical utility and balancing authority service territories, and with FERC 714 data in estimating state level historical hourly electricity demand. Previously it had an ad-hoc, non-standard ETL process.

Our documentation now has an index of all of the PUDL DB tables, including the names of the columns, their data types, and descriptions of the contents, thanks to some work with Jinja templates by Austen. This is just one small part of a bigger docs overhaul as we try and get PUDL 0.4.0 out by the end of March.
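For anyone curious what template-driven table documentation looks like in general, here is a minimal sketch. It is not the actual PUDL implementation: it uses Python's standard-library `string.Template` as a stand-in for Jinja, and the table names, column names, and descriptions in `TABLES` are made up for illustration.

```python
from string import Template

# Hypothetical metadata, for illustration only. These are not the real
# PUDL table definitions.
TABLES = {
    "plants": [
        {"name": "plant_id", "type": "integer",
         "description": "Unique plant identifier."},
        {"name": "plant_name", "type": "string",
         "description": "Name of the plant."},
    ],
    "fuel_receipts": [
        {"name": "plant_id", "type": "integer",
         "description": "Plant that received the fuel."},
        {"name": "fuel_cost", "type": "number",
         "description": "Delivered cost of the fuel."},
    ],
}

# One template per column entry; a Jinja template would play this role
# in a real docs build.
COLUMN_ROW = Template("  * $name ($type): $description")


def render_index(tables):
    """Render a plain-text index of tables and their columns."""
    lines = []
    for table_name in sorted(tables):
        # Table name followed by an underline of matching length.
        lines.append(table_name)
        lines.append("-" * len(table_name))
        for column in tables[table_name]:
            lines.append(COLUMN_ROW.substitute(column))
        lines.append("")  # Blank line between tables.
    return "\n".join(lines)


index_text = render_index(TABLES)
print(index_text)
```

The point of the approach is that the index is generated from the same metadata that describes the tables, so the docs can't quietly drift away from the schema.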

PUDL is finally compatible with Python 3.9, using both pip and conda. The last dependency to make the transition was Numba, which as of v0.53.0 works with Python 3.9. Our CI is now running tests on both Python 3.8 and 3.9.