
Open Energy Data For All: Summer Data Lab 2026

Catalyst Cooperative has spent the last year or so developing educational materials for early career energy researchers, with generous support from the Sloan Foundation.

We’ve created a set of interactive Python tutorials covering core data and software engineering skills that are useful in energy research.

This summer we are organizing a two-day energy data lab on August 11-12 at Georgia Tech. It will be a mix of tutorials, time to apply new skills to your own work, and networking opportunities. We hope it will give you some new skills and help you find people to collaborate with in your research career.

We are intentionally holding it immediately before the Macro-Energy Systems (MES) Workshop on August 13-14 so that you can go to both!

The lab is primarily for people who:

  • are graduate students or postdocs looking to use energy data in their research
  • feel nervous about coding, or are running into technical roadblocks
  • feel underrepresented in energy research

We have space for up to 20 attendees.

Breakfast, lunch, and dinner will be provided to all participants. Additionally, Catalyst Cooperative has funding for travel grants covering the food, travel, and lodging expenses of up to 15 attendees. Each travel grant includes 4 nights of hotel (covering both the lab and the MES Workshop) plus domestic transportation, reimbursed via Catalyst Cooperative. We will accept self-funded participants as space allows.

The tutorial section of the lab will pull from the following pool of topics, which we have been piloting over video during development:

  • working with non-CSV data types: JSON, XML, and Parquet
  • accessing remote data and APIs: the requests library
  • web scraping: the BeautifulSoup library
  • visual data exploration: pandas 201
  • making assumptions about data
  • moving from Jupyter notebooks into robust Python packages: the uv package and project manager
  • automated testing and debugging: pytest and pdb

We’ll be accepting applications through 2026-04-03 EDT and will email those accepted by the end of 2026-04-24 EDT. Note that the abstract submission deadline for the MES Workshop is 2026-03-01. You do not need to present at MES to be eligible for the data lab. All lab attendees may also attend MES as audience members.

If you have any questions about the lab or the application, please contact us at [email protected].


OpenMod USA Takeaways

We had a great time attending the OpenMod USA conference at Stanford last month. Thanks to Open Energy Transition for organizing, and for inviting us to moderate a panel on open data! Thanks also to Greg Miller, Greg Schivley, Ted Nace, and our very own Christina Gosnell for speaking on our panel.

We got to meet a whole bunch of smart, friendly folks who are working on using their energy system modeling skills to facilitate the global energy transition. We learned a lot about how we can better support their work, including these high level takeaways:

  1. We’re still missing useful datasets! There wasn’t a strong front-runner for most-requested dataset, but we clearly heard a need for transmission, gas, and hourly demand data, among others.
  2. Our users are interested in making their own technical systems more robust and easier to work with.

It’ll be a continuous process of improvement, of course, but we’ve started working on some projects as a result!

We do have to pick and choose which datasets to integrate first. Right now we’re focusing on natural gas data: we’re integrating EIA 176 with the help of davidmudrauskas, and our own e-belfer is extracting transmission and distribution data from PHMSA.

One way to integrate more data more quickly is to mobilize our community to help integrate new data sources! That means we need to make contributing to PUDL much easier.

The first and most important phase of integrating a new dataset is the exploratory one, where you can spend countless hours learning the specific quirks and pain points of the data. Because many of our users are already familiar with these datasets, we encourage “knowledge contributions” in the form of plain-language documentation or useful scripts that handle part of the data wrangling process. We’ve updated our contributing docs to highlight those cases, and have made a new repository to hold the teeming masses of dataset-specific knowledge.

We are also improving our Kaggle environment so that anyone can use PUDL without setting up a whole Python environment. This will make it easier for users to explore PUDL data, especially data that we have archived or extracted but not yet completely cleaned, validated, or connected.

Apart from the dataset integrations and contribution improvements, we’re following up with folks from the conference to see how we can help them with software architecture, engineering, and infrastructure guidance – we’re looking forward to growing those relationships. If you are curious about how we can help you in this area, don’t hesitate to reach out at [email protected]!

In closing, OpenMod was a great experience! We’re excited to build a community that can do amazing things with complete, connected, granular, and accessible energy data. We’re pursuing a bit of funding to support our community efforts, so keep your fingers crossed for us and stay tuned for more updates next year!