We had a great time attending the OpenMod USA conference at Stanford last month. Thanks to Open Energy Transition for organizing, and for inviting us to moderate a panel on open data! Thanks also to Greg Miller, Greg Schivley, Ted Nace, and our very own Christina Gosnell for speaking on our panel.
We got to meet a whole bunch of smart, friendly folks who are working on using their energy system modeling skills to facilitate the global energy transition. We learned a lot about how we can better support their work, including these high level takeaways:
- We’re still missing useful datasets! There wasn’t a strong front-runner for most-requested dataset, but we clearly heard a need for transmission, gas, and hourly demand, among others.
- Our users are interested in making their own technical systems more robust and easier to work with.
It’ll be a continuous process of improvement, of course, but we’ve started working on some projects as a result!
We do have to pick and choose which datasets to integrate first. Right now we’re focusing on natural gas data, integrating EIA 176 with the help of davidmudrauskas, and our own e-belfer is extracting transmission and distribution data from PHMSA.
One way to integrate more data more quickly is to mobilize our community to help integrate new data sources! That means we need to make contributing to PUDL much easier.
The first, most important phase of integrating a new dataset is the exploratory one. You can spend countless hours learning the specific quirks and pain points of the data. Because many of our users are already familiar with these datasets, we encourage “knowledge contributions” in the form of plain-language documentation or useful scripts that handle part of the data wrangling process. We’ve updated our contributing docs to highlight those cases, and have made a new repository to hold the teeming masses of dataset-specific knowledge.
We are also improving our Kaggle environment so that anyone can use PUDL without setting up a whole Python environment. This will make it easier for users to explore PUDL data, especially data that we have archived and/or extracted but not completely cleaned, validated, or connected.
Apart from the dataset integrations and contribution improvements, we’re following up with folks from the conference to see how we can help them with software architecture, engineering, and infrastructure guidance – we’re looking forward to growing those relationships. If you are curious about how we can help you in this area, don’t hesitate to reach out at hello@catalyst.coop!
In closing, OpenMod was a great experience! We’re excited to build a community that can do amazing things with complete, connected, granular, and accessible energy data. We’re pursuing a bit of funding to support our community efforts, so keep your fingers crossed for us and stay tuned for more updates next year!