Categories
updates

Capturing the elusive FERC EQR

The Federal Energy Regulatory Commission (FERC) collects a lot of data in its role as the regulator of US electricity markets. One dataset that we’ve had our eyes on for years, but never had the resources to integrate into PUDL (until now!), is the Electric Quarterly Report, or EQR.

The EQR is the reporting mechanism FERC uses for public utilities to fulfill their responsibility under section 205(c) of the Federal Power Act (FPA) to have their rates and charges on file in a convenient form and place.

Under this section of the FPA:

Where two or more public utilities are parties to the same rate schedule or tariff, each public utility transmitting or selling electric energy subject to the jurisdiction of this Commission shall post and file such rate schedule

In modern US electricity markets, there are more than a million of these transactions every day. This makes posting them in a “convenient form and place” non-trivial. FERC provides an online EQR report viewer where you can scroll through a list of thousands of sellers and download their filing for a particular quarter as a summary PDF. But this is no way to do analysis.

They also offer bulk downloads of the raw data as quarterly zipfiles containing CSVs or XML documents, but with more than 4 billion transactions recorded since the modern reporting era started in 2013, these formats are also not very ergonomic for bulk analysis. And even as compressed zipfiles, the full FERC EQR dataset is 100 GB. It’s far too big to analyze in bulk locally with tools like pandas, let alone spreadsheets. However, with the development of tools like DuckDB and Polars over the last few years, working with this scale of tabular data has become much easier, even on a laptop!

The data locked up in the FERC EQR is potentially valuable for understanding how renewable energy power purchase agreements and battery storage prices have evolved over time, and what’s really driving rising energy costs. The EQR data is particularly valuable for less organized markets like the US Southeast and Intermountain West, where there is no ISO/RTO market data. We’re hopeful that this data can also make it easier to programmatically identify anti-competitive utility affiliate transactions and uneconomic self-dispatch by regulated monopolies.

Democratizing FERC EQR Access

With additional support from Gigawatt-tier PUDL Sustainer GridLab, Catalyst is going to create an open access, cloud-native version of the FERC EQR dataset, designed for bulk analysis.

  • Timeframe: We expect to be working on the project from now through the first quarter of 2026.
  • Data Coverage: We are initially targeting the 2013Q3 to present data, since it is all in a very similar format, while older EQR data was submitted under a different reporting regime, in a different format, and is also not as rich in detail (though we have already archived the raw 2002-2013 data).
  • Outputs: We’re planning to mirror the structure of the current data, with four main tables: transactions, contracts, filer identities, and index publications, along with a new table that records all FERC company IDs and tracks how their reported name and other attributes change over time.
  • Format: The data will be distributed as a collection of Apache Parquet files, probably partitioned by quarter. All of the data from 2013Q3 to the present will use the same well-defined schema, informed by all of the historical EQR data dictionaries published by FERC.
  • Processing: Our first priority is making the bulk data freely accessible in a format that’s appropriate for analysis. We are not planning on doing any major data processing, beyond enforcing a uniform schema, converting the data to Parquet, and dealing with character encoding and CSV formatting issues that affect ~1% of all records.
  • Versioning: Due to the size of the data and the limited free cloud storage we have as part of the AWS Open Data Registry, we won’t be able to provide multiple historical versions of the outputs like we do with other PUDL data. Whenever we update the outputs, the older version will be replaced.
  • Updates: We plan to update the outputs at least once a quarter, and will capture fresh snapshots of the raw EQR data using our automated data archiving scripts.
  • Access: We plan to publish the FERC EQR alongside the rest of our PUDL data in a freely accessible S3 bucket as part of the AWS Open Data Registry. We also intend to make the EQR outputs available for preview and querying through the PUDL Data Viewer, which will allow users to download smaller subsets of the data as CSVs for spreadsheet-based analysis if that’s what they need.
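
As a rough illustration of what "partitioned by quarter" could mean in practice, here is a minimal sketch that enumerates quarter labels from 2013Q3 onward and maps each one to a file name. The `quarter_labels` and `partition_name` helpers and the `table/YYYYQn.parquet` naming pattern are assumptions made up for this example. They are not the actual layout of the published data, which hasn't been finalized.

```python
from datetime import date


def quarter_labels(start: str, end: date) -> list[str]:
    """Return labels like '2013Q3' from `start` through the quarter containing `end`."""
    year, quarter = int(start[:4]), int(start[5])
    end_year, end_quarter = end.year, (end.month - 1) // 3 + 1
    labels = []
    while (year, quarter) <= (end_year, end_quarter):
        labels.append(f"{year}Q{quarter}")
        quarter += 1
        if quarter > 4:
            # Roll over into the first quarter of the next year.
            year, quarter = year + 1, 1
    return labels


def partition_name(table: str, label: str) -> str:
    # Hypothetical naming scheme, used only for illustration.
    return f"{table}/{label}.parquet"


if __name__ == "__main__":
    for label in quarter_labels("2013Q3", date(2014, 5, 1)):
        print(partition_name("transactions", label))
```

With one file per quarter, a tool like DuckDB or Polars only has to read the quarters a query actually touches, rather than scanning the whole dataset.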

Get Involved!

  • If you’re already familiar with the FERC EQR and want to help test out the new data we’d love to hear from you: [email protected]
  • We’d like to better understand how this data is used in any context, whether it’s research, policy, journalistic, or commercial, so if you’ve got a use case in mind for historical electricity transaction data, let us know what it is. Maybe sign up for office hours so we can chat more, or help you get familiar with Parquet files.
  • What other data does the FERC EQR need to be connected to? One of the things we’re hoping this analysis-ready, cloud-native version of EQR will enable is robust record linkage with other datasets. Can we identify the LMP nodes where power is being delivered? Can we identify electricity buyers in other FERC or EIA data, even though all we have is their name? What else should we be thinking about?
  • Stay up to date with the project: Subscribe to our ~monthly newsletter, keep track of the FERC EQR issues in the main PUDL repo, or follow the EQR scoping repo where we’ll be prototyping the new system.

After 5 years of ogling the EQR from afar, and projecting our (data) hopes and dreams into its depths, we’re excited to finally tackle it for real, and hope it’ll be a valuable resource for understanding US energy markets and accelerating the ever more economical decarbonization of the US electricity system.

Categories
updates

VCE RARE Renewable Generation Profiles in PUDL

Catalyst is helping GridLab, Pattern Energy Group, and Vibrant Clean Energy distribute a new open (CC-BY-4.0 licensed) dataset produced by Vibrant Clean Energy which provides hourly, county-level wind and solar generation profiles based on NOAA’s High Resolution Rapid Refresh (HRRR) weather model. The data release was announced at ESIG’s Fall Technical Workshop on October 21st, 2024.

The new data is included in PUDL v2024.10.0 in Apache Parquet format. You can explore the dataset in this Jupyter notebook on Kaggle. The original 8760 hourly data as CSVs can be downloaded from Zenodo as well.

To learn more about why this kind of data is vital to the energy transition, check out ESIG’s report on Weather Data for Power System Planning.

VCE RARE Press Release

Categories
updates

Insights From 60+ Energy Data User Interviews

To kick off our NSF POSE grant work, over 4 weeks in July and August we interviewed more than 60 energy data users as part of NSF’s Innovation Corps program (I-Corps). I-Corps helps POSE awardees better understand their users and contributors, and the potential for fostering a sustainable open source ecosystem.

Some of our interviewees were already PUDL users, and many of them weren’t. A fair number of the PUDL users were at organizations we’d never encountered before! We talked to academic researchers and advocates working at non-profits, but also people at for-profit companies, and folks working in the public sector. We even had the chance to talk to some utilities. Interviewee technical and energy domain backgrounds were diverse: from spreadsheet-only NGOs to startups working with cloud-based data pipelines and orchestration frameworks, and everything in between. There were software engineers and lawyers who argue at FERC, and grassroots advocates and regional electricity planning organizations too.

It was an intense month for our sometimes introverted team, but overall it was a good experience and we learned a lot. So we thought we’d share some of our high-level takeaways, and see if they resonate with the broader energy data community.

Categories
updates

Workplace Democracy and Open Source

This year at csv,conf,v8 in Puebla, Mexico I gave a talk on our experience as a democratic worker cooperative creating digital public goods, and why we think co-ops are potentially a good fit for creating public-interest technology. You can watch the recorded talk on YouTube, or read on for a bloggified version of the talk below.

Categories
updates

Integrating PUDL with PyPSA-USA

We recently found out that Kamran Tehranchi, one of two primary maintainers of the PyPSA-USA open source power system model, was working on adapting it to use open data that we publish through our Public Utility Data Liberation Project (PUDL), so we interviewed him over email to find out more about his experience making the switch.

Can you tell us a little bit about yourself? What problems are you working on? Where are you at?

Sure! I’m currently a PhD student at Stanford University working in the Interdisciplinary Energy Systems (INES) Lab. By way of my research, I am also an energy system modeler and open-source software developer. My work focuses on electricity system planning, specifically on the impact of electricity transmission resolution within planning models. I primarily work with engineering-economic simulation and optimization models, mainly production cost simulations and capacity expansion models. I use these models to design and simulate future energy systems to understand the impacts of emerging technologies, policies, and climate-energy system interactions. One of the main projects I’ve been working on this past year is the PyPSA-USA planning model, which in part leverages PUDL to develop the electricity system data model.

Categories
updates

Beating the Utility Holding Company Shell Game

We’re excited to be part of the Mozilla Technology Fund’s 2024 cohort, which is focusing on open source AI for environmental justice!

We’re going to use Mozilla’s support to link US Securities and Exchange Commission data about utility ownership to financial and operational information in the EIA forms 860/861/923, and through our previous record linkage work involving the EIA data, to FERC Form 1 respondents and the EPA’s continuous emissions monitoring system data.

The SEC Form 10-K is published through EDGAR as structured XBRL data, but the Exhibit 21 attachment that describes which companies own and are owned by other companies is unfortunately just a PDF blob that gets stapled to the XBRL, and so ownership relationships end up being unstructured, or at best, semi-structured data.

We’re going to apply document modeling tools that we’ve developed in some of our client work (to extract structured data from PUC and other regulatory filings) to extract the ownership information from Exhibit 21. This will hopefully include the ownership percentages when they are reported.

Then we’re going to use the generalized entity matching / record linkage tooling that we developed under our previous Climate Change AI Innovation Grant to connect the parent / subsidiary companies named in the SEC data to the financial and operational data reported by the same utility companies in FERC Form 1, as well as EIA and EPA data.

The record linkage / entity matching system that we’ve ultimately settled on is based on the excellent (and publicly funded!) Splink library, which relies on DuckDB to enable local linkages on datasets of up to tens of millions of records. Robin Linacre (one of the Splink maintainers) has a tutorial explaining the probabilistic model of record linkage used by Splink, if you’re interested in the internals.
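
To give a flavor of the probabilistic model behind this kind of linkage, here is a toy sketch of Fellegi-Sunter style match weights, the family of model that Splink is built on. This is not Splink's API. The field names, the m and u probabilities, and the `match_weight` and `match_probability` helpers are all made-up assumptions for illustration. In a real linkage those probabilities would be estimated from the data.

```python
import math

# Hypothetical parameters for each compared field (illustrative values only):
# m = P(field agrees | the two records are the same entity)
# u = P(field agrees | the two records are different entities)
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "state": (0.90, 0.05),
}


def match_weight(agreements: dict[str, bool]) -> float:
    """Sum the log2 Bayes factors contributed by each compared field."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def match_probability(weight: float, prior: float) -> float:
    """Combine a total match weight with a prior match probability."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 2**weight
    return posterior_odds / (1 + posterior_odds)


if __name__ == "__main__":
    weight = match_weight({"name": True, "state": True})
    # Assume 1 in 1,000 candidate pairs is a true match (also an assumption).
    print(round(weight, 2), round(match_probability(weight, prior=0.001), 3))
```

Agreement on a field that rarely agrees by chance (a low u, like a company name) adds a lot of evidence, while agreement on a field like state adds comparatively little.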

Why is this work important? Being able to make effective energy policy often requires an understanding of the political economy of utilities, and utilities are often composed of Russian doll-like nested holding companies. It can be hard to see where one utility ends and another begins. Understanding which entities share ownership and thus political and economic interests is key to being able to grapple with and influence them.

We’ll be learning from prior work on this problem done by the folks at CorpWatch, and we hope to make the outputs of our work easy to visualize and explore through the Oligrapher interface that LittleSis has developed.

If this work is interesting or useful to you, we’d love to hear more about your use case! You can track our work through this GitHub repository. Also, while we are explicitly focused on and familiar with utilities, the SEC’s Form 10-K covers all publicly traded companies, so we may be producing additional data outputs that aren’t useful to us but which could be useful to others. If that’s you, please let us know.

Categories
updates

Rescuing Historical FERC Data

UPDATE 2022-01-19: We have received word from FERC that access to the historical data discussed below will be restored this week. As it becomes available we will also archive it on Zenodo just in case. Thank you to everyone who reached out and helped bring this issue to FERC’s attention!

This week we discovered that decades’ worth of energy system data collected by the Federal Energy Regulatory Commission (FERC) had been removed from the agency’s website. They apparently have no plan to archive it or migrate it to another platform. We are attempting to obtain a bulk download of all this data so we can archive it alongside our other raw data sources on Zenodo.

This data records many financial, operational, and economic aspects of the US energy system. It is a unique and valuable resource for anyone trying to understand how public policy and market conditions have shaped our energy system over time. Simply deleting this data with no warning, and with no plan to archive it or migrate it to another platform, is completely unacceptable.

If you know someone within FERC who can help get us a copy of this data to archive publicly, please put us in touch: [email protected]

Categories
linkstream

Data pipelines, stewardship and cleaning

Categories
updates

PUDL v0.5.0: 2020 and Beyond

It’s been almost a month since we pushed out our first actual quarterly software and data release: PUDL v0.5.0! The main impetus for this release was to get the final annual 2020 data integrated for the FERC and EIA datasets we process. We also pulled in the EIA 860 data for 2001-2003, which is only available as DBF files, rather than Excel spreadsheets. This means we’ve got coverage going back to 2001 for all of our data now! Twenty years! We don’t have 100% coverage of all of the data contained in those datasets yet, but we’re getting closer.

Beyond simply updating the data, we’ve also been making some significant changes to how our ETL pipeline works under the hood. This includes how we store metadata, how we generate the database schema, and what outputs we’re generating. The release notes contain more details on the code changes, so here I want to talk a little bit more about why, and where we are hopefully headed.

If you just want to download the new data release and start working with it, it’s up here on Zenodo. The same data for FERC 1 and EIA 860/923 can also be found in our Datasette instance at https://data.catalyst.coop

Categories
linkstream

Are electric utilities planning for climate change?

Oil and gas companies operating in the Arctic and other areas impacted by climate change have been adapting their operations and infrastructure planning to the melting permafrost and other long-term impacts of their pyromania for decades, even while spreading disinformation about the same processes publicly. But are electric utilities doing the same kind of planning?

We’ve been thinking a bit about the ways in which the energy system in the US West is exposed to potential climate risks, in the context of long term utility resource adequacy and operational planning. We posted a short thread on Twitter and got some references from the #EnergyTwitter hive mind.