Data We Wrangle

US Energy Information Administration

Form 860

The data from Form 860 is the backbone of much of our database, as it provides the most extensive and well-structured plant-, utility-, and generator-level information. This includes data on electric utilities and non-utility generation at the plant and generator level. It also contains plant in-service and retirement dates, information about prime movers, generating capacity, energy sources, proposed generators, county and state location, ownership, and FERC qualifying facility status. In recent years the EIA 860 has also begun providing detailed information about renewable energy facilities.

Form 923

Form 923 focuses primarily on fuel-based thermal plants, and contains data on electricity generation, fuel consumption, useful thermal output, fossil fuel stocks, fuel deliveries, quantity delivered, supplier, coal mine type, fuel heat content, sulfur and ash content, and receipts at the power plant and prime mover level. This data allows us to see where a plant’s or utility’s coal is coming from, to track how costs have changed over time, and to identify which generating units are responsible for a utility’s power output.

Form 861 (work in progress)

Form 861 mainly deals with what happens to electricity after it’s generated, and with other aspects of the distribution-level electricity system. This includes utility-level reporting on electricity sales, revenues, customer counts, peak load, electric purchases, energy efficiency and demand-side management programs, green pricing and net metering programs, and distributed generation capacity. Form 861 can be useful in a wide range of analyses, such as evaluating the impact of generation changes on consumer rates or exploring the role of distributed generation and demand-side management, particularly in more competitive electricity markets.

Form 176 (work in progress)

The EIA Form 176 describes the origins, suppliers, and disposition of natural gas on a state-by-state basis, including deliveries to different classes of end users.

Thermoelectric Cooling Water (future)

Alongside the Form 923, EIA collects data on water consumed for thermoelectric generation. In many arid and semi-arid regions of the US, water availability can become a limiting factor for thermoelectric generation during peak summer electricity usage, especially as climate change intensifies.

Federal Energy Regulatory Commission

Form 1

Fewer electricity producers are required to report data to the Federal Energy Regulatory Commission (FERC) than to EIA, but those that do report provide information via FERC Form 1 about their non-fuel production and non-production costs on a plant-by-plant basis. This data is key for estimating the marginal cost of electricity (MCOE) generated by a facility, allowing for economic comparisons to other generation and demand-side management options. To our knowledge this is the only public reporting of non-fuel operating and maintenance expenses. FERC Form 1 also collects detailed annual accounts of utility plant in service. While this data is aggregated on a utility-wide basis, rather than at the plant level, it can still provide some insight into how a utility’s capital additions and retirements have affected its overall financial picture.
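As a rough illustration of how plant-level non-fuel costs can feed an MCOE estimate, the sketch below adds fuel and non-fuel operating costs and divides by net generation. This is a simplified assumption about the calculation, not a description of our actual method, and the class and field names are hypothetical rather than real database columns.

```python
from dataclasses import dataclass


@dataclass
class PlantYear:
    """One plant-year of inputs. Field names are illustrative only."""
    net_generation_mwh: float  # assumed to come from EIA Form 923
    fuel_cost_usd: float       # assumed to come from EIA Form 923 fuel receipts
    nonfuel_om_usd: float      # assumed to come from FERC Form 1 plant expenses


def mcoe_usd_per_mwh(plant: PlantYear) -> float:
    """Return fuel plus non-fuel O&M cost per MWh of net generation."""
    if plant.net_generation_mwh <= 0:
        raise ValueError("net generation must be positive")
    total_cost = plant.fuel_cost_usd + plant.nonfuel_om_usd
    return total_cost / plant.net_generation_mwh


# Made-up numbers, for illustration only.
example = PlantYear(
    net_generation_mwh=1_000_000,
    fuel_cost_usd=20_000_000,
    nonfuel_om_usd=5_000_000,
)
print(mcoe_usd_per_mwh(example))  # 25.0
```

A real estimate would also need to reconcile plant identifiers between the two agencies, which this sketch ignores.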

Form 714 (work in progress)

Form 714 contains balancing authority level generation, power purchase, transmission, and load statistics as well as data on hourly incremental energy pricing in the balancing area. This data provides a broad overview of operations not only within balancing areas but also between them, including, for example, actual and scheduled inter-balancing authority area power transfers. Planning area data includes summer and winter demand forecasts and actual hourly demand values for each planning area.

Form 2 (raw data)

FERC Form 2 describes the financial disposition of natural gas utilities countrywide. It primarily covers interstate transmission pipeline companies, since those are the ones under FERC’s jurisdiction. As the natural gas counterpart of the FERC Form 1, it is vital for understanding the economics of transitioning away from natural gas, and for developing appropriate financing and policy mechanisms to hasten that transition. It is distributed by FERC in the same inaccessible format as the FERC Form 1. We currently extract the older DBF and newer XBRL data sources and convert them to SQLite databases for better accessibility.
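To give a sense of what the final step of such a conversion involves, here is a minimal sketch that writes already-parsed records into a SQLite table using only the Python standard library. It assumes the DBF or XBRL parsing has happened elsewhere and produced a list of dictionaries. The function name, table name, and column names are hypothetical, and this is not our actual extraction code.

```python
import sqlite3


def load_records(conn: sqlite3.Connection, table: str, records: list) -> int:
    """Create `table` from the keys of the first record and insert all records.

    Columns are created without declared types. Returns the number of rows
    inserted.
    """
    if not records:
        return 0
    columns = list(records[0].keys())
    column_list = ", ".join('"' + c + '"' for c in columns)  # quote identifiers
    placeholders = ", ".join("?" for _ in columns)
    conn.execute('CREATE TABLE IF NOT EXISTS "' + table + '" (' + column_list + ")")
    conn.executemany(
        'INSERT INTO "' + table + '" (' + column_list + ") VALUES (" + placeholders + ")",
        [tuple(r.get(c) for c in columns) for r in records],
    )
    conn.commit()
    return len(records)


# Made-up records standing in for rows parsed from a source file.
rows = [
    {"respondent_id": 1, "report_year": 2020, "amount": 123.0},
    {"respondent_id": 2, "report_year": 2020, "amount": 456.0},
]
conn = sqlite3.connect(":memory:")
print(load_records(conn, "example_table", rows))
```

Once the data is in SQLite, it can be queried with standard SQL tools instead of format-specific readers.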

Form 920 / Electric Quarterly Report (future)

The FERC EQR compiles the details of electricity contracts and transactions between utilities, merchant generators, and grid operators. The EQR is one of the most detailed publicly accessible electricity market data sets available. It includes high time resolution information about the economic value of renewable energy resources under a wide variety of market and environmental circumstances.

Other Agencies

EPA CEMS

Much of the electric utility data reported to the Environmental Protection Agency (EPA) is related to pollution, which is published in its most detailed form through the EPA’s Continuous Emissions Monitoring System (CEMS). The CEMS dataset contains information on hourly CO2 emissions as well as traditional air pollutants. In addition, by publishing hourly generation loads and fuel heat content consumed, CEMS provides the most granular publicly available view of power plant operations that we are aware of. These data allow us to place quantitative constraints on power plant operational characteristics that are often considered proprietary.
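As one example of the kind of constraint hourly data can support, the sketch below derives an observed maximum load, best observed heat rate, and largest hour-to-hour ramp from two hourly series. It assumes each hour is a full operating hour, so that average load in MW equals energy in MWh. The function and key names are hypothetical and the numbers are made up.

```python
def observed_limits(gross_load_mw: list, heat_input_mmbtu: list) -> dict:
    """Summarize operating limits implied by hourly load and heat input."""
    if len(gross_load_mw) != len(heat_input_mmbtu):
        raise ValueError("series must have the same length")
    # Only hours with positive load have a defined heat rate.
    running = [(mw, mmbtu) for mw, mmbtu in zip(gross_load_mw, heat_input_mmbtu) if mw > 0]
    if not running:
        raise ValueError("no operating hours in the series")
    heat_rates = [mmbtu / mw for mw, mmbtu in running]  # MMBtu per MWh
    # Absolute change in load between consecutive hours.
    ramps = [abs(b - a) for a, b in zip(gross_load_mw, gross_load_mw[1:])]
    return {
        "max_load_mw": max(mw for mw, _ in running),
        "min_heat_rate_mmbtu_per_mwh": min(heat_rates),
        "max_ramp_mw_per_hour": max(ramps) if ramps else 0.0,
    }


load = [0.0, 100.0, 200.0, 150.0]
heat = [0.0, 1100.0, 2000.0, 1575.0]
print(observed_limits(load, heat))
```

With these inputs the observed maximum load is 200 MW, the best heat rate is 10 MMBtu/MWh, and the largest ramp is 100 MW per hour.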

PHMSA Pipelines (work in progress)

The Pipeline and Hazardous Materials Safety Administration (PHMSA) Pipelines dataset catalogs the age, diameter, length, material composition, and ownership of US natural gas gathering, transmission, and distribution pipelines on a state-by-state basis. These pipelines are major capital investments with lifetimes of 50+ years, and billions of dollars are already being spent each year to replace this aging infrastructure.

ISO/RTO LMP (future)

The information provided by grid operators (ISOs and RTOs) is some of the richest and most voluminous electricity system data that’s publicly available. The locational marginal pricing (LMP) information is particularly valuable for assessing the economic viability of new and existing system investments. In combination with the EPA CEMS hourly operations data, EIA Form 923 fuel cost data, and estimates of non-fuel variable operating expenses from FERC Form 1, the ISO LMP data should allow us to model the profitability of individual generation units at hourly resolution. This information could be extremely valuable in a regulatory context, for comparing the economic viability of existing fossil plants to new demand-side or renewable energy resources.
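A minimal sketch of that hourly profitability idea follows, assuming a constant heat rate, fuel price, and non-fuel variable cost for the unit. All names are hypothetical and the numbers are made up. A real model would let these inputs vary by hour and would account for startup costs and other constraints.

```python
def hourly_margin_usd(
    lmp_usd_per_mwh: list,
    generation_mwh: list,
    heat_rate_mmbtu_per_mwh: float,
    fuel_cost_usd_per_mmbtu: float,
    nonfuel_vom_usd_per_mwh: float,
) -> list:
    """Return the operating margin for each hour: (LMP - marginal cost) * MWh."""
    if len(lmp_usd_per_mwh) != len(generation_mwh):
        raise ValueError("series must have the same length")
    # Marginal cost per MWh: fuel burned per MWh times fuel price, plus non-fuel costs.
    marginal_cost = heat_rate_mmbtu_per_mwh * fuel_cost_usd_per_mmbtu + nonfuel_vom_usd_per_mwh
    return [(lmp - marginal_cost) * mwh for lmp, mwh in zip(lmp_usd_per_mwh, generation_mwh)]


margins = hourly_margin_usd(
    lmp_usd_per_mwh=[20.0, 30.0, 45.0],       # assumed ISO/RTO LMP data
    generation_mwh=[100.0, 100.0, 200.0],     # assumed EPA CEMS hourly output
    heat_rate_mmbtu_per_mwh=10.0,
    fuel_cost_usd_per_mmbtu=2.0,              # assumed EIA Form 923 fuel cost
    nonfuel_vom_usd_per_mwh=5.0,              # assumed FERC Form 1 estimate
)
print(margins)  # [-500.0, 500.0, 4000.0]
```

Hours with a negative margin are those in which the unit ran even though the market price was below its estimated marginal cost.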

MSHA Mines (future)

The US Mine Safety and Health Administration (MSHA) provides information about mines all across the US, including mine production, employment, health and safety violations, and additional operational data. This data is available in its current format starting in the year 2000.