Category: linkstream

Data pipelines, stewardship and cleaning

By Zane Selvans
Post date 2021-12-17

Chapter 4 of the 2018 National Climate Assessment looks at the potential climate impacts on the US energy system.
Flow of Flows — Orchestrating ELT with Prefect and dbt. More exploration of how to build data processing pipelines using open source tooling.
Orchestrating Airbyte data connection tasks with Prefect. Official integrations for Airbyte connectors as Prefect tasks.
Data cleaning IS analysis, not grunt work. A longish post exploring what we really get out of doing data cleaning, and why it’s more valuable and complex than it often gets credit for.
Peer learnings about what it means to become an open data steward, from the 2021 ODI Open Data Summit. Videos and responses from participants on many facets of stewarding open data, especially as a business / organization.

Tags airbyte, data cleaning, dbt, open data, pipeline, prefect, stewardship

Are electric utilities planning for climate change?

By Zane Selvans
Post date 2021-12-07

Oil and gas companies operating in the Arctic and other areas impacted by climate change have for decades been adapting their operations and infrastructure planning to melting permafrost and other long-term impacts of their pyromania, even while publicly spreading disinformation about the same processes. But are electric utilities doing the same kind of planning?

We’ve been thinking a bit about how the energy system in the US West is exposed to potential climate risks, in the context of long-term utility resource adequacy and operational planning. We posted a short thread on Twitter and got some references from the #EnergyTwitter hive mind.

Are utility planners, regulators & grid operators in the Western US appropriately accounting for the likelihood of extreme weather events in coming decades? What's the existing literature look like on this question? What data & analysis would best help answer this question?
— Catalyst Cooperative (@CatalystCoop) December 3, 2021

E.g. in the 2020s, 2030s, or 2040s, what are the chances that an extreme heatwave hits Las Vegas or Phoenix spiking demand for air conditioning, while hydroelectric generation or cooling water for thermal plants are also unavailable? And what would the consequences be?
— Catalyst Cooperative (@CatalystCoop) December 3, 2021

What would be the easiest way to make a compelling case to regulators / operators / legislators that this kind of forward-looking question should be addressed in IRPs & resource adequacy analyses, and identify the most vulnerable metro regions and states?
— Catalyst Cooperative (@CatalystCoop) December 3, 2021

Tags climate, energy, extreme weather, risk, water

linkstream weeknotes

SQL for data analysis, DGP, and pair programming

By Zane Selvans
Post date 2021-06-06

Some good technical long reads from the last couple of weeks:

(Postgre)SQL for Data Analysis

Before the Tidyverse and Pandas, there was SQL. There’s still SQL, and as Vicki Boykis often points out, every data-centric framework that hangs around long enough tends toward SQL. It’s got almost half a century of careful thinking and optimization behind it. It seems entirely possible that it’ll still be around after another half century.

In this extensive post, Haki Benita explores how much data analysis can be done directly in SQL, and in PostgreSQL in particular. SQL can be used either as an efficient preprocessing step before handing off to other tools, or to generate final products. The post covers basic data selection, random selection, sampling, splitting data into training & testing sets, descriptive statistics, aggregations, regressions, interpolation, binning and much more. It’s almost more of a pocket guide to data analysis in SQL than a blog post.
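To give a flavor of the idea, here’s a minimal sketch of doing descriptive statistics and a train/test split in SQL rather than in a dataframe. It is not taken from the post: it uses SQLite from Python’s standard library as a stand-in for PostgreSQL so that it runs anywhere, and the `generation` table, its columns, and the `summarize_and_split` function are all made up for illustration.

```python
import sqlite3


def summarize_and_split(readings, test_every=5):
    """Load (plant, mwh) rows into an in-memory SQLite database, then let SQL
    compute per-plant statistics and a deterministic train/test split.

    The table layout here is hypothetical, and SQLite stands in for PostgreSQL.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE generation (id INTEGER PRIMARY KEY, plant TEXT, mwh REAL)"
    )
    conn.executemany(
        "INSERT INTO generation (plant, mwh) VALUES (?, ?)", readings
    )

    # Descriptive statistics: one row per plant, computed by the database.
    stats = conn.execute(
        """
        SELECT plant, COUNT(*), MIN(mwh), AVG(mwh), MAX(mwh)
        FROM generation
        GROUP BY plant
        ORDER BY plant
        """
    ).fetchall()

    # Deterministic split: every Nth row (by id) goes to the test set.
    split = conn.execute(
        """
        SELECT CASE WHEN id % ? = 0 THEN 'test' ELSE 'train' END AS part,
               COUNT(*)
        FROM generation
        GROUP BY part
        ORDER BY part
        """,
        (test_every,),
    ).fetchall()

    conn.close()
    return stats, dict(split)
```

Calling `summarize_and_split(rows)` with a list of `(plant, mwh)` tuples returns the per-plant summary and the number of rows in each partition. Splitting on `id` keeps the split reproducible; a random split would need a different approach, and the available functions differ between SQLite and PostgreSQL.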

Data (Error) Generation Processes

In this post Emily Riederer explores how conceptualizing data (and error!) generation processes can help you do better data validation. What does the data represent in the real world? How is it being collected? How does it move from where it’s collected to where it’s processed? What kinds of transformations operate on it before you look at the outputs? Understanding these steps and their contexts makes it easier to imagine how things can go wrong along the way and what errors to check for. It also makes it easier to debug errors when you find them.
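As a rough illustration of that way of thinking (not code from the post), here’s a small sketch in which each check is tied to a step where something could plausibly go wrong. The record layout (`mwh`, `capacity_mw`, `report_date`) and the `validate_record` function are hypothetical, as is the assumption that each record covers a single day of generation.

```python
from datetime import date


def validate_record(record, today):
    """Return a list of problems found in one hypothetical daily generation record.

    Each check corresponds to a step in an imagined data generation process.
    """
    problems = []
    mwh = record["mwh"]

    # Collection: a meter that fails to report leaves a gap, and a plant
    # can't report negative generation in this (assumed) dataset.
    if mwh is None:
        problems.append("mwh is missing")
    elif mwh < 0:
        problems.append("mwh is negative")

    # Transport: timezone or parsing mistakes can push dates into the future.
    if record["report_date"] > today:
        problems.append("report date is in the future")

    # Transformation: unit mix-ups (kWh vs. MWh) show up as values far above
    # what the plant could physically produce in 24 hours.
    if mwh is not None and mwh > record["capacity_mw"] * 24:
        problems.append("mwh exceeds what the plant could generate in one day")

    return problems
```

An empty list means the record passed every check. The point isn’t these particular rules so much as that each one comes from asking how the data got here and what could have gone wrong on the way.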

On Pair Programming

A guide to pair programming from Birgitta Böckeler and Nina Siessegger. They look at both how and why to do it, and at some of the challenges it brings up. I had no idea that the practice goes back as far as the women who programmed ENIAC.

The authors explore several different styles of pair programming and the logistical planning required to make it work. They touch on the added challenges of remote pairing, which seems especially relevant these days. They cover the productive and destructive social dynamics that come up, and a whole lot more. The article is long, but it’s definitely worth a read if you’ve thought about trying pair programming and been reluctant, or have tried it and been dissatisfied.

Tags data, data analysis, data validation, DGP, pair programming, postgresql