Table Contracts
This document lists every shared table and view in the pipeline, the step that produces it, and the steps that consume it. Use it to see what breaks downstream when you modify a step.
Dependency Map
Setup Phase Outputs
| Object | Type | Produced By | Consumed By |
| --- | --- | --- | --- |
| eaglei_raw | TABLE | setup/02 | setup/03 |
| eaglei_part | TABLE | setup/03 | correlation/02, correlation/06 |
| counties_ref | TABLE | setup/04 | correlation/01, correlation/03, looker/01 |
Correlation Phase Outputs
| Object | Type | Produced By | Consumed By |
| --- | --- | --- | --- |
| graph_multi_ingredients_hourly | TABLE | correlation/01 | correlation/04, correlation/05, correlation/07, correlation/08 |
| view_eaglei_6h_qc | VIEW | correlation/02 | correlation/05 |
| view_six_hour_grid | VIEW | correlation/03 | correlation/05 |
| view_windhail_thresholds | VIEW | correlation/04 | correlation/07, correlation/08, correlation/09 |
| view_outage_vs_wx_6h_qc | VIEW | correlation/05 | correlation/09, correlation/10, ml/01, looker/01 |
| events_restoration | TABLE | correlation/06 | correlation/07 |
| event_coverage_wx | TABLE | correlation/07 | looker/01 |
| view_daily_plan | VIEW | correlation/08 | looker/01 |
| lead_performance | TABLE | correlation/09 | (end consumer) |
| correlations | TABLE | correlation/10 | looker/01 |
ML Phase Outputs
ML_MODELS in config/.env controls which models are trained. Each model produces its own set of outputs with a _{key} suffix. The tables below cover all four model types (regressor, classifier, logistic, automl).
Shared:
| Object | Type | Produced By | Consumed By |
| --- | --- | --- | --- |
| bqml_training_data | TABLE | ml/01 | ml/02a-d, ml/03 |
Per model (created for each model in ML_MODELS):
| Object pattern | Type | Produced By | Notes |
| --- | --- | --- | --- |
| outage_predictor_{key} | MODEL | ml/02a-d | One per model (regressor, classifier, logistic, automl) |
| bqml_evaluation_{key} | TABLE | ml/03 | Regression metrics (regressor) or classifier metrics (others) |
| bqml_feature_importance_{key} | TABLE | ml/03 | Feature importance via ML.GLOBAL_EXPLAIN |
| bqml_predictions_{key} | TABLE | ml/03 | Predicted ratio (regressor) or probability (classifiers) + tier |
| bqml_threshold_sweep_{key} | TABLE | ml/03 | Regressor only: precision/recall/F1 at cutoffs |
| bqml_confusion_matrix_{key} | TABLE | ml/03 | Classifiers only: TP/FP/FN/TN |
| bqml_roc_curve_{key} | TABLE | ml/03 | Classifiers only: ROC curve data |
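
The naming rules in the per-model table can be sketched as a small helper that expands a model list into the objects each model is expected to produce. This is only an illustration: it assumes ML_MODELS is a comma-separated list of keys and that the keys are exactly `regressor`, `classifier`, `logistic`, and `automl`; the function name `expected_outputs` is hypothetical and not part of the pipeline.

```python
from typing import Dict, List

# Assumption: every key other than "regressor" is treated as a classifier,
# following the "(others)" wording in the Notes column.
CLASSIFIER_KEYS = {"classifier", "logistic", "automl"}


def expected_outputs(ml_models: str) -> Dict[str, List[str]]:
    """Map each model key to the object names listed in the per-model table."""
    outputs: Dict[str, List[str]] = {}
    for key in (k.strip() for k in ml_models.split(",") if k.strip()):
        # Objects produced for every model type.
        names = [
            f"outage_predictor_{key}",
            f"bqml_evaluation_{key}",
            f"bqml_feature_importance_{key}",
            f"bqml_predictions_{key}",
        ]
        if key == "regressor":
            # Regressor only: precision/recall/F1 at cutoffs.
            names.append(f"bqml_threshold_sweep_{key}")
        elif key in CLASSIFIER_KEYS:
            # Classifiers only: confusion matrix and ROC curve data.
            names.append(f"bqml_confusion_matrix_{key}")
            names.append(f"bqml_roc_curve_{key}")
        outputs[key] = names
    return outputs


if __name__ == "__main__":
    for model_key, objects in expected_outputs("regressor,logistic").items():
        print(model_key, objects)
```

A list like this is useful as a checklist after a run: any expected name missing from the dataset points at the ml/02a-d or ml/03 step that failed for that model.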
Looker Phase Outputs
| Object | Type | Produced By | Consumed By |
| --- | --- | --- | --- |
| looker_timeseries_6h | VIEW | looker/01 | (end consumer) |
| looker_correlation | VIEW | looker/01 | (end consumer) |
| looker_risk_map | VIEW | looker/01 | (end consumer) |
| looker_preboard | VIEW | looker/01 | (end consumer) |
| looker_events | VIEW | looker/01 | (end consumer) |
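
To answer "what breaks if I change step X?", the Produced By and Consumed By columns can be treated as a directed graph and walked transitively. The sketch below transcribes only the rows that have a Consumed By column (the per-model ML objects and the Looker views are left out because no consumers are listed for them). The names `CONTRACTS` and `downstream_steps` are hypothetical, and `ml/02a-d` is kept as a single node exactly as written in the table.

```python
from collections import deque

# object -> (producing step, consuming steps), copied from the dependency map.
CONTRACTS = {
    "eaglei_raw": ("setup/02", ["setup/03"]),
    "eaglei_part": ("setup/03", ["correlation/02", "correlation/06"]),
    "counties_ref": ("setup/04", ["correlation/01", "correlation/03", "looker/01"]),
    "graph_multi_ingredients_hourly": (
        "correlation/01",
        ["correlation/04", "correlation/05", "correlation/07", "correlation/08"],
    ),
    "view_eaglei_6h_qc": ("correlation/02", ["correlation/05"]),
    "view_six_hour_grid": ("correlation/03", ["correlation/05"]),
    "view_windhail_thresholds": (
        "correlation/04",
        ["correlation/07", "correlation/08", "correlation/09"],
    ),
    "view_outage_vs_wx_6h_qc": (
        "correlation/05",
        ["correlation/09", "correlation/10", "ml/01", "looker/01"],
    ),
    "events_restoration": ("correlation/06", ["correlation/07"]),
    "event_coverage_wx": ("correlation/07", ["looker/01"]),
    "view_daily_plan": ("correlation/08", ["looker/01"]),
    "lead_performance": ("correlation/09", []),  # end consumer
    "correlations": ("correlation/10", ["looker/01"]),
    "bqml_training_data": ("ml/01", ["ml/02a-d", "ml/03"]),
}


def downstream_steps(step: str) -> set:
    """Return every step that directly or transitively reads output of `step`."""
    affected = set()
    queue = deque([step])
    while queue:
        current = queue.popleft()
        for producer, consumers in CONTRACTS.values():
            if producer != current:
                continue
            for consumer in consumers:
                if consumer not in affected:
                    affected.add(consumer)
                    queue.append(consumer)
    return affected


if __name__ == "__main__":
    print(sorted(downstream_steps("setup/03")))
```

For instance, changing setup/03 reaches ml/03 through eaglei_part, view_eaglei_6h_qc, view_outage_vs_wx_6h_qc, and bqml_training_data, even though setup/03 and ml/03 never share a table directly.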
Column Schemas
graph_multi_ingredients_hourly
The primary weather feature table. It is the most expensive object in the pipeline to produce, and all downstream analysis reads from it.
Partitioned by DATE(hour_ts), clustered by (county_fips, lead_hours).
| Column | Type | Description |
| --- | --- | --- |
| county_fips | STRING | 5-digit FIPS code (aliased from county_fips_code) |
| county_name | STRING | County display name |
| hour_ts | TIMESTAMP | 6-hour block start (00/06/12/18 UTC) |
| lead_hours | INT64 | Forecast lead time in hours |
| ws10_max_mps | FLOAT64 | Max 10m wind speed (m/s) across grid cells in county |
| ws925_max_mps | FLOAT64 | Max 925hPa wind speed (m/s) |
| shear_0_6km_max_mps | FLOAT64 | Max 0-6km wind shear (m/s) |
| updraft700_pos_max_pas | FLOAT64 | Max positive 700hPa vertical velocity (Pa/s) |
| t700_c_min | FLOAT64 | Min 700hPa temperature (°C) |
| t700_c_mean | FLOAT64 | Mean 700hPa temperature (°C) |
| t850_c_mean | FLOAT64 | Mean 850hPa temperature (°C) |
| tp6_mm_max | FLOAT64 | Max 6-hour total precipitation (mm) |
| tp6_mm_mean | FLOAT64 | Mean 6-hour total precipitation (mm) |
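
Because hour_ts marks the start of a 6-hour block (00/06/12/18 UTC), any timestamp joined against this table has to be floored to the same grid. The following is a minimal sketch of that flooring, not the pipeline's actual SQL; the function name `six_hour_block_start` is hypothetical, and treating naive datetimes as UTC is an assumption made for the example.

```python
from datetime import datetime, timezone


def six_hour_block_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 6-hour UTC block."""
    if ts.tzinfo is None:
        # Assumption for this sketch: naive timestamps are already in UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    ts_utc = ts.astimezone(timezone.utc)
    block_hour = ts_utc.hour - ts_utc.hour % 6  # one of 0, 6, 12, 18
    return ts_utc.replace(hour=block_hour, minute=0, second=0, microsecond=0)


if __name__ == "__main__":
    sample = datetime(2024, 3, 5, 14, 37, tzinfo=timezone.utc)
    print(six_hour_block_start(sample).isoformat())
```

The date part of the floored value is also what the DATE(hour_ts) partition key is derived from, so filtering on it keeps queries inside a small number of partitions.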
view_outage_vs_wx_6h_qc
The master join view and the base for all downstream analysis: weather, outages, and QC joined on the 6-hour grid scaffold.
| Column | Type | Description |
| --- | --- | --- |
| county_fips | STRING | 5-digit FIPS code |
| valid_ts | TIMESTAMP | 6-hour block timestamp (from grid scaffold) |
| lead_hours | INT64 | Forecast lead time in hours |
| ws10_max_mps | FLOAT64 | Max 10m wind speed (m/s) |
| ws925_max_mps | FLOAT64 | Max 925hPa wind speed (m/s) |
| shear_0_6km_max_mps | FLOAT64 | Max 0-6km wind shear (m/s) |
| updraft700_pos_max_pas | FLOAT64 | Max positive 700hPa vertical velocity (Pa/s) |
| t700_c_min | FLOAT64 | Min 700hPa temperature (°C) |
| t700_c_mean | FLOAT64 | Mean 700hPa temperature (°C) |
| t850_c_mean | FLOAT64 | Mean 850hPa temperature (°C) |
| tp6_mm_mean | FLOAT64 | Mean 6-hour total precipitation (mm) |
| tp6_mm_max | FLOAT64 | Max 6-hour total precipitation (mm) |
| hail_flag | INT64 | 1 if t700_c_min <= threshold AND tp6_mm_max >= threshold |
| outage_ratio_6h_max | FLOAT64 | Max outage ratio during 6h block |
| outage_ratio_6h_mean | FLOAT64 | Mean outage ratio during 6h block |
| samples_in_block | INT64 | Number of EAGLE-I 15-min samples in the block |
| customers_out_6h_max | INT64 | Max customers out during 6h block |
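
Two of the columns above follow simple rules that are easy to misread, so here is a hedged sketch of each. The threshold values are not given in this document, so they are passed in as parameters (`t700_max_c` and `tp6_min_mm` are hypothetical names). Returning 0 for missing inputs is an assumption, and `block_coverage` is only one possible way to interpret samples_in_block, not something the view itself exposes.

```python
from typing import Optional

# A 6-hour block holds 6 * 60 / 15 = 24 fifteen-minute EAGLE-I samples.
EXPECTED_SAMPLES_PER_BLOCK = 6 * 60 // 15


def hail_flag(
    t700_c_min: Optional[float],
    tp6_mm_max: Optional[float],
    t700_max_c: float,   # hypothetical parameter: temperature threshold (°C)
    tp6_min_mm: float,   # hypothetical parameter: precipitation threshold (mm)
) -> int:
    """1 if it is cold enough at 700hPa AND wet enough, else 0."""
    if t700_c_min is None or tp6_mm_max is None:
        # Assumption: missing weather inputs never raise the flag.
        return 0
    return int(t700_c_min <= t700_max_c and tp6_mm_max >= tp6_min_mm)


def block_coverage(samples_in_block: int) -> float:
    """Fraction of the expected 15-minute samples present in a 6-hour block."""
    return samples_in_block / EXPECTED_SAMPLES_PER_BLOCK


if __name__ == "__main__":
    # Threshold values below are illustrative only, not the pipeline's settings.
    print(hail_flag(-12.0, 8.0, t700_max_c=-10.0, tp6_min_mm=5.0))
    print(block_coverage(12))
```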
bqml_training_data
ML training features and labels derived from the master view (view_outage_vs_wx_6h_qc).
| Column | Type | Description |
| --- | --- | --- |
| county_fips | STRING | 5-digit FIPS code |
| valid_ts | TIMESTAMP | 6-hour block timestamp |
| ws10_max_mps | FLOAT64 | 10m wind speed max (m/s) |
| ws925_max_mps | FLOAT64 | 925hPa wind speed max (m/s) |
| shear_0_6km_max_mps | FLOAT64 | 0-6km wind shear max (m/s) |
| updraft700_pos_max_pas | FLOAT64 | 700hPa updraft max (Pa/s) |
| tp6_mm_max | FLOAT64 | 6h precipitation max (mm) |
| t700_c_min | FLOAT64 | 700hPa temperature min (°C) |
| t850_c_mean | FLOAT64 | 850hPa temperature mean (°C) |
| hail_flag | INT64 | Binary hail indicator |
| lead_hours | INT64 | Forecast lead time in hours |
| hour_of_day | INT64 | Hour of day from valid_ts (0-23) |
| day_of_week | INT64 | Day of week from valid_ts (1=Sun, 7=Sat) |
| month | INT64 | Month from valid_ts (1-12) |
| wind_precip_interaction | FLOAT64 | Derived: ws10_max_mps * tp6_mm_max |
| wind_squared | FLOAT64 | Derived: ws10_max_mps^2 |
| shear_updraft_interaction | FLOAT64 | Derived: shear * updraft |
| outage_ratio_6h_max | FLOAT64 | Raw outage ratio (regression label) |
| outage_event | INT64 | Binary label: 1 if outage_ratio >= OUTAGE_THRESHOLD |
| data_split | STRING | 'TRAIN' (80%) or 'TEST' (20%), deterministic by county + date |
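
The derived columns and labels above can be restated in Python to make the formulas concrete. This is a sketch under several assumptions, not the ml/01 query itself: "shear * updraft" is taken to mean shear_0_6km_max_mps * updraft700_pos_max_pas, the outage threshold is passed in rather than read from config, and the 80/20 split uses a SHA-256 hash of county and date as a stand-in because the document only says the split is deterministic by county + date. The function names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def derive_features(row: dict) -> dict:
    """Compute the calendar and interaction features from one master-view row."""
    ts = row["valid_ts"]
    return {
        "hour_of_day": ts.hour,                  # 0-23
        "day_of_week": ts.isoweekday() % 7 + 1,  # 1=Sun ... 7=Sat
        "month": ts.month,                       # 1-12
        "wind_precip_interaction": row["ws10_max_mps"] * row["tp6_mm_max"],
        "wind_squared": row["ws10_max_mps"] ** 2,
        # Assumption: "shear * updraft" refers to these two columns.
        "shear_updraft_interaction": (
            row["shear_0_6km_max_mps"] * row["updraft700_pos_max_pas"]
        ),
    }


def outage_event(outage_ratio_6h_max: float, outage_threshold: float) -> int:
    """Binary label: 1 if the outage ratio reaches the threshold."""
    return int(outage_ratio_6h_max >= outage_threshold)


def data_split(county_fips: str, valid_ts: datetime) -> str:
    """Deterministic 80/20 split keyed on county + date (hash choice is assumed)."""
    key = f"{county_fips}|{valid_ts.date().isoformat()}"
    bucket = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % 10
    return "TRAIN" if bucket < 8 else "TEST"


if __name__ == "__main__":
    row = {
        "valid_ts": datetime(2024, 3, 3, 18, 0, tzinfo=timezone.utc),  # a Sunday
        "ws10_max_mps": 10.0,
        "tp6_mm_max": 2.5,
        "shear_0_6km_max_mps": 20.0,
        "updraft700_pos_max_pas": 0.5,
    }
    print(derive_features(row))
    print(data_split("48201", row["valid_ts"]))
```

Keying the split on county and date rather than on the full timestamp keeps all four 6-hour blocks of a county-day on the same side of the split, which avoids near-duplicate rows leaking between TRAIN and TEST.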