Draft the primary rule of the ETL process
Success
Define the inputs, outputs, and wildcards for the rule that will run the ETL process once.
This example does not read any data from disk, so the rule has no input: directive. The output path is a function of the location and time-period query parameters, which are expressed as wildcards.
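To illustrate how the output path depends on the query parameters, here is a minimal sketch that builds the target path for one location and date range. The helper name weather_target, the raw directory data/raw/weather, and the four-decimal coordinate formatting are assumptions made for illustration; they are not part of the workflow.

```python
from pathlib import Path


def weather_target(raw_dir, latitude, longitude, start_date, end_date):
    """Build the Parquet path for one query (hypothetical helper).

    The layout mirrors the rule's output pattern:
    {latitude}_{longitude}/{start_date}_{end_date}.parquet
    """
    # Fixed-precision formatting guarantees a decimal point in each
    # coordinate. Four decimals is an arbitrary choice for this sketch.
    location = f"{latitude:.4f}_{longitude:.4f}"
    return Path(raw_dir) / location / f"{start_date}_{end_date}.parquet"


# "data/raw/weather" is an assumed value for the configured raw data directory.
target = weather_target("data/raw/weather", 52.52, 13.405, "2024-01-01", "2024-01-31")
print(target.as_posix())
```

Requesting a path of this shape as a Snakemake target is what lets the rule recover the four parameters from the file name.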
workflow/rules/datasets/weather/open_meteo.smk
if (
    "datasets" in config
    and "weather" in config["datasets"]
    and "open_meteo" in config["datasets"]["weather"]
):
    validate(
        config, WORKFLOW_BASE / "schemas/datasets/weather/config.schema.yaml"
    )

    rule datasets_weather_open_meteo_run:
        """
        This rule will run the entire open_meteo workflow
        to convert weather data from the Open Meteo API to a Parquet file.

        input:
            No input files are required as the data is fetched from the Open Meteo
            API directly.
        output:
            weather_data:
                Path to the Parquet file containing weather data for the specified
                latitude, longitude, and date range.
        """
        output:
            weather_data = (
                Path(config["datasets"]["weather"]["data_dirs"]["raw"])
                / "{latitude}_{longitude}/{start_date}_{end_date}.parquet"
            )
        wildcard_constraints:
            latitude = r"[-+]?\d{1,2}\.\d{1,6}",  # Latitude in decimal degrees
            longitude = r"[-+]?\d{1,3}\.\d{1,6}",  # Longitude in decimal degrees
            start_date = r"\d{4}-\d{2}-\d{2}",  # Start date in YYYY-MM-DD format
            end_date = r"\d{4}-\d{2}-\d{2}",  # End date in YYYY-MM-DD format
        conda:
            config["CONDA"]["ENVS"]["RUNNER"]
        script:
            str(WORKFLOW_BASE / "scripts" / "rules_conda_RUNNER" / "able_weather_rules.py")
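The wildcard constraints can be checked outside Snakemake with plain Python. The sketch below copies the four regular expressions from the rule and applies them to the part of the path that follows the raw data directory. The names CONSTRAINTS, PATTERN, and parse_target are hypothetical, and this only approximates how Snakemake resolves wildcards from a requested target.

```python
import re

# Regular expressions copied from the rule's wildcard_constraints.
CONSTRAINTS = {
    "latitude": r"[-+]?\d{1,2}\.\d{1,6}",
    "longitude": r"[-+]?\d{1,3}\.\d{1,6}",
    "start_date": r"\d{4}-\d{2}-\d{2}",
    "end_date": r"\d{4}-\d{2}-\d{2}",
}

# One named group per wildcard, in the same layout as the output pattern.
# The raw data directory prefix is left out to keep the sketch short.
PATTERN = re.compile(
    "(?P<latitude>" + CONSTRAINTS["latitude"] + ")_"
    "(?P<longitude>" + CONSTRAINTS["longitude"] + ")/"
    "(?P<start_date>" + CONSTRAINTS["start_date"] + ")_"
    "(?P<end_date>" + CONSTRAINTS["end_date"] + r")\.parquet"
)


def parse_target(relative_path):
    """Return the wildcard values as strings, or None if the path does not match."""
    match = PATTERN.fullmatch(relative_path)
    return match.groupdict() if match else None


print(parse_target("52.52_13.405/2024-01-01_2024-01-31.parquet"))
print(parse_target("52_13/2024-01-01_2024-01-31.parquet"))  # no decimal point
```

The second call returns None because both coordinate patterns require a decimal point. The constraints only check the shape of each value: a latitude of 99.0 or a date of 2024-13-45 still matches, so range checks would need to happen in the script that fetches the data.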