Dagster supports data backfills for individual partitions or subsets of partitions. After defining a partitioned asset or job, you can launch a backfill, which submits runs to fill in multiple partitions at the same time.
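For example, a minimal daily-partitioned asset might look like the following sketch (the asset name and start date are illustrative, and it assumes a recent Dagster version that exposes AssetExecutionContext):

```python
from dagster import AssetExecutionContext, DailyPartitionsDefinition, asset


@asset(partitions_def=DailyPartitionsDefinition(start_date="2023-01-01"))
def events(context: AssetExecutionContext) -> None:
    # By default, a backfill launches one run per selected partition, and
    # each run sees a single partition key, e.g. "2023-01-01".
    context.log.info(f"Materializing partition {context.partition_key}")
```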
You can open the backfill modal to launch backfills for a partitioned asset using the "Materialize" button, either on the Asset detail page or when viewing a graph of assets. Backfills can also be launched for a selection of differently partitioned assets, as long as the roots share the same partitioning.
To observe the progress of an asset backfill, visit the backfill details page. You can reach it by clicking the notification that appears after launching a backfill, or by clicking "Overview" in the top navigation pane, then the "Backfills" tab, and then the ID of one of the backfills.
Backfill policies and single-run backfills (Experimental)
By default, if you launch a backfill that covers N partitions, Dagster will launch N separate runs, one for each partition. This works well when your code is single-threaded, because it avoids overwhelming a single process with large amounts of data. However, if you're using a parallel-processing engine like Spark or Snowflake, you often don't need Dagster to help with parallelism, so splitting the backfill into multiple runs just adds extra overhead.
Dagster supports backfills that execute as a single run covering a range of partitions, which you opt into by assigning the asset a backfill policy. For example, this allows you to execute the backfill as a single Snowflake query. After it completes, Dagster will track that all of those partitions have been filled.
Write your code so that it operates on partition ranges instead of single partitions. This means that if your code uses the partition_key context property, you'll need to update it to use one of the partition_time_window, partition_key_range, or partition_keys properties instead. Which one to use depends on whether it's most convenient for you to operate on start/end datetime objects, start/end partition keys, or a list of partition keys, as in the sketch below.
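Here's a rough sketch of a daily-partitioned asset configured for single-run backfills. The asset name and start date are illustrative, and the actual processing (e.g. the Snowflake query) is elided:

```python
from dagster import (
    AssetExecutionContext,
    BackfillPolicy,
    DailyPartitionsDefinition,
    asset,
)


@asset(
    partitions_def=DailyPartitionsDefinition(start_date="2023-01-01"),
    backfill_policy=BackfillPolicy.single_run(),
)
def events(context: AssetExecutionContext) -> None:
    # A single backfill run may cover many partitions, so read the full
    # range instead of a single partition key.
    window = context.partition_time_window  # start/end datetime objects
    key_range = context.partition_key_range  # start/end partition keys
    context.log.info(
        f"Processing {key_range.start} through {key_range.end} "
        f"({window.start} to {window.end})"
    )
```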
If you are using an I/O manager to handle saving and loading your data, you'll need to ensure the I/O manager also operates on partition ranges. If you're using any of the built-in database I/O managers, like Snowflake, BigQuery, or DuckDB, this is supported out of the box.
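For example, assuming the dagster-duckdb-pandas integration is installed, wiring up the range-aware DuckDB I/O manager might look like this (the database path is illustrative, and events is the asset from the sketch above):

```python
from dagster import Definitions
from dagster_duckdb_pandas import DuckDBPandasIOManager

# The built-in database I/O managers understand partition ranges, so a
# single-run backfill that spans many partitions reads and writes the
# whole range at once rather than one partition at a time.
defs = Definitions(
    assets=[events],
    resources={"io_manager": DuckDBPandasIOManager(database="analytics.duckdb")},
)
```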
You can launch and monitor backfills of a job using the Partitions tab.
To launch a backfill, click the "Launch backfill" button at the top center of the Partitions tab. This opens the "Launch backfill" modal, which lets you select the set of partitions to launch the backfill over. A run will be launched for each partition.
You can click the button in the bottom right to submit the runs. What happens next depends on your run coordinator: with the default run coordinator, the modal will close after all the runs have been launched; with the queued run coordinator, it will close after all the runs have been queued.
After all the runs have been submitted, you'll be returned to the Partitions tab, with a filter applied for the runs in the backfill. The page refreshes periodically, letting you see how the backfill is progressing; boxes turn green or red as steps in the backfill runs succeed or fail.