Debug a data pipeline
Learn how to troubleshoot flow runs that fail.
In the Set up a platform for data pipelines tutorial, you used Prefect Cloud to build a platform for your data pipelines. In this tutorial, you’ll learn what to do when those pipelines fail.
This tutorial starts where the previous tutorial leaves off, so complete that one first. You will need a paid Prefect Cloud account.
Find failures
You can use the Prefect Cloud dashboard to find failures.
- Sign in to Prefect Cloud.
- Use the workspace switcher to open the staging workspace that you created in the last tutorial.
- Go to Home and look for red bars in the Flow Runs section. These indicate failed flow runs.
- Hover over a red bar to see more details about the flow run: name, deployment, duration, timestamp, and tags.
If you’re only interested in a specific set of flows, filter by tag (for example, team-a).
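The tag filter keeps only the failed runs that carry the chosen tag. The plain-Python sketch below models that behavior so the rule is explicit. It is not the Prefect API: FlowRunRecord, failed_runs_with_tag, and the state strings "FAILED" and "COMPLETED" are hypothetical names chosen for illustration, and the sample runs are made up.

```python
# Hypothetical sketch: models the dashboard's tag filter with plain Python.
# FlowRunRecord and failed_runs_with_tag are illustrative names, not Prefect APIs.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class FlowRunRecord:
    name: str
    deployment: str
    state: str  # assumed values: "COMPLETED" or "FAILED"
    start_time: datetime
    tags: List[str] = field(default_factory=list)


def failed_runs_with_tag(runs: List[FlowRunRecord], tag: str) -> List[FlowRunRecord]:
    """Return only the failed runs that carry `tag`, preserving input order."""
    return [run for run in runs if run.state == "FAILED" and tag in run.tags]


if __name__ == "__main__":
    # Made-up sample runs for illustration only.
    sample = [
        FlowRunRecord("run-a", "example-deployment", "FAILED", datetime(2024, 1, 3), ["team-a"]),
        FlowRunRecord("run-b", "example-deployment", "COMPLETED", datetime(2024, 1, 2), ["team-a"]),
        FlowRunRecord("run-c", "other-deployment", "FAILED", datetime(2024, 1, 1), ["team-b"]),
    ]
    for run in failed_runs_with_tag(sample, "team-a"):
        print(run.name, run.deployment, run.start_time.date())
```

A run that has failed but lacks the tag (run-c above) is excluded, just as it disappears from the dashboard once you filter.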
Debug a failure
A single flow might experience failures on several runs. When this happens, it can be helpful to inspect the first failure in the series.
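Because the dashboard lists failed runs newest first, the first failure in the series is the last entry in that list. The sketch below states this rule in plain Python and pairs it with an order-independent alternative that picks the failed run with the earliest start time. Run, first_failure, first_failure_by_time, and the state strings are hypothetical names for illustration, not part of Prefect.

```python
# Hypothetical sketch: finding the earliest failure in a list of runs.
# Run and both helper functions are illustrative names, not Prefect APIs.
from datetime import datetime
from typing import List, NamedTuple, Optional


class Run(NamedTuple):
    name: str
    state: str  # assumed values: "COMPLETED" or "FAILED"
    start_time: datetime


def first_failure(runs_newest_first: List[Run]) -> Optional[Run]:
    """Return the earliest failed run from a newest-first list, or None."""
    failures = [run for run in runs_newest_first if run.state == "FAILED"]
    # In reverse chronological order, the last failure listed happened first.
    return failures[-1] if failures else None


def first_failure_by_time(runs: List[Run]) -> Optional[Run]:
    """Order-independent alternative: the failed run with the smallest start time."""
    failures = [run for run in runs if run.state == "FAILED"]
    return min(failures, key=lambda run: run.start_time) if failures else None


if __name__ == "__main__":
    # Made-up sample runs, newest first.
    runs = [
        Run("run-3", "FAILED", datetime(2024, 1, 3)),
        Run("run-2", "FAILED", datetime(2024, 1, 2)),
        Run("run-1", "COMPLETED", datetime(2024, 1, 1)),
    ]
    print(first_failure(runs).name, first_failure_by_time(runs).name)
```

Both helpers agree when the input really is newest first; the time-based version also works if the list order is unknown.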
- In the Flow Runs section on the Home page, expand the data-pipeline flow.
- You will see a list of failing data-pipeline flow runs in reverse chronological order.
- Use the pagination controls to navigate to the last failure in the list. This is the first failure that occurred.
- Click the name of the flow run to go to its detail page.
- From the flow run detail page, scroll down to the Logs section in the right panel.
- Look for an error message similar to the following:
It looks like there’s an error in the simulate_failures.py file.
Now that you’ve found the failure, the next step is to fix the underlying code.
Update the code
Open the simulate_failures.py file and look at line 12.
The if statement is the problem.
If you pass the --fail-at-run flag, the flow fails with an exception once it has run more than fail_at_run times.
Remove the if statement to fix this failure.
We added this statement to give you something to fix. :)
Now all flow runs succeed, even when you pass the --fail-at-run flag.
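The contents of simulate_failures.py are not reproduced here, so the following is only a hypothetical reconstruction of the kind of guard described above, assuming the script reads fail_at_run from an argparse --fail-at-run flag and compares it with a run counter. The names parse_flags, run_before_fix, and run_after_fix are illustrative and do not come from the actual script. It also shows why the flag and the variable are spelled differently: argparse converts the dashes in --fail-at-run to underscores in the attribute name.

```python
# Hypothetical sketch: the real simulate_failures.py is not shown here, so the
# function names below are illustrative. It assumes the script reads
# fail_at_run from a --fail-at-run flag and compares it to a run counter.
import argparse
from typing import List, Optional


def parse_flags(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # argparse turns --fail-at-run into the attribute name fail_at_run.
    parser.add_argument("--fail-at-run", type=int, default=None)
    return parser.parse_args(argv)


def run_before_fix(run_number: int, fail_at_run: Optional[int]) -> str:
    # The guard that causes the failures: after fail_at_run runs, raise.
    if fail_at_run is not None and run_number > fail_at_run:
        raise RuntimeError(f"Simulated failure on run {run_number}")
    return f"run {run_number} succeeded"


def run_after_fix(run_number: int, fail_at_run: Optional[int]) -> str:
    # With the if statement removed, fail_at_run no longer affects the result.
    return f"run {run_number} succeeded"


if __name__ == "__main__":
    args = parse_flags(["--fail-at-run", "3"])
    for n in range(1, 6):
        try:
            before = run_before_fix(n, args.fail_at_run)
        except RuntimeError as exc:
            before = f"failed ({exc})"
        after = run_after_fix(n, args.fail_at_run)
        print(f"run {n}: before fix -> {before}; after fix -> {after}")
```

With fail_at_run set to 3, the "before" version fails on runs 4 and 5, while the "after" version succeeds on every run.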
Deploy the fix to the staging workspace to confirm this new behavior.
After the script finishes, open the Home page in Prefect Cloud to verify that the flow run is no longer failing.
You can now switch to the production workspace and deploy the same fix there.
Next steps
In this tutorial, you successfully used Prefect Cloud to fix a failing data pipeline. To learn more about the different states that can occur during the flow run lifecycle, see Manage states.
Next, learn how to alert your team when failures occur.