About Me

Mumbai, Maharashtra, India
He has more than 7.6 years of experience in software development and has spent most of that time on web and desktop application development. He has sound knowledge of various database concepts. You can reach him at viki.keshari@gmail.com, https://www.linkedin.com/in/vikrammahapatra/, https://twitter.com/VikramMahapatra, or http://www.facebook.com/viki.keshari

Thursday, October 31, 2024

SubDagOperator in Apache Airflow

The SubDagOperator in Apache Airflow allows you to create a DAG within another DAG, enabling you to manage and execute a group of tasks as a single unit. This can help organize complex workflows by breaking them down into smaller, reusable components.

How to Use SubDagOperator

To use the SubDagOperator, you need to define a sub-DAG function that specifies the tasks to be executed in the sub-DAG. Then, you can instantiate a SubDagOperator in your main DAG.
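One detail worth keeping in mind when writing the sub-DAG function: Airflow's SubDagOperator checks that the sub-DAG's dag_id has the form parent_dag_id.task_id, where task_id is the SubDagOperator's own task_id. The following is a minimal plain-Python sketch of that naming rule. It does not import Airflow, and the helper names (expected_subdag_id, check_subdag_id) and the DAG and task names are hypothetical, chosen only for illustration.

```python
def expected_subdag_id(parent_dag_id: str, task_id: str) -> str:
    """Build the dag_id a sub-DAG needs for a given parent DAG and task_id."""
    return f"{parent_dag_id}.{task_id}"


def check_subdag_id(parent_dag_id: str, task_id: str, subdag_id: str) -> None:
    """Raise ValueError if subdag_id does not follow the parent.child form."""
    expected = expected_subdag_id(parent_dag_id, task_id)
    if subdag_id != expected:
        raise ValueError(f"expected dag_id {expected!r}, got {subdag_id!r}")


# Hypothetical names, used only to illustrate the convention.
print(expected_subdag_id("etl_parent", "load_step"))
check_subdag_id("etl_parent", "load_step", "etl_parent.load_step")  # passes silently
```

A check like this could be run in a unit test before deploying, so that a mismatch between the task_id and the sub-DAG's dag_id is caught before Airflow parses the DAG file.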

Example

Here's a basic example demonstrating how to use SubDagOperator in Airflow:

Main DAG: main_dag.py

```python
from datetime import datetime

from airflow import DAG
# These import paths work on Airflow 1.10 and, with deprecation warnings,
# on Airflow 2.x, where the preferred modules are airflow.operators.dummy
# and airflow.operators.subdag.
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator


# Define the sub-DAG function
def create_subdag(parent_dag_name, child_dag_name, args):
    # The dag_id must be "<parent dag_id>.<SubDagOperator task_id>"
    dag_subdag = DAG(
        dag_id=f"{parent_dag_name}.{child_dag_name}",
        default_args=args,
        schedule_interval='@daily',
    )

    with dag_subdag:
        start = DummyOperator(task_id='start')
        process = DummyOperator(task_id='process')
        end = DummyOperator(task_id='end')

        start >> process >> end

    return dag_subdag


# Define the main DAG
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 10, 1),
}

with DAG('main_dag', default_args=default_args, schedule_interval='@daily') as main_dag:
    start_main = DummyOperator(task_id='start_main')

    # Create a SubDagOperator. The child name passed to create_subdag matches
    # task_id, so the sub-DAG's dag_id is "main_dag.subdag_task".
    subdag_task = SubDagOperator(
        task_id='subdag_task',
        subdag=create_subdag('main_dag', 'subdag_task', default_args),
    )

    end_main = DummyOperator(task_id='end_main')

    start_main >> subdag_task >> end_main
```

Explanation

  1. Sub-DAG Function:

    • The create_subdag function defines a sub-DAG. It takes the parent_dag_name, child_dag_name, and args as parameters.
    • Inside this function, we define the tasks for the sub-DAG (start, process, end) using DummyOperator.
  2. Main DAG:

    • The main DAG is defined with a DAG context.
    • A DummyOperator named start_main represents the start of the main DAG.
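    • The SubDagOperator is instantiated, calling the create_subdag function to create the sub-DAG. The sub-DAG will execute the tasks defined in the create_subdag function. Airflow requires the sub-DAG's dag_id to have the form parent_dag_id.task_id, where task_id is the task_id of the SubDagOperator.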
    • A DummyOperator named end_main represents the end of the main DAG.
  3. Task Dependencies:

    • The main DAG starts with start_main, followed by the execution of the sub-DAG (subdag_task), and finally, it ends with end_main.
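To make the chained dependency syntax more concrete, here is a toy model in plain Python of how an expression like a >> b >> c can build an ordered chain. The Task class and run_order function are hypothetical stand-ins written for this sketch. They are not Airflow's classes, and Airflow's real operators do considerably more.

```python
class Task:
    """Toy stand-in for an Airflow operator (not Airflow's real class)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # a >> b records b as downstream of a, then returns b so that
        # a >> b >> c keeps chaining from left to right.
        self.downstream.append(other)
        return other


def run_order(first):
    """Follow a linear chain starting at `first` and list task_ids in order."""
    order = []
    current = first
    while current is not None:
        order.append(current.task_id)
        current = current.downstream[0] if current.downstream else None
    return order


start_main = Task("start_main")
subdag_task = Task("subdag_task")
end_main = Task("end_main")

start_main >> subdag_task >> end_main
print(run_order(start_main))  # ['start_main', 'subdag_task', 'end_main']
```

The key point is that each >> returns its right-hand operand, which is why the dependencies read left to right in the order the tasks should run.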

Notes

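  • Scheduling: The sub-DAG is triggered by the parent's SubDagOperator rather than scheduled on its own, but it still needs a schedule_interval, and that schedule should match the parent DAG's (both are '@daily' in the example). According to the Airflow documentation, a sub-DAG whose schedule is None or '@once' succeeds without running its tasks.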
  • Complexity: Use SubDagOperator judiciously, as too many nested DAGs can make the workflow harder to manage and visualize.
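  • Version and concurrency: SubDagOperator is not recommended for high-concurrency scenarios, because the SubDagOperator task holds a worker slot for as long as its sub-DAG is running. It has also been deprecated since Airflow 2.0 in favor of TaskGroup.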

This setup allows you to encapsulate and manage complex workflows effectively within a main DAG using the SubDagOperator.

Post Reference: Vikram Aristocratic Elfin
