About Me

Mumbai, Maharashtra, India
He has more than 7.6 years of experience in software development and has spent most of that time in web and desktop application development. He has sound knowledge of various database concepts. You can reach him at viki.keshari@gmail.com https://www.linkedin.com/in/vikrammahapatra/ https://twitter.com/VikramMahapatra http://www.facebook.com/viki.keshari

Thursday, October 31, 2024

TimeSensor operator in Apache Airflow

The TimeSensor in Apache Airflow holds back downstream tasks until a specified time of day has passed. This is useful when a task should run only after a certain time, regardless of when the DAG run starts. Once that time is reached, the sensor succeeds and the DAG proceeds to the next tasks. If the time has already passed when the sensor first checks, it succeeds immediately.

Example: Using TimeSensor to Wait Until a Specific Time

Let’s say we want a DAG to start with a task that waits until 2:00 PM each day before running the actual data processing tasks. We’ll use TimeSensor to accomplish this.

python
from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.time_sensor import TimeSensor
from datetime import datetime, time

# Define the DAG
with DAG(
    dag_id="time_sensor_example_dag",
    start_date=datetime(2023, 10, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    # Define the TimeSensor to wait until 2:00 PM
    wait_until_2pm = TimeSensor(
        task_id="wait_until_2pm",
        target_time=time(14, 0),  # 2:00 PM in 24-hour format
        poke_interval=60 * 5,     # Check every 5 minutes
        mode="reschedule",        # Use 'reschedule' mode to free up resources while waiting
    )

    # Define a data processing task
    def process_data():
        print("Processing data after 2:00 PM")

    data_processing = PythonOperator(
        task_id="data_processing",
        python_callable=process_data,
    )

    # Set task dependencies
    wait_until_2pm >> data_processing

Explanation of the DAG

  1. TimeSensor:

    • The wait_until_2pm task is a TimeSensor that pauses the DAG until it’s 2:00 PM (14:00).
    • The target_time argument specifies the time to wait until. In this example, it’s set to time(14, 0), which is 2:00 PM. The time is evaluated in the DAG’s timezone, which defaults to UTC unless configured otherwise.
    • The poke_interval is set to 5 minutes (300 seconds), meaning the sensor will check every 5 minutes if the current time has reached 2:00 PM.
    • The mode is set to "reschedule", which frees up worker slots while waiting, allowing other tasks to run.
  2. Data Processing Task:

    • The data_processing task runs only after the TimeSensor has allowed the DAG to proceed, ensuring it starts after 2:00 PM.
  3. Dependencies:

    • The line wait_until_2pm >> data_processing makes data_processing downstream of the sensor, so the DAG only proceeds to the data_processing task once the TimeSensor condition (2:00 PM) is met.
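
The waiting behaviour described above can be modelled in plain Python without an Airflow installation. The sketch below is a simplified illustration, not Airflow's actual implementation: target_time_reached and simulate_pokes are hypothetical helper names, and the strict "later than" comparison is an assumption about how the check behaves.

```python
from datetime import datetime, time, timedelta


def target_time_reached(now: datetime, target: time) -> bool:
    """Return True once the time of day in `now` is later than `target`."""
    return now.time() > target


def simulate_pokes(start: datetime, target: time, poke_interval: int) -> list:
    """List the check timestamps for a sensor that starts at `start`.

    The list ends with the first check that succeeds. The simulation is
    limited to the calendar day of `start` so that it always terminates.
    """
    if poke_interval <= 0:
        raise ValueError("poke_interval must be a positive number of seconds")
    checks = []
    now = start
    while now.date() == start.date():
        checks.append(now)
        if target_time_reached(now, target):
            return checks
        # Not yet time: wait one poke interval before checking again.
        now += timedelta(seconds=poke_interval)
    raise ValueError("target time was not reached on the start date")


# A sensor started at 1:52 PM, checking every 5 minutes for 2:00 PM.
checks = simulate_pokes(datetime(2024, 10, 31, 13, 52), time(14, 0), 60 * 5)
print([c.strftime("%H:%M") for c in checks])  # ['13:52', '13:57', '14:02']
```

Note that the task does not proceed at exactly 2:00 PM but at the first check after it, so the poke_interval bounds how late the downstream task can start.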

Benefits

  • Controlled Timing: TimeSensor ensures tasks don’t run until a specific time.
  • Efficient Resource Management: Using "reschedule" mode releases the worker slot between checks instead of holding it for the entire wait, so other tasks can run in the meantime.
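
To make the resource argument concrete, the following back-of-the-envelope sketch compares how long a worker slot is held in each mode. It is a rough model under stated assumptions, not a measurement: slot_seconds_held is a hypothetical helper, and the one-second cost per check (check_seconds) is an invented figure used only for illustration.

```python
def slot_seconds_held(wait_seconds: int, poke_interval: int, mode: str,
                      check_seconds: float = 1.0) -> float:
    """Estimate how many seconds a worker slot is occupied while a sensor waits."""
    if poke_interval <= 0:
        raise ValueError("poke_interval must be positive")
    if mode == "poke":
        # The slot is held for the whole wait, including sleeps between checks.
        return float(wait_seconds)
    if mode == "reschedule":
        # The slot is held only while each check runs (assumed check_seconds each).
        checks = wait_seconds // poke_interval + 1
        return checks * check_seconds
    raise ValueError(f"unknown mode: {mode!r}")


wait = 14 * 60 * 60  # midnight to 2:00 PM, in seconds
print(slot_seconds_held(wait, 300, "poke"))        # 50400.0
print(slot_seconds_held(wait, 300, "reschedule"))  # 169.0
```

Under these assumptions, a sensor waiting from midnight until 2:00 PM occupies a slot for the full 14 hours in "poke" mode but only for a few minutes in total in "reschedule" mode.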
Post Reference: Vikram Aristocratic Elfin
