System Architecture Overview

- A scheduler, which handles both triggering scheduled workflows and submitting Tasks to the executor to run.
- An executor, which handles running tasks. In the default Airflow installation, this runs everything inside the scheduler, but most production-suitable executors actually push task execution out to workers.
- A webserver, which presents a handy user interface to inspect, trigger and debug the behaviour of DAGs and tasks.
- A folder of DAG files, read by the scheduler and executor (and any workers the executor has). A minimal example of such a file is sketched after this list.
- A metadata database, used by the scheduler, executor and webserver to store state.
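
To make the role of the DAG folder concrete, here is a minimal sketch of a file that could live there. The DAG id, schedule, and task are placeholders rather than anything from the text above, and it assumes an Airflow 2.4+ installation (where the `schedule` argument is available).

```python
# A minimal sketch of a file that could sit in the DAG folder; the scheduler
# (and any workers) parse files like this to discover DAGs.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_hello",           # placeholder DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # the scheduler triggers runs on this cadence
    catchup=False,
) as dag:
    say_hello = BashOperator(
        task_id="say_hello",
        bash_command="echo hello",
    )
```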
Scheduler
- The scheduler monitors all tasks and DAGs, then triggers the task instances once their dependencies are complete.
- Behind the scenes, the scheduler spins up a subprocess, which monitors and stays in sync with all DAGs in the specified DAG directory.
- Once per minute, by default, the scheduler collects DAG parsing results and checks whether any active tasks can be triggered.
- The Airflow scheduler is designed to run as a persistent service in an Airflow production environment.
- The scheduler is designed for high throughput; this is a deliberate design decision so that tasks are scheduled as soon as possible. In each iteration, the scheduler checks how many free slots are available in a pool and schedules at most that many task instances. As a result, task priority only comes into effect when there are more scheduled task instances waiting than there are free slots, so a low-priority task can be scheduled before a high-priority task if they land in the same batch. For more on this, see this GitHub discussion.
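
To illustrate the pool-and-priority behaviour described above, here is a hedged sketch of a DAG whose two tasks compete for slots in the same pool. The pool name "etl_pool", the DAG id, and the task ids are hypothetical, the pool would need to be created separately (for example via the UI), and the example assumes Airflow 2.3+ where EmptyOperator is available. `priority_weight` only influences ordering once more tasks are waiting than the pool has free slots, which is exactly the situation described in the last bullet.

```python
# A sketch under the assumptions stated above, not a verbatim example from
# the docs: two tasks share a hypothetical pool "etl_pool"; priority_weight
# is only consulted once more tasks are waiting than the pool has free slots.
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="pool_priority_example",   # placeholder DAG id
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    high = EmptyOperator(
        task_id="high_priority_task",
        pool="etl_pool",              # pool must already exist (e.g. created in the UI)
        priority_weight=10,
    )
    low = EmptyOperator(
        task_id="low_priority_task",
        pool="etl_pool",
        priority_weight=1,
    )
```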