Luigi: A Workflow Tool That Doesn’t Ask for a Cluster
If your data jobs don’t need Kubernetes, maybe they just need this
So, what’s Luigi?
It’s Python code that knows what needs to run and in what order, without mandatory daemons, fancy dashboards, or endless YAML.
It was originally a Spotify thing. But it stuck around because it solves a boring, real problem: “I’ve got a bunch of jobs that depend on each other. I want them to run in the right order. I don’t want to write a Makefile. I definitely don’t want to build a microservice just for this.”
With Luigi, you define your steps as Python classes. Each one says what it depends on, what output it produces, and how to produce it. That’s it. No black box. And when you run the pipeline? It skips any task whose output already exists. Kind of like `make`, except the rules are plain Python instead of Makefile syntax.
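To make the skipping rule concrete, here is a toy sketch in plain Python. It is not Luigi's code and uses no Luigi APIs; `ToyTask` and `build` are hypothetical names invented for this illustration. It only shows the idea: a task counts as done when its output file exists, and dependencies run before the task that needs them.

```python
import os
import tempfile


class ToyTask:
    """Hypothetical stand-in for a task: a name, an output path, and dependencies."""

    def __init__(self, name, output_path, requires=()):
        self.name = name
        self.output_path = output_path
        self.requires = list(requires)

    def complete(self):
        # A task counts as done when its output file exists.
        return os.path.exists(self.output_path)

    def run(self):
        with open(self.output_path, "w") as f:
            f.write(f"output of {self.name}\n")


def build(task, executed=None):
    """Run `task` after its dependencies, skipping anything already complete.

    Returns the names of the tasks that actually ran, in execution order.
    """
    if executed is None:
        executed = []
    if task.complete():
        return executed  # output exists, so this task is skipped
    for dep in task.requires:
        build(dep, executed)
    task.run()
    executed.append(task.name)
    return executed


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    raw = ToyTask("raw", os.path.join(workdir, "raw.txt"))
    parsed = ToyTask("parsed", os.path.join(workdir, "parsed.txt"), requires=[raw])
    print(build(parsed))  # first run: both tasks execute
    print(build(parsed))  # second run: nothing left to do
```

Delete one output file and only the tasks needed to rebuild it run again, which is the property that makes partial reruns cheap.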
Where It Shows Up
– Teams running ETL jobs on old servers with zero orchestration stack.
– ML folks who want to rerun parts of a pipeline without trashing everything else.
– SREs or analysts who have 12 steps to prep a report and keep doing it manually.
– Anyone who’s sick of duct-taping cron jobs together and pretending it’s “CI.”
Key Details (No Buzzwords)
| What It Does | What That Looks Like in Real Use |
| --- | --- |
| Python-native workflows | No DSL. No YAML. Just plain Python, where each task is a class |
| Dependency-aware runs | A task runs only after everything it requires is complete |
| Output-based checkpointing | Already-produced files = skipped tasks. No need to “mark done” manually |
| CLI-driven execution | No UI needed. Just type and go |
| Optional daemon mode | Run against the central scheduler (`luigid`) or locally with `--local-scheduler`. It’s your call |
| Simple to debug | Logs in the terminal. Failures don’t disappear into a UI |
| No DB by default | Uses output files to track state, which is great for minimal setups |
| Retry logic built in | Retry counts and worker timeouts can be set per task or in the config |
| Respects your filesystem | Doesn’t require renaming everything or storing in magic folders |
| Zero ceremony | No framework overhead. Just a pip install and code |
What You Actually Need
– Python 3.7 or newer
– pip install luigi
– A place to save your output files (seriously, that’s it)
Write a task like this:
    import luigi


    class RawFile(luigi.Task):
        """Writes the raw input file."""

        def output(self):
            return luigi.LocalTarget("input.txt")

        def run(self):
            with self.output().open("w") as f:
                f.write("Some raw data")


    class ParsedFile(luigi.Task):
        """Upper-cases the raw file; runs only after RawFile is complete."""

        def requires(self):
            return RawFile()

        def output(self):
            return luigi.LocalTarget("parsed.txt")

        def run(self):
            # self.input() is the output target of the required task.
            with self.input().open() as infile, self.output().open("w") as outfile:
                outfile.write(infile.read().upper())
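Save it as job.py, then run this from the same directory (going through python -m puts the current directory on the import path, so the job module can be found):

    python -m luigi --module job ParsedFile --local-scheduler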
What Users End Up Saying
“Honestly, it’s boring — and I love that. It just works.”
“We didn’t need Airflow. We needed something that runs quietly at 3 a.m. and tells us what went wrong if it fails.”
“Luigi let us build a pipeline without any extra tooling. Python and logs. Done.”
But Keep This in Mind
Luigi doesn’t have a cool UI; the web view that comes with the central scheduler is basic. It doesn’t have a company behind it selling support. It might even feel dated. But it’s stable. And if your workloads live in Python and run in steps, this is one of the very few tools that treats them with respect.
You don’t need Kubernetes to run a DAG. You might just need Luigi.