Kubernetes CronJobs fail silently more often than you'd expect. Missed schedules don't page anyone. Exit code 0 doesn't mean the job did anything. DeadManCheck gives you an independent check that runs outside your cluster.
Kubernetes was not designed with cron job reliability as a first-class concern. Several well-documented behaviors combine to make silent failures the norm, not the exception:
If a CronJob misses more than 100 schedules within its startingDeadlineSeconds window, the controller logs an error and permanently stops scheduling. No alert fires.
This is a real production failure mode that has caused multi-week data gaps.
By default, successfulJobsHistoryLimit is 3 and failedJobsHistoryLimit is 1. Older Jobs and their pods are deleted, taking their logs with them. By the time you notice something is wrong, the evidence is already gone.
The simplest approach: wrap your job command to send a start ping, run the job, then send a success or failure ping. No sidecar is needed: just modify your entrypoint script or use a command wrapper in the job spec.
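A minimal sketch of such a wrapper, assuming a POSIX shell and curl are available in the job image. The ping URL and the `/start`, `/success`, and `/fail` path segments are placeholders made up for illustration, not DeadManCheck's documented endpoints; use the URLs shown for your monitor instead. `DMC_TOKEN` is likewise an assumed variable name.

```shell
#!/bin/sh
# Hypothetical base URL: replace with the ping URL of your own monitor.
PING_URL="${PING_URL:-https://deadmancheck.example/ping/${DMC_TOKEN:-}}"

# Send a single ping. A monitoring outage must never fail the job itself,
# so errors from curl are swallowed.
send_ping() {
  curl -fsS -m 10 --retry 3 -o /dev/null "$PING_URL/$1" || true
}

# Run the given command between a start ping and a success or fail ping,
# then return the command's own exit code so Kubernetes still sees it.
run_with_pings() {
  send_ping start
  "$@"
  status=$?
  if [ "$status" -eq 0 ]; then
    send_ping success
  else
    send_ping fail
  fi
  return "$status"
}
```

An entrypoint script could source this file and end with something like `run_with_pings /app/export.sh`, where `/app/export.sh` stands in for your real job command. Because the wrapper returns the original exit code, the Job's backoff and failure handling behave as before.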
Store the token in a Kubernetes Secret:
kubectl create secret generic deadmancheck-secret --from-literal=token=your-token
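One way a CronJob might consume that Secret is as an environment variable. The sketch below writes an illustrative manifest to `cronjob.yaml`; the CronJob name, schedule, image, `DMC_TOKEN` variable name, and entrypoint path are all placeholders chosen for this example, and only the Secret name and key come from the command above.

```shell
# Write an example CronJob manifest that reads the token from the Secret.
# The quoted 'EOF' keeps the shell from expanding anything inside the manifest.
cat > cronjob.yaml <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-export            # placeholder name
spec:
  schedule: "0 2 * * *"           # placeholder schedule
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: export
              image: registry.example/nightly-export:latest   # placeholder image
              env:
                - name: DMC_TOKEN                              # assumed variable name
                  valueFrom:
                    secretKeyRef:
                      name: deadmancheck-secret
                      key: token
              command: ["/app/entrypoint.sh"]                  # placeholder entrypoint
EOF
```

After reviewing the file, `kubectl apply -f cronjob.yaml` would create the CronJob. Injecting the token through `secretKeyRef` keeps it out of the manifest and out of the image.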
Kubernetes only knows whether your container exited 0 or non-zero. It has no concept of "this job completed but processed nothing."
DeadManCheck's output assertions close this gap. Pass a ?count= parameter with your ping and set a minimum threshold. If your nightly export sends count=0, you get paged, even though Kubernetes reports the Job as successful.
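As a rough illustration, a job could attach its record count to the success ping like this. Only the `?count=` parameter comes from the text above; the helper name `ping_url_with_count`, the `PING_URL` variable, and the `/success` path are assumptions for the sketch.

```shell
# Build a ping URL carrying the number of processed records as ?count=N.
# Rejects anything that is not a non-negative integer, so a broken counter
# cannot silently send a malformed value.
ping_url_with_count() {
  base=$1
  count=$2
  case "$count" in
    ''|*[!0-9]*)
      echo "count must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '%s?count=%s\n' "$base" "$count"
}

# Example use at the end of a job (hypothetical URL and variable names):
#   rows=$(wc -l < /tmp/export.csv | tr -d ' ')
#   curl -fsS -m 10 -o /dev/null "$(ping_url_with_count "$PING_URL/success" "$rows")"
```

Sending the count even when it is zero is the point: the job still exits 0, and the threshold configured on the monitor decides whether that run counts as healthy.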
No other Kubernetes cron monitoring tool does this.
Free for 5 monitors. $12/mo for 100. Self-host for free.