Kubernetes CronJob Monitoring

Kubernetes CronJobs fail silently more often than you'd expect. Missed schedules don't page anyone. Exit code 0 doesn't mean the job did anything. DeadManCheck gives you an independent check that runs outside your cluster.

Why Kubernetes CronJobs fail silently

Kubernetes was not designed with cron job reliability as a first-class concern. Several well-documented behaviors combine to make silent failures the norm, not the exception:

The 100 missed-schedule limit: on each check, the CronJob controller counts the schedules missed since the last scheduled run, or within the startingDeadlineSeconds window if that field is set. Per the Kubernetes documentation, if the count exceeds 100, the controller does not start the Job and only logs an error. No alert fires. This is a real production failure mode that has caused multi-week data gaps.
Job history purged, logs gone: by default, successfulJobsHistoryLimit is 3 and failedJobsHistoryLimit is 1. Older Jobs and their pods are deleted, taking their logs with them. By the time you notice something is wrong, the evidence may already be gone.
Exit code 0 is not success: your container script can exit 0 after connecting to a stale database replica, processing an empty queue, or silently swallowing an exception. Kubernetes still marks the Job successful.
Schedule slip under load: if the cluster is resource-constrained, Job creation can be delayed and pods can sit in Pending before they start. Kubernetes has no built-in alerting on late starts or run duration, and no anomaly detection.
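
The missed-schedule arithmetic from the first item can be sketched in a few lines. This is a simplified model, not the controller's actual code: it assumes a fixed-interval schedule (real CronJobs use cron expressions), and the names missed_runs, is_blocked, and MISSED_SCHEDULE_LIMIT are hypothetical, chosen for this example.

```python
from datetime import datetime, timedelta
from typing import Optional

MISSED_SCHEDULE_LIMIT = 100  # threshold described in the list above


def missed_runs(last_scheduled: datetime, now: datetime, interval: timedelta,
                starting_deadline_seconds: Optional[int] = None) -> int:
    """Count fixed-interval runs missed between the window start and now.

    Simplified model: with no deadline the window starts at the last
    scheduled run; with a deadline it starts at most that many seconds ago.
    """
    window_start = last_scheduled
    if starting_deadline_seconds is not None:
        deadline_start = now - timedelta(seconds=starting_deadline_seconds)
        window_start = max(window_start, deadline_start)
    if now <= window_start:
        return 0
    return int((now - window_start) / interval)


def is_blocked(missed: int) -> bool:
    """True when the count exceeds the limit, so the Job would not be started."""
    return missed > MISSED_SCHEDULE_LIMIT


last = datetime(2024, 1, 1, 8, 0)
now = last + timedelta(hours=2)      # no runs were created for two hours
every_minute = timedelta(minutes=1)

no_deadline = missed_runs(last, now, every_minute)
with_deadline = missed_runs(last, now, every_minute, starting_deadline_seconds=200)
print(no_deadline, is_blocked(no_deadline))      # 120 True
print(with_deadline, is_blocked(with_deadline))  # 3 False
```

In this model, a one-minute schedule that goes unserved for two hours is past the limit, while a 200-second startingDeadlineSeconds keeps the counted window small enough for scheduling to resume.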

Adding a DeadManCheck ping to a CronJob

The simplest approach: wrap your job command to send a start ping, run the job, then send a success or failure ping. No sidecar needed — just modify your entrypoint script or use a command wrapper in the job spec.

Shell wrapper (works with any container)

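#!/bin/bash
set -euo pipefail

# Fail fast with a clear message if the token is missing
TOKEN="${DEADMANCHECK_TOKEN:?DEADMANCHECK_TOKEN is not set}"
BASE="https://deadmancheck.io/ping/${TOKEN}"

# Best-effort ping: a monitoring outage must never fail or hang the job itself
dmc_ping() { curl -fsS --max-time 10 --retry 3 "$@" > /dev/null || true; }

# Signal start
dmc_ping "${BASE}/start"

# Signal failure on any error
trap 'dmc_ping "${BASE}/fail"' ERR

# Run your actual job; run-export.sh is expected to print only the row count
ROWS=$(/app/run-export.sh)

# Signal success + row count for output assertion
dmc_ping -X POST -H "Content-Type: application/json" \
  -d "{\"count\": ${ROWS}}" "${BASE}"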

Python job with output assertion

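import os
import sys

import requests

TOKEN = os.environ["DEADMANCHECK_TOKEN"]
BASE = f"https://deadmancheck.io/ping/{TOKEN}"

def ping(method, url, **kwargs):
    """Send a best-effort ping; a monitoring outage must not fail the job."""
    try:
        requests.request(method, url, timeout=5, **kwargs).raise_for_status()
    except requests.RequestException as exc:
        print(f"DeadManCheck ping failed: {exc}", file=sys.stderr)

def run_job():
    """Placeholder for your real job; return the number of records processed."""
    raise NotImplementedError

def main():
    ping("GET", f"{BASE}/start")
    try:
        records_processed = run_job()
    except Exception:
        ping("GET", f"{BASE}/fail")
        raise  # re-raise so the traceback is logged and the exit code is non-zero
    ping("POST", BASE, json={"count": records_processed})

if __name__ == "__main__":
    main()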

Kubernetes CronJob spec with environment variable

apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-export
spec:
  schedule: "0 2 * * *"
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: exporter
            image: your-image:latest
            env:
            - name: DEADMANCHECK_TOKEN
              valueFrom:
                secretKeyRef:
                  name: deadmancheck-secret
                  key: token
          restartPolicy: OnFailure

Store the token in a Kubernetes Secret:

kubectl create secret generic deadmancheck-secret --from-literal=token=your-token

Output assertions: beyond exit codes

Kubernetes only knows whether your container exited 0 or non-zero. It has no concept of "this job completed but processed nothing."

DeadManCheck's output assertions close this gap. POST a count with your ping and set a minimum threshold. If your nightly export sends count=0, you get paged — even though Kubernetes reports the Job as successful.

No other Kubernetes cron monitoring tool does this.
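
As an illustration of the idea only, the minimum-threshold check described above boils down to a comparison. The sketch below is not DeadManCheck's implementation: evaluate_ping, the min_count parameter, and the returned status strings are hypothetical names, and treating a missing count as a failure is an assumption made for the example.

```python
from typing import Optional


def evaluate_ping(count: Optional[int], min_count: int) -> str:
    """Classify a success ping against a minimum-count assertion.

    Returns "ok" when the reported count meets the threshold, and "alert"
    when it falls short or when no count was reported (an assumption here).
    """
    if count is None or count < min_count:
        return "alert"
    return "ok"


# The Job exited 0 in every case below; only the reported count differs.
print(evaluate_ping(count=1520, min_count=1))  # ok
print(evaluate_ping(count=0, min_count=1))     # alert
print(evaluate_ping(count=None, min_count=1))  # alert
```

The point of the comparison is that the alert decision depends on what the job reported, not on its exit code.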

Free for 5 monitors. $29/mo, unlimited monitors. Self-host for free.