smhk

Celery and systemd: how to avoid a restart loop

TL;DR: If using Celery as a systemd service with just one worker, I would suggest using Type=simple and celery worker, rather than Type=forking and celery multi, to avoid a potential race condition where systemd repeatedly restarts Celery.

Method 1: Type=forking

The official Celery systemd service example shows using a Type=forking service to launch Celery with celery multi:

/usr/lib/systemd/system/celery.service
[Unit]
Description=Celery Service
After=network.target

[Service]
Type=forking
User=celery
Group=celery
EnvironmentFile=/etc/conf.d/celery
WorkingDirectory=/opt/celery
ExecStart=/bin/sh -c '${CELERY_BIN} -A $CELERY_APP multi start $CELERYD_NODES \
    --pidfile=${CELERYD_PID_FILE} --logfile=${CELERYD_LOG_FILE} \
    --loglevel="${CELERYD_LOG_LEVEL}" $CELERYD_OPTS'
ExecStop=/bin/sh -c '${CELERY_BIN} multi stopwait $CELERYD_NODES \
    --pidfile=${CELERYD_PID_FILE} --logfile=${CELERYD_LOG_FILE} \
    --loglevel="${CELERYD_LOG_LEVEL}"'
ExecReload=/bin/sh -c '${CELERY_BIN} -A $CELERY_APP multi restart $CELERYD_NODES \
    --pidfile=${CELERYD_PID_FILE} --logfile=${CELERYD_LOG_FILE} \
    --loglevel="${CELERYD_LOG_LEVEL}" $CELERYD_OPTS'
Restart=always

[Install]
WantedBy=multi-user.target
/etc/conf.d/celery
CELERYD_NODES=my-worker
CELERYD_HOST=my-host

# NOTE: Celery will automatically generate the full hostname:
# my-worker@my-host
CELERYD_OPTS="--queues=my-queue,my-queue-2 --loglevel=debug --hostname=$CELERYD_HOST"

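However, as others have noted, celery multi and Type=forking do not always play well together. The unit above does not set PIDFile=, so once the process started by ExecStart= has exited, systemd has to guess which of the remaining processes is the service's main process. Depending on timing, it can guess wrong, conclude that the service has stopped and, because of Restart=always, start it again. This can turn into a restart loop.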

Fortunately, if you do not need to manage multiple workers, you can solve all these problems and also simplify your service file in the process:

  • Use celery worker rather than celery multi to launch a single worker.
  • Use Type=simple to avoid the PID guessing.

Method 2: Type=simple

A full example follows.[1]

/usr/lib/systemd/system/celery.service
[Unit]
Description=Celery Service
After=network.target

[Service]
Type=simple
User=celery
Group=celery
EnvironmentFile=/etc/conf.d/celery
WorkingDirectory=/opt/celery
ExecStart=/bin/sh -c '${CELERY_BIN} -A $CELERY_APP worker ${CELERYD_OPTS} \
    --logfile=${CELERYD_LOG_FILE}'
Restart=always

[Install]
WantedBy=multi-user.target
/etc/conf.d/celery
# NOTE: Renamed CELERYD_NODES to CELERYD_NODE as a reminder that this only
# works for running a single worker (node).
CELERYD_NODE=my-worker
CELERYD_HOST=my-host

# NOTE: When using "celery worker" rather than "celery multi", it is necessary
# to specify the full hostname. Otherwise Celery will default to using the
# node name "celery", giving you a hostname of "celery@my-host" instead of
# "my-worker@my-host".
CELERYD_OPTS="--queues=my-queue,my-queue-2 --loglevel=debug --hostname=$CELERYD_NODE@$CELERYD_HOST"

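We are able to simplify the ExecStart= command, though note that there are some subtle differences in how the hostname is handled, as described in the comments in the config file. The ExecStop= and ExecReload= commands can be removed entirely, relying on systemd's defaults instead. Without ExecStop=, systemd stops the service by sending SIGTERM (the default KillSignal=), which the Celery worker treats as a warm shutdown. Without ExecReload=, systemctl reload is not available for the unit, so use systemctl restart instead.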
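Under Type=simple, the process started by ExecStart= is the service's main process. With no ExecStop=, systemd stops the service by sending SIGTERM to the unit's processes (the default KillSignal=) and follows up with SIGKILL if they are still running after TimeoutStopSec=. Celery's worker already handles that SIGTERM itself. For a program of your own, the same "stay in the foreground, exit cleanly on SIGTERM" pattern might look like the sketch below. ForegroundWorker and its job list are hypothetical, written only for illustration; this is not how Celery implements its shutdown.

```python
import os
import signal


class ForegroundWorker:
    """Hypothetical single-process worker suited to a Type=simple unit.

    It never forks and writes no PID file, so the process systemd starts is
    the process that does the work.
    """

    def __init__(self):
        self.stop_requested = False
        self.completed = 0

    def _on_sigterm(self, signum, frame):
        # systemd's default stop action sends SIGTERM: let the current job
        # finish, then leave the loop instead of dying mid-job.
        self.stop_requested = True

    def run(self, jobs):
        """Run callables one at a time until they run out or SIGTERM arrives."""
        previous = signal.signal(signal.SIGTERM, self._on_sigterm)
        try:
            for job in jobs:
                if self.stop_requested:
                    break
                job()
                self.completed += 1
        finally:
            # Restore whatever handler was installed before run() was called.
            signal.signal(
                signal.SIGTERM,
                previous if previous is not None else signal.SIG_DFL,
            )
        return self.completed


if __name__ == "__main__":
    worker = ForegroundWorker()
    jobs = [
        lambda: None,
        lambda: os.kill(os.getpid(), signal.SIGTERM),  # simulate `systemctl stop`
        lambda: None,  # skipped: stop was requested during the previous job
    ]
    print(f"completed {worker.run(jobs)} job(s) before stopping")
```

Because the flag is only checked between jobs, a stop request never interrupts a job halfway through. If a job can run for longer than TimeoutStopSec=, systemd will still escalate to SIGKILL.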

The same issue in other projects

This issue is not specific to Celery. In theory, any Type=forking service that daemonizes by double-forking without specifying PIDFile= could suffer from systemd guessing the wrong PID. Such an issue occurred in ypbind.

More details about double-forking

When Celery is launched with celery multi, it performs a fairly standard double-fork to launch a daemon process (excerpt from celery/platforms.py):

celery/platforms.py
    def _detach(self):
        if os.fork() == 0:  # first child
            os.setsid()  # create new session
            if os.fork() > 0:  # pragma: no cover
                # second child
                os._exit(0)
        else:
            os._exit(0)
        return self
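To see why systemd ends up guessing, the sketch below repeats the same double-fork in a standalone script. The function double_fork_demo is hypothetical, written for illustration, and is not part of Celery. Unlike a real daemon, the original process stays alive so that the grandchild can report its PID and session ID back over a pipe. Note that in the os.fork() > 0 branch it is the first child that exits, leaving the grandchild running in the new session. This assumes a Unix-like system (os.fork, os.setsid, os.getsid).

```python
import os


def double_fork_demo():
    """Detach a grandchild the way celery's _detach() does, then report PIDs.

    Unlike a real daemonizing routine, the original process does not call
    os._exit() here, so it can inspect what happened afterwards.
    Returns (first_child_pid, daemon_pid, daemon_session_id).
    """
    read_fd, write_fd = os.pipe()
    first_child = os.fork()
    if first_child == 0:
        # First child: start a new session, then fork again.
        os.close(read_fd)
        os.setsid()
        if os.fork() > 0:
            # Still in the first child: exit at once, orphaning the grandchild.
            os._exit(0)
        # Grandchild, i.e. the long-running "daemon" in a real program.
        os.write(write_fd, f"{os.getpid()} {os.getsid(0)}".encode())
        os._exit(0)

    # Original process: read the grandchild's report until every writer exits.
    os.close(write_fd)
    data = b""
    while chunk := os.read(read_fd, 1024):
        data += chunk
    os.close(read_fd)
    os.waitpid(first_child, 0)  # reap the first child, which exited at once
    daemon_pid, daemon_sid = (int(field) for field in data.split())
    return first_child, daemon_pid, daemon_sid


if __name__ == "__main__":
    first, daemon, sid = double_fork_demo()
    print(f"started as {os.getpid()}, first child {first}, "
          f"daemon {daemon} in session {sid}")
```

Under Type=forking, the process systemd started exits, and the long-running daemon is a grandchild with a different PID in a new session. Without PIDFile=, systemd can only guess which process that is (see GuessMainPID= in systemd.service(5)).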

Double-forking is not necessary under systemd, and the systemd.service documentation encourages using Type=notify, Type=notify-reload or Type=simple where possible:

Note that PID files should be avoided in modern projects. Use Type=notify, Type=notify-reload or Type=simple where possible, which does not require use of PID files to determine the main process of a service and avoids needless forking.

Some people have doubts about moving away from double-forking, while others insist such a move is long overdue:

Let the service management subsystem handle all this. Your program is already executing in a dæmon context when it starts running.

I'm inclined to agree with the view that systemd should be responsible for daemonizing the process. This simplifies the application and removes the reliance on guessing the PID. Further, using Type=notify or Type=notify-reload (for programs that make use of sd_notify) can give the program a reliable feedback/safety mechanism for communicating a successful start or a need to reload. In Python this is easy to do with sdnotify, which I can recommend.
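The notification protocol itself is small. For a Type=notify service, systemd sets the NOTIFY_SOCKET environment variable to the path of a Unix datagram socket, and the service sends newline-separated assignments such as READY=1 to it. A minimal stdlib-only sketch follows. notify_systemd is a hypothetical helper written for illustration, not sdnotify's API, and a maintained library such as sdnotify remains the more convenient option.

```python
import os
import socket


def notify_systemd(state: str) -> bool:
    """Send a notification such as "READY=1" to systemd (hypothetical helper).

    Returns False without doing anything when NOTIFY_SOCKET is unset, e.g.
    when the program is run by hand rather than by a Type=notify unit.
    """
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        # Linux abstract-namespace socket: "@" stands for a leading NUL byte.
        address = "\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), address)
    return True


if __name__ == "__main__":
    # ... finish starting up (open connections, etc.), then tell systemd:
    notify_systemd("READY=1")
```

A service would send READY=1 only once it is actually able to do work, which is what makes the start-up feedback trustworthy. Under Type=notify, NotifyAccess= defaults to main, so the message has to come from the service's main process. A /bin/sh -c wrapper that does not exec the program can get in the way of that.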


  1. This closed PR suggested a similar change, though, as the example in Method 2 shows, it can be simplified further: neither the ExecStop= definition nor the --pidfile option is necessary. Still, it was a useful confirmation that others had considered the same approach!