Celery and systemd: how to avoid a restart loop
TL;DR: If using Celery as a systemd service with just one worker, I would suggest using Type=simple and celery worker, rather than Type=forking and celery multi, to avoid a potential race condition where systemd repeatedly restarts Celery.
Method 1: Type=forking §
The official Celery systemd service example shows using a Type=forking service to launch Celery with celery multi:
[Unit]
Description=Celery Service
After=network.target
[Service]
Type=forking
User=celery
Group=celery
EnvironmentFile=/etc/conf.d/celery
WorkingDirectory=/opt/celery
ExecStart=/bin/sh -c '${CELERY_BIN} -A $CELERY_APP multi start $CELERYD_NODES \
--pidfile=${CELERYD_PID_FILE} --logfile=${CELERYD_LOG_FILE} \
--loglevel="${CELERYD_LOG_LEVEL}" $CELERYD_OPTS'
ExecStop=/bin/sh -c '${CELERY_BIN} multi stopwait $CELERYD_NODES \
--pidfile=${CELERYD_PID_FILE} --logfile=${CELERYD_LOG_FILE} \
--loglevel="${CELERYD_LOG_LEVEL}"'
ExecReload=/bin/sh -c '${CELERY_BIN} -A $CELERY_APP multi restart $CELERYD_NODES \
--pidfile=${CELERYD_PID_FILE} --logfile=${CELERYD_LOG_FILE} \
--loglevel="${CELERYD_LOG_LEVEL}" $CELERYD_OPTS'
Restart=always
[Install]
WantedBy=multi-user.target
The referenced EnvironmentFile, /etc/conf.d/celery, contains:
CELERYD_NODES=my-worker
CELERYD_HOST=my-host
# NOTE: Celery will automatically generate the full hostname:
# my-worker@my-host
CELERYD_OPTS="--queues=my-queue,my-queue-2 --loglevel=debug --hostname=$CELERYD_HOST"However, as others have noted, celery multi and Type=forking do not always play well together:
- celery multi is intended for managing multiple workers from the command line. According to Celery's original developer, "[…] multi is not a service - it literally just starts n workers […]".
- When using Type=forking without PIDFile=, systemd must guess the main PID. If it guesses wrong, it may appear to systemd as if the service has stopped.
- When using Type=forking with PIDFile= to avoid the PID guessing, other issues arise: correctly handling the multiple PID files requires multiple levels of escaping, and even if that is managed correctly, the log file names will be populated incorrectly.
Fortunately, if you do not need to manage multiple workers, you can solve all these problems and also simplify your service file in the process:
- Use celery worker rather than celery multi to launch a single worker.
- Use Type=simple to avoid the PID guessing.
Method 2: Type=simple §
A full example follows:[1]
[Unit]
Description=Celery Service
After=network.target
[Service]
Type=simple
User=celery
Group=celery
EnvironmentFile=/etc/conf.d/celery
WorkingDirectory=/opt/celery
ExecStart=/bin/sh -c '${CELERY_BIN} worker ${CELERYD_OPTS} \
--app=$CELERY_APP --logfile=${CELERYD_LOG_FILE}'
Restart=always
[Install]
WantedBy=multi-user.target
And the corresponding /etc/conf.d/celery:
# NOTE: Renamed CELERYD_NODES to CELERYD_NODE as a reminder that this only
# works for running a single worker (node).
CELERYD_NODE=my-worker
CELERYD_HOST=my-host
# NOTE: When using "celery worker" rather than "celery multi", it is necessary
# to specify the full hostname. Otherwise Celery will default to using the
# node name "celery", giving you a hostname of "celery@my-host" instead of
# "my-worker@my-host".
CELERYD_OPTS="--queues=my-queue,my-queue-2 --loglevel=debug --hostname=$CELERYD_NODE@$CELERYD_HOST"We are able to simplify the ExecStart= command, though note that there are some subtle differences to how the hostname is handled. The ExecStop= and ExecReload= commands can be removed entirely, and instead we can rely on systemd’s default handling of those commands.
The same issue in other projects §
This issue is not specific to Celery. In theory, any process that tries to daemonize by double-forking under systemd without specifying PIDFile= could suffer from systemd guessing the wrong PID. Such an issue occurred in ypbind.
More details about double-forking §
When Celery is launched with celery multi, it performs a fairly standard double-fork to launch a daemon process (excerpt from celery/platforms.py):
def _detach(self):
    if os.fork() == 0:      # first child
        os.setsid()         # create new session
        if os.fork() > 0:   # pragma: no cover
            # second child
            os._exit(0)
    else:
        os._exit(0)
    return self
Double-forking is not necessary under systemd, and the systemd documentation encourages using Type=notify, Type=notify-reload or Type=simple where possible:
Note that PID files should be avoided in modern projects. Use Type=notify, Type=notify-reload or Type=simple where possible, which does not require use of PID files to determine the main process of a service and avoids needless forking.
Some people have doubts about moving away from double-forking, while others insist such a move is long overdue:
Let the service management subsystem handle all this. Your program is already executing in a dæmon context when it starts running.
I’m inclined to agree that systemd should be responsible for daemonizing the process. This simplifies the application and removes the reliance on systemd guessing the PID. Further, using Type=notify or Type=notify-reload (for programs that make use of sd_notify) gives the program a reliable feedback/safety mechanism for communicating a successful start or a need to reload. In Python this is easy to do with sdnotify, which I can recommend.
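As a rough sketch of what that looks like (the start-up work and the main loop here are placeholders, not anything Celery-specific), a long-running Python process started with Type=notify can report readiness to systemd via sdnotify:
import time

import sdnotify  # third-party package: pip install sdnotify

notifier = sdnotify.SystemdNotifier()

def main():
    # ... placeholder for start-up work: open connections, load config, etc.
    # With Type=notify, systemd only considers the unit started once it
    # receives READY=1, so a start-up failure is noticed immediately.
    notifier.notify("READY=1")
    while True:
        # ... placeholder for the actual work ...
        notifier.notify("STATUS=Working")
        time.sleep(10)

if __name__ == "__main__":
    main()
The corresponding unit would use Type=notify in place of Type=simple; the rest of the service file can stay the same.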
Backlinks §
- GitHub user rafalpietrzakio has linked back to this page from this Celery issue.
[1] This closed PR suggested a similar change, though as we can see in the following example, it can be simplified further. The definition of ExecStop= is not necessary, nor is the use of the --pidfile option. Still, it was a useful confirmation that others considered the same approach! ↩︎