Last update: 25-Mar-2022
Author: R. Koucha
Truncated core file for a process supervised by systemd
Introduction

When a process supervised by systemd crashes, its core file is sometimes truncated. This article explores one reason why this happens.

Systemd's stop timeout

To stop a service named "app" controlled by systemd, one can run:

$ systemctl stop app.service

If the service defines ExecStop commands, they are run to stop the service. If no such commands are configured, SIGTERM is sent to the application. In both cases, if the processes of the service have not exited when the timeout configured by the TimeoutStopSec parameter expires, SIGKILL is sent to terminate them. If TimeoutStopSec is not defined for the service, the default value is taken from the DefaultTimeoutStopSec parameter in /etc/systemd/system.conf.

If the process crashes upon receipt of SIGTERM because its signal handler has a bug, generating the core dump can take several seconds when the process occupies a lot of memory. systemd's timeout may therefore elapse while the core file is still being written, and SIGKILL is sent. When the kernel detects a pending signal during core file generation, it stops the dump immediately. The result is a truncated core file.
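As an illustration, the scenario can be reproduced with a small test program. The following is a minimal sketch, not the code of any real service: the function names are hypothetical, and the handler calls abort() on purpose to stand in for the bug. It assumes a Linux system where core dumps are enabled for the service (for example with LimitCORE=infinity in the unit file).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Deliberately buggy SIGTERM handler (stand-in for a real bug): abort()
 * raises SIGABRT, whose default action terminates the process and asks
 * the kernel to produce a core dump. */
static void buggy_sigterm_handler(int signo)
{
    (void)signo;
    abort();
}

/* Install the handler above for SIGTERM.
 * Returns 0 on success, -1 on error (as sigaction() does). */
int install_buggy_sigterm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = buggy_sigterm_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Allocate `bytes` bytes and write to each of them, so that the pages are
 * really resident and have to be written into the core file.
 * Returns NULL if the allocation fails. */
void *allocate_resident(size_t bytes)
{
    assert(bytes > 0);
    unsigned char *p = malloc(bytes);

    if (p != NULL)
        memset(p, 0xA5, bytes);
    return p;
}
```

A test service would call install_buggy_sigterm_handler(), then allocate_resident() with a size large enough for the dump to outlast the stop timeout, and finally wait in a loop around pause(). Running systemctl stop on it should then trigger the crash inside the handler.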

After raising the console log level, the systemd messages show the timeout (about 5 seconds in this example) elapsing between SIGTERM and SIGKILL:

# dmesg -n 7
# systemctl stop app.service
[  113.629618] systemd[1]: Stopping app service...
[  118.749840] systemd[1]: app.service: State 'stop-sigterm' timed out. Killing.
[  118.750552] systemd[1]: app.service: Killing process 2077 (app) with signal SIGKILL.
[  118.842771] systemd[1]: app.service: Main process exited, code=killed, status=9/KILL
[  118.844231] systemd[1]: app.service: Failed with result 'timeout'.
[  118.868077] systemd[1]: Stopped app service.
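Whether a core file left behind by such a stop is truncated can be checked from user space. The sketch below relies on the fact that an ELF core file starts with program headers describing each dumped segment: if the file is shorter than the largest p_offset + p_filesz, data is missing. The function names are hypothetical, and the code assumes an uncompressed 64-bit ELF core in native byte order with fewer than 65535 program headers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Smallest file size able to hold every segment described by the program
 * headers, i.e. the maximum of p_offset + p_filesz. */
uint64_t expected_core_size(const Elf64_Phdr *phdr, size_t count)
{
    uint64_t expected = 0;

    assert(phdr != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        uint64_t end = phdr[i].p_offset + phdr[i].p_filesz;

        if (end > expected)
            expected = end;
    }
    return expected;
}

/* Returns 1 if the core file at `path` is shorter than its program headers
 * require, 0 if it is complete, -1 on error or unsupported file type. */
int core_is_truncated(const char *path)
{
    Elf64_Ehdr ehdr;
    Elf64_Phdr *phdr = NULL;
    struct stat st;
    int result = -1;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    if (fstat(fileno(f), &st) != 0 ||
        fread(&ehdr, sizeof(ehdr), 1, f) != 1)
        goto out;
    /* Accept only 64-bit ELF core files with a sane program header table. */
    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
        ehdr.e_type != ET_CORE ||
        ehdr.e_phentsize != sizeof(Elf64_Phdr) ||
        ehdr.e_phnum == 0)
        goto out;
    phdr = calloc(ehdr.e_phnum, sizeof(*phdr));
    if (phdr == NULL ||
        fseeko(f, (off_t)ehdr.e_phoff, SEEK_SET) != 0 ||
        fread(phdr, sizeof(*phdr), ehdr.e_phnum, f) != ehdr.e_phnum)
        goto out;
    result = (uint64_t)st.st_size < expected_core_size(phdr, ehdr.e_phnum);
out:
    free(phdr);
    fclose(f);
    return result;
}
```

A small main() could call core_is_truncated() on the path given as its first argument and print the verdict. A return value of -1 also covers the case where the file is too short to contain its own headers.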

Core file generation in the Linux kernel

The kernel writes the core file out through repeated calls to dump_emit() (defined in fs/coredump.c):

/*
 * Core dumping helper functions.  These are the only things you should
 * do on a core-file: use only these functions to write out all the
 * necessary info.
 */

int dump_emit(struct coredump_params *cprm, const void *addr, int nr)
{
      struct file *file = cprm->file;
      loff_t pos = file->f_pos;
      ssize_t n;
      if (cprm->written + nr > cprm->limit)
            return 0;

      while (nr) {
            if (dump_interrupted())
                  return 0;

            n = __kernel_write(file, addr, nr, &pos);
            if (n <= 0)
                  return 0;

            file->f_pos = pos;
            cprm->written += n;
            cprm->pos += n;
            nr -= n;
      }

      return 1;
}

EXPORT_SYMBOL(dump_emit);

The call to dump_interrupted() above checks for pending signals in the context of the current task. If a signal is pending, such as the SIGKILL sent by systemd, dump_emit() returns 0 and the core file generation stops:

static bool dump_interrupted(void)
{
      /*
       * SIGKILL or freezing() interrupt the coredumping. Perhaps we
       * can do try_to_freeze() and check __fatal_signal_pending(),
       * but then we need to teach dump_write() to restart and clear
       * TIF_SIGPENDING.
       */
      return signal_pending(current);
}
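
The effect of this check can be mimicked in user space. The following is only an analogy under stated assumptions, not kernel code: a flag set by a signal handler plays the role of signal_pending(current), and the hypothetical function emit_interruptible() writes a buffer in chunks the way dump_emit() is called repeatedly, giving up as soon as the flag is set.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Set asynchronously, typically from a signal handler. It stands in for
 * signal_pending(current) in the kernel code. */
static volatile sig_atomic_t interrupted;

/* Usable as a signal handler: records that a signal has arrived. */
void request_interrupt(int signo)
{
    (void)signo;
    interrupted = 1;
}

/* Write `len` bytes from `buf` to `fd` in chunks of at most `chunk` bytes.
 * The flag is checked before every chunk; once it is set, the loop stops
 * and the output is left short. Returns the number of bytes written. */
size_t emit_interruptible(int fd, const char *buf, size_t len, size_t chunk)
{
    size_t written = 0;

    assert(chunk > 0);
    while (written < len) {
        size_t todo = len - written < chunk ? len - written : chunk;
        ssize_t n;

        if (interrupted)
            break;      /* same decision as dump_interrupted() */
        n = write(fd, buf + written, todo);
        if (n <= 0)
            break;      /* write error: give up as well */
        written += (size_t)n;
    }
    return written;
}
```

If request_interrupt() is installed as a handler with sigaction() and a signal arrives midway through a large buffer, the function returns fewer bytes than requested and the output file is left truncated. This is what happens to the core file when SIGKILL arrives during the dump.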

About the author

The author is a computer science engineer based in France.