Last update: 17-Oct-2020
Author: R. Koucha
Why getpid() is a system call ?
Introduction

At my workplace, some memory-mapped files need to be copied into dump files. This is currently done with the dd shell command, which is not the most efficient way.

Dump of memory zones with dd command

The goal is to dump Huge Page (HP) based memory zones into dump files as efficiently as possible. The dump is performed by a shell script that calls dd to transfer the data from the HP memory-mapped files to the dump files (with or without compression). The command line looks like:

dd if=/tmp/hpfs/memfile_01 bs=4096 count=512 > output_file

But dd is not the fastest tool for this copy, as it relies on numerous read() (source file) and write() (destination file) system calls: at least one of each per 4096 bytes when bs is set to 4096. For a 2 MB file, that is at least 2 MB / 4096 bytes = 512 read() calls and 512 write() calls. As a consequence, during the dump, the process continuously switches back and forth between user space and kernel space, and every chunk is copied into a user-space buffer and back. This consumes CPU time.
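
The copy pattern described above can be sketched in C. This is only an illustration of a read()/write() loop, not the actual source code of dd; the function name copy_read_write and its block_size parameter are hypothetical. The function returns the number of system calls it issued so that the cost of a given chunk size can be observed.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Copy in_fd to out_fd with read()/write(), using chunks of block_size
 * bytes. Returns the number of system calls issued, or -1 on error.
 * copy_read_write is a hypothetical helper, not part of dd.
 */
long copy_read_write(int in_fd, int out_fd, size_t block_size)
{
    assert(block_size > 0);

    char *buf = malloc(block_size);
    long calls = 0;

    if (buf == NULL)
        return -1;

    for (;;) {
        /* Kernel to user space copy of one chunk */
        ssize_t n = read(in_fd, buf, block_size);
        calls++;
        if (n < 0) { free(buf); return -1; }
        if (n == 0) break;              /* end of file */

        /* write() may be partial: loop until the chunk is flushed */
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = write(out_fd, buf + done, (size_t)(n - done));
            calls++;
            if (w < 0) { free(buf); return -1; }
            done += w;
        }
    }

    free(buf);
    return calls;
}
```

Called with block_size set to 4096 on a 2 MB regular file, the loop issues one read() and one write() per chunk, plus a final read() that reports the end of file. Unlike dd with count=512, this sketch reads until the end of the file rather than stopping after a fixed number of blocks.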

Proposed enhancements

1. Increase the chunk size used for the copy. The bs parameter passed to dd is currently 4096. It is the chunk size passed to the read()/write() system calls during the copy from the source to the destination file, so a bigger value means fewer calls. For example, setting it to 8192 instead of 4096 halves the number of read()/write() calls.

2. A more efficient way to do the job is to use the sendfile() or splice() system call. The data is transferred from the source to the destination inside the kernel, without going through a user-space buffer, and sendfile() replaces each read()/write() pair with a single call. For example, the synopsis of sendfile() is:

#include <sys/sendfile.h>

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

sendfile() copies up to count bytes from in_fd, starting at the position given by offset, to out_fd. Because this copying is done within the kernel, it is more efficient than the combination of read() and write(), which would require transferring data to and from user space.
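
As an illustration of the synopsis above, here is a minimal sketch of a whole-file copy based on sendfile(). The function name copy_sendfile is hypothetical, and the sketch assumes a Linux system where in_fd refers to a regular file (or any file supporting mmap()-like operations). It asks for all the remaining bytes on each call, so the number of system calls no longer depends on a small chunk size.

```c
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Copy the whole content of in_fd to out_fd with sendfile().
 * Returns the number of bytes copied, or -1 on error.
 * copy_sendfile is a hypothetical helper.
 */
ssize_t copy_sendfile(int in_fd, int out_fd)
{
    struct stat st;
    off_t offset = 0;   /* updated by sendfile() after each call */

    if (fstat(in_fd, &st) < 0)
        return -1;

    while (offset < st.st_size) {
        /* Ask for everything that remains; the kernel may transfer less */
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0)
            return -1;
        if (n == 0)
            break;      /* unexpected end of file (e.g. file truncated) */
    }

    return (ssize_t)offset;
}
```

Because the offset is passed explicitly, the file position of in_fd is left untouched, whereas the file position of out_fd advances as data is written.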

As rewriting a shell script as a C executable is not always easy, there is a version of dd based on the aforementioned sendfile()/splice(). It is named odd, which stands for "Optimized dd": https://github.com/stealth/odd.
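
For reference, the splice() variant can be sketched as follows. This is not the code of odd, only a minimal illustration under the assumption of a Linux system; the function name copy_splice and its chunk parameter are hypothetical. splice() requires one of the two descriptors to be a pipe, so the data goes from the source file to a pipe and then from the pipe to the destination file, without being copied into a user-space buffer.

```c
#define _GNU_SOURCE     /* splice() is a Linux-specific extension */
#include <fcntl.h>
#include <unistd.h>

/*
 * Copy in_fd (from its current position) to out_fd through a pipe with
 * splice(). Returns the number of bytes copied, or -1 on error.
 * copy_splice is a hypothetical helper.
 */
ssize_t copy_splice(int in_fd, int out_fd, size_t chunk)
{
    int p[2];
    ssize_t total = 0;

    if (pipe(p) < 0)
        return -1;

    for (;;) {
        /* File to pipe: at most one pipe capacity worth of data per call */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, chunk, 0);
        if (n < 0) { total = -1; break; }
        if (n == 0) break;              /* end of file */

        /* Pipe to file: drain what was just queued */
        while (n > 0) {
            ssize_t w = splice(p[0], NULL, out_fd, NULL, (size_t)n, 0);
            if (w <= 0) { total = -1; goto out; }
            n -= w;
            total += w;
        }
    }

out:
    close(p[0]);
    close(p[1]);
    return total;
}
```

This variant still needs two system calls per chunk, so its benefit over read()/write() comes from avoiding the copies to and from user space rather than from a lower number of calls.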

The strace tool shows the system calls involved in the copy, first with dd and then with odd:

$ strace -f dd if=/tmp/hpfs/memfile_01 bs=4096 count=512 > output_file
[...]
openat(AT_FDCWD, "/tmp/hpfs/memfile_01", O_RDONLY) = 3
dup2(3, 0)                              = 0
close(3)                                = 0
lseek(0, 0, SEEK_CUR)                   = 0
read(0, "\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37"..., 4096) = 4096
write(1, "\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37"..., 4096) = 4096
read(0, "\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37"..., 4096) = 4096
write(1, "\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37"..., 4096) = 4096
read(0, "\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37"..., 4096) = 4096
write(1, "\0\1\2\3\4\5\6\7\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37"..., 4096) = 4096
... 512 calls to read() + 512 calls to write()...


$ strace -f ./odd if=/tmp/hpfs/memfile_01 bs=4096 count=512 send > output_file
[...]
openat(AT_FDCWD, "/tmp/hpfs/memfile_01", O_RDONLY|O_NOCTTY) = 3
openat(AT_FDCWD, "/dev/stdout", O_WRONLY|O_CREAT|O_NOCTTY|O_TRUNC|O_NOATIME, 0100755) = 4
lseek(3, 0, SEEK_SET)                   = 0
lseek(4, 0, SEEK_SET)                   = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
fadvise64(4, 0, 0, POSIX_FADV_DONTNEED) = 0
sendfile(4, 3, [0] => [4096], 4096)     = 4096
sendfile(4, 3, [4096] => [8192], 4096)  = 4096
sendfile(4, 3, [8192] => [12288], 4096) = 4096
... 512 calls to sendfile() ...

About the author

The author is a computer science engineer based in France.