In embedded environments, hardware cost is an important consideration, so memory is often very limited. Memory and CPU time are critical resources that must be used carefully and efficiently, not only for response time and robustness but also to reduce hardware cost. Many applications need to call shell commands to trigger tasks that would be tedious to implement in a language like C. For this purpose, the C library provides the system() service, declared in <stdlib.h> as int system(const char *command), which takes the command line to run as its parameter.
The "command" parameter may be a simple executable name or a more complex shell command line using output redirections and pipes.
system() hides a call to /bin/sh -c to run the command line passed as parameter. The following program takes a command line as its argument and calls system() to interpret and execute it:
#include <stdio.h>
#include <stdlib.h>

int main(int ac, char *av[])
{
    int status;

    if (ac != 2) {
        fprintf(stderr, "Usage: %s cmdline\n", av[0]);
        return 1;
    }

    printf("Running '%s'\n", av[1]);

    status = system(av[1]);
    if (0 != status) {
        fprintf(stderr, "system(%s) = %d\n", av[1], status);
        return 1;
    }

    return 0;
} // main
If we compile and run the preceding program with the date command, we get the following result:
$ gcc sys.c -o sys
$ ./sys date
Running 'date'
lundi 8 avril 2019, 10:01:36 (UTC+0200)
$
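The value returned by system() is a wait status rather than a plain exit code, which is why the program above can only print the raw value. As a minimal sketch, the status could be decoded with the macros from <sys/wait.h>. The helper name run_and_decode() is hypothetical and is not part of the original program.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run a command line with system() and return the
 * exit code of the command (0..255), or -1 if the command could not be
 * run or was terminated by a signal. */
int run_and_decode(const char *cmdline)
{
    int status = system(cmdline);

    if (status == -1) {
        return -1;                   /* the child could not be created or waited for */
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);  /* normal termination: exit code */
    }
    return -1;                       /* terminated by a signal */
}
```

With this helper, run_and_decode("exit 3") returns 3, whereas the raw value returned by system("exit 3") on Linux is 3 shifted into the upper byte of the status.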
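If we trace the preceding command with the strace tool, we see how system() behaves. Two processes are created in a row: system() forks a shell (pid 4522), which in turn forks the command itself (pid 4523):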
$ strace -f ./sys date
[...]
write(1, "Running 'date'\n", 15Running 'date'
) = 15
clone(strace: Process 4522 attached
child_stack=NULL, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fff2e3368bc) = 4522
[pid 4521] wait4(4522,
[pid 4522] rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fcfaea0cf20}, NULL, 8) = 0
[pid 4522] rt_sigaction(SIGQUIT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fcfaea0cf20}, NULL, 8) = 0
[pid 4522] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 4522] execve("/bin/sh", ["sh", "-c", "date"], 0x7fff2e336c70 /* 65 vars */) = 0
[...]
[pid 4522] clone(strace: Process 4523 attached
child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f59b10b8810) = 4523
[pid 4522] wait4(-1,
[pid 4523] execve("/bin/date", ["date"], 0x560cbeadc078 /* 65 vars */) = 0
[pid 4523] brk(NULL) = 0x563312d57000
[...]
[pid 4523] write(1, "lundi 8 avril 2019, 10:04:15 (UT"..., 40lundi 8 avril 2019, 10:04:15 (UTC+0200)
) = 40
[pid 4523] close(1) = 0
[pid 4523] close(2) = 0
[pid 4523] exit_group(0) = ?
[pid 4523] +++ exited with 0 +++
[pid 4522] <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 4523
[pid 4522] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=4523, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
[pid 4522] rt_sigreturn({mask=[]}) = 4523
[pid 4522] exit_group(0) = ?
[pid 4522] +++ exited with 0 +++
<... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 4522
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fcfaea0cf20}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fcfaea0cf20}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=4522, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
exit_group(0) = ?
+++ exited with 0 +++
So, from the Linux system's point of view, even in the simplest case, system() triggers at least two fork()/exec() pairs: one for sh -c and another for the command itself, as depicted in Figure 1.
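The first fork()/exec() pair can be illustrated with a deliberately simplified sketch of what system() does. It assumes a non-NULL command and leaves out the signal handling performed by the real service. The name simple_system() is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Simplified sketch of system(): no SIGINT/SIGQUIT/SIGCHLD handling and
 * no support for a NULL command. Returns the wait status of the shell,
 * or -1 on error. */
int simple_system(const char *cmdline)
{
    int status;
    pid_t pid = fork();             /* first process creation */

    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        /* Child: replaced by "sh -c cmdline". The shell typically
         * creates one more process to run the command itself. */
        execl("/bin/sh", "sh", "-c", cmdline, (char *)0);
        _exit(127);                 /* only reached if exec failed */
    }
    /* Father: wait for the shell to terminate */
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return status;
}
```

The memory cost discussed in this paper comes from the fork() call, which is made by the application process itself, whatever its size.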
Moreover, fork() duplicates some resources (memory, file descriptors...) of the calling process (the father) so that the forked process (the child) inherits them. If the calling process occupies a lot of memory, or if overall memory occupation is high, the system() call may fail for lack of free memory. Even though Linux has benefited from multiple enhancements such as Copy On Write (COW) that make fork() more efficient and less cumbersome, this may lead to memory overconsumption, which triggers Linux defense mechanisms like the Out Of Memory (OOM) killer. An interesting study of the weaknesses of fork() is available in this report.
This paper addresses the problem of system() overuse with alternative solutions that enhance existing applications safely, that is to say, with minimal impact on the existing source code and its behaviour.
Linux inherited vfork() from BSD as an optimization for applications that call fork() directly followed by exec(). The idea was to avoid copying too many resources from the parent process when the child address space is immediately replaced by a new program. Compared to fork(), vfork() creates a new process without copying the page tables of the parent process, and the parent is suspended until the child calls exec() or _exit(). It is useful in performance-sensitive applications. As system() is merely a call to fork() immediately followed by a call to exec(), it is possible to rewrite it by replacing the call to fork() with vfork(). § A.1. shows a source code proposal for this solution. It is inspired by the original system() source code in the GLIBC library (.../sysdeps/posix/system.c).
Such a solution makes system() slightly more efficient: the behaviour is exactly the same, except that vfork() is supposed to be more efficient than fork(). As a consequence, we reduce memory demand and CPU time.
It is possible to go further by using a lightweight server which runs the command, as described in the following paragraph.
It is possible to reduce memory consumption by making a dedicated lightweight process run system() on behalf of the application.
Through the fork/exec mechanism, Linux lets a child process inherit the file descriptors of its father. This is widely used to make a process use the standard input (stdin) and outputs (stdout and stderr) of its father, as depicted in Figure 2.
Figure 2: Process inheritance
The dup() system call is another useful feature: it duplicates an open file descriptor inside a process. It is used, for example, to set up pipes between a father process and a child process: in the writer process, the standard output is redirected to the write end of the pipe; in the reader process, the standard input is redirected to the read end of the pipe, as depicted in Figure 3.
Figure 3: dup() system call to setup a pipe
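The writer side of this setup can be sketched as follows, using dup2(), the variant of dup() that targets a given descriptor number. This is only an illustration under simplifying assumptions: the father reads the pipe directly instead of redirecting its own standard input, and the helper name capture_output() is hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "sh -c cmdline" with its standard output
 * redirected into a pipe and copy up to bufsz - 1 bytes of that output
 * into buf (NUL terminated). Returns the number of bytes copied, or -1
 * on error. */
ssize_t capture_output(const char *cmdline, char *buf, size_t bufsz)
{
    int fds[2];
    pid_t pid;
    ssize_t n, total = 0;

    if (bufsz == 0 || pipe(fds) != 0) {
        return -1;
    }
    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Writer (child): stdout now refers to the write end of the pipe */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", cmdline, (char *)0);
        _exit(127);
    }
    /* Reader (father): keep only the read end of the pipe */
    close(fds[1]);
    while ((size_t)total < bufsz - 1 &&
           (n = read(fds[0], buf + total, bufsz - 1 - (size_t)total)) > 0) {
        total += n;
    }
    buf[total] = '\0';
    close(fds[0]);
    (void)waitpid(pid, NULL, 0);
    return total;
}
```

The child inherits both ends of the pipe through fork(), which is why each side closes the end it does not use.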
The sharing or copying of open file descriptors as described above implies that the processes are in the same hierarchy (father/child inheritance).
In the AF_UNIX socket family (i.e. sockets for inter-process communication on the same machine), Linux provides the ability to send ancillary data using sendmsg() and recvmsg(). Ancillary data of type SCM_RIGHTS transfers open file descriptors from one process to another. On receipt of this data, the destination process gets copies of the sent file descriptors, as if it had implicitly called dup(). The processes do not need to be in the same hierarchy: they merely need to be connected through a local socket. As with dup(), the passed file descriptors take the first free slots in the descriptor table of the destination process. For example, in Figure 4, the originator process sends its standard input and standard outputs (i.e. file descriptors 0, 1 and 2) to the destination process. The latter already has a standard input and outputs, so the arriving file descriptors are copied into the first available slots: 3, 4 and 5.
Figure 4: Inter-process transfer of open file descriptors through a socket
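As a minimal sketch of this mechanism, the two helpers below pass a single descriptor over an already connected AF_UNIX socket. The names send_fd() and recv_fd() are hypothetical, and the one-byte payload is an arbitrary choice made only because ancillary data must accompany normal data.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized for one descriptor, aligned for struct cmsghdr */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one open file descriptor. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char dummy = 'F';
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return (sendmsg(sock, &msg, 0) == 1) ? 0 : -1;
}

/* Receive one file descriptor. Returns the new descriptor, allocated in
 * the first free slot of the receiver, or -1 on error. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The descriptor number seen by the receiver generally differs from the one used by the sender, but both refer to the same open file.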
So, this mechanism can be used in a client/server application where the server runs the system() service for the command line sent by the client. Along with the command line, the file descriptors for stdin, stdout and stderr of the client are passed in the ancillary data to make the server redirect the input and outputs to them. § A.2. shows an example of source code for this solution and Figure 5 depicts the principle.
Figure 5: Remote server through socket and ancillary data
Hence, by delegating system() to a lightweight server, we avoid the memory cost of forking the running application, which may be large.
It is possible to go further and save more CPU time by eliminating the step which consists in running and terminating a shell (i.e. sh -c), as described in the following paragraph.
Some applications need to call system() frequently, which means that sh -c is run very often. Moreover, the execution and termination of multiple shells by several concurrent applications consumes CPU time and memory. It is possible to design a solution where a shell is started once and stays ready to serve any application that needs to run commands.
The idea consists in starting one or more background shells at application startup. We don't use the "-c" option, which runs one command line and then makes the shell exit. The shell must stay alive in the background during the application lifetime, even after a command has been executed. Each time the application needs to run a command, it submits it to the background shell. This saves the CPU time and memory needed to start and stop the shell. Figure 6 depicts the principle.
Without the "-c" option, the shell is interactive. In other words, it needs to be connected to a terminal. Linux provides the pseudo-terminal (PTY) concept to handle this kind of need. The PTY is set up between the application process (master side) and the background shell process (slave side). The latter believes that it is interacting with an operator through a real terminal, whereas the operator is actually the application process: cf. Figure 7.
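The PTY setup can be sketched with the standard posix_openpt(), grantpt(), unlockpt() and ptsname() services. This is only an illustration of the principle, not the way PDIP or libisys.so implement it, and the helper name start_background_shell() is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: start /bin/sh (without -c) on the slave side of
 * a new pseudo-terminal. On success, returns the pid of the shell and
 * stores the master side descriptor in *master. Returns -1 on error. */
pid_t start_background_shell(int *master)
{
    int mfd, sfd;
    char *slave_name;
    pid_t pid;

    mfd = posix_openpt(O_RDWR | O_NOCTTY);
    if (mfd < 0) {
        return -1;
    }
    if (grantpt(mfd) != 0 || unlockpt(mfd) != 0 ||
        (slave_name = ptsname(mfd)) == NULL) {
        close(mfd);
        return -1;
    }
    pid = fork();
    if (pid == -1) {
        close(mfd);
        return -1;
    }
    if (pid == 0) {
        /* Shell side: in a new session, the slave becomes the
         * controlling terminal and the standard input and outputs */
        close(mfd);
        setsid();
        sfd = open(slave_name, O_RDWR);
        if (sfd < 0) {
            _exit(127);
        }
        dup2(sfd, STDIN_FILENO);
        dup2(sfd, STDOUT_FILENO);
        dup2(sfd, STDERR_FILENO);
        if (sfd > STDERR_FILENO) {
            close(sfd);
        }
        execl("/bin/sh", "sh", (char *)0);
        _exit(127);
    }
    /* Application side: commands are written to, and displays are read
     * from, the master descriptor */
    *master = mfd;
    return pid;
}
```

Everything the application writes to the master descriptor is seen by the shell as keyboard input, and everything the shell displays, including its prompt and the echo of the commands, comes back through the same descriptor.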
As the shell is in interactive mode, it displays a prompt and waits for a command. It reads the command, executes it and displays a new prompt when the command ends, to wait for the next one. At first sight, the application process would need to do some tricky parsing of the shell's output to distinguish the command's output from the prompt displayed at the end of the command. Moreover, the application must also get the result of the command (i.e. the exit status). To keep this simple, it is possible to use PDIP (Programmed Dialogs with Interactive Programs). It is an open source package, fully documented with online manuals, HTML pages and examples. It is an expect-like tool but much simpler to use than its ancestor. It provides the ability to pilot interactive programs. It comes in two flavors: a command named pdip, which controls interactive programs from a shell script, and a C language API, offered by a shared library called libpdip.so, which controls interactive programs from a C/C++ program. The latter is the relevant one for the current solution.
In the source tree of the PDIP package, the isys sub-directory contains a variant of system() using the above principle (cf. isys.c embedded in a shared library called libisys.so). § A.3. presents some details about this library. With libisys.so, the application process calls an API named isystem() which behaves the same as system() but actually hides the PTY and the running background shell described above (cf. Figure 8).
The solution described in this chapter saves the fork()/exec() of sh -c by keeping at least one running background shell per application process. Depending on the application's behaviour, keeping a running shell may be worthwhile, but it wastes memory if the application rarely calls isystem(). It is possible to enhance this implementation to reduce the number of running background shells by sharing them between all the running applications, as proposed in the following chapter.
To go further than the preceding implementation, we propose to share running shells between all the application processes. The principle consists in setting up a daemon process managing one or more background shells (static configuration or dynamic setting on demand, for example). Let's call it rsystemd (i.e. rsystem daemon) to comply with the Unix naming scheme. It is started before any application (at system startup, for example) and waits on a named socket for commands to run. It submits each command to one of the shells that it manages and reports the result to the originating application process. To do so, rsystemd relies on libpdip.so to interact with the shells as explained in § 4. On the application process side, an API named rsystem() behaves the same as system() but actually hides the interaction with rsystemd through the named socket: the command line passed as argument is written into the socket to make rsystemd run it and return the displays and the command status. The principle is depicted in Figure 9.
Figure 9: rsystemd
In the source tree of the PDIP package, the rsys sub-directory contains a variant of system() using the above principle (cf. rsystem.c embedded in a shared library called librsys.so which implements rsystem() API and rsystemd.c which implements the daemon part). § A.4. presents some details about this library. This proposal not only saves CPU time as we do not continuously fork()/exec() and terminate shell processes but it also saves memory space as the running shells are shared with several processes.
We must not forget that this solution differs from the original system() service from a user interface point of view, as the shells are running in separate processes which are not children of the application processes: they are children of rsystemd. As a consequence, the father to child inheritance mechanism does not operate here (file descriptors, environment variables, signal disposition...). But most of the time, the users of system() do not require it.
A command run by one process may change the signal disposition, the umask, the current directory or anything else in the persistent shell, which may jeopardise the behaviour of subsequent commands. To prevent this, a pre/post command execution script could save the shell context before the command execution and restore it afterwards.
If rsystemd is designed with a fixed number of running background shells, we may face starvation problems: command requests cannot be satisfied immediately when they outnumber the running background shells, which introduces some latency. Moreover, we may also face deadlocks if there are dependencies between shell commands: a command waits for some resource to be set by another command, which cannot get an available background shell. But if rsystemd is designed to launch brand new background shells to satisfy pending command requests when all its configured background shells are busy, these problems do not occur.
From a security point of view, some form of identification may be needed to make sure that a low-privileged process cannot run shell commands in the context of rsystemd when the latter is launched with higher privileges. One solution is to implement an authentication message through CMSG: Linux provides the ability to send ancillary data using sendmsg() and recvmsg(), and ancillary data of type SCM_CREDENTIALS carries the uid/gid of a process to another process in a secure way. So, when a client process wants to run a command, it first sends its uid/gid (a process running with superuser rights can set any value for uid/gid, but an unprivileged process must set its own uid/gid). On the server side (rsystemd), shells with different privilege levels are running. On receipt of a command request, rsystemd selects the shell corresponding to the client's credentials.
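The server side of such a check can be sketched as follows. The sketch assumes that the server enables the Linux SO_PASSCRED socket option, which makes the kernel attach the sender's credentials to every received message even when the client does not build the ancillary data itself. The helper names are hypothetical and are not part of rsystemd.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Ask the kernel to attach the sender's credentials to every message
 * received on this AF_UNIX socket. Returns 0 on success, -1 on error. */
int enable_credentials(int sock)
{
    int on = 1;

    return setsockopt(sock, SOL_SOCKET, SO_PASSCRED, &on, sizeof(on));
}

/* Receive up to len bytes into buf and the sender's credentials into
 * *cred. Returns the number of bytes received, or -1 on error or when
 * no credentials came with the message. */
ssize_t recv_with_credentials(int sock, void *buf, size_t len,
                              struct ucred *cred)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;       /* force proper alignment */
    } ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    n = recvmsg(sock, &msg, 0);
    if (n < 0) {
        return -1;
    }
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_CREDENTIALS) {
            /* pid, uid and gid of the sender, checked by the kernel */
            memcpy(cred, CMSG_DATA(cmsg), sizeof(*cred));
            return n;
        }
    }
    return -1;
}
```

A daemon could then compare cred->uid and cred->gid with its own policy before choosing the shell that runs the command.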
Let's consider the following program to compare the performance of vsystem(), isystem(), rsystem() and system(). The program takes as parameters the number of iterations and the command line to run at each iteration. The current time is read before and after the iteration loop with clock_gettime(). We subtract the first timestamp from the second and display the resulting duration.
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <time.h>
#include <isys.h> // if isystem() is used
#include <rsys.h> // if rsystem() is used

// Compute ts = ts1 - ts2 (0 if ts1 <= ts2)
static void sub_ts(struct timespec *ts1, struct timespec *ts2, struct timespec *ts)
{
    if ((ts1->tv_sec < ts2->tv_sec) ||
        ((ts1->tv_sec == ts2->tv_sec) && (ts1->tv_nsec <= ts2->tv_nsec))) {
        /* TIME1 <= TIME2? */
        ts->tv_sec = ts->tv_nsec = 0;
    } else {
        /* TIME1 > TIME2 */
        ts->tv_sec = ts1->tv_sec - ts2->tv_sec;
        if (ts1->tv_nsec < ts2->tv_nsec) {
            ts->tv_nsec = ts1->tv_nsec + 1000000000L - ts2->tv_nsec;
            ts->tv_sec --; /* Borrow a second. */
        } else {
            ts->tv_nsec = ts1->tv_nsec - ts2->tv_nsec;
        }
    }
} // sub_ts

int main(int ac, char *av[])
{
    int nb_cmd;
    int rc;
    int i;
    int status;
    char *cmd;
    size_t cmdsz, l;
    struct timespec ts, ts1, ts2;

    if (ac < 3) {
        fprintf(stderr, "Usage: %s iterations command [parameters...]\n", av[0]);
        return 1;
    }

    nb_cmd = atoi(av[1]);

    // Concatenate the parameters into one command line
    cmd = 0;
    cmdsz = 1; // Terminating NUL
    for (i = 2; i < ac; i ++) {
        l = strlen(av[i]);
        // Add a space if there are following parameters
        if (i < (ac - 1)) {
            l += 1;
            cmd = realloc(cmd, cmdsz + l);
            if (!cmd) {
                fprintf(stderr, "Iteration#%d: realloc(), '%m' (%d)\n", i, errno);
                rc = 1;
                goto err;
            }
            snprintf(cmd + cmdsz - 1, l + 1, "%s ", av[i]);
        } else {
            cmd = realloc(cmd, cmdsz + l);
            if (!cmd) {
                fprintf(stderr, "Iteration#%d: realloc(), '%m' (%d)\n", i, errno);
                rc = 1;
                goto err;
            }
            snprintf(cmd + cmdsz - 1, l + 1, "%s", av[i]);
        }
        cmdsz += l;
    } // End for

    printf("Running command '%s' %d times...\n", cmd, nb_cmd);

    rc = clock_gettime(CLOCK_REALTIME, &ts1);
    if (0 != rc) {
        fprintf(stderr, "clock_gettime(): '%m' (%d)\n", errno);
        rc = 1;
        goto err;
    }

    for (i = 0; i < nb_cmd; i ++) {
        // Keep exactly one of the following calls in each version of the program
        status = system(cmd);
        // status = vsystem("%s", cmd);
        // status = isystem("%s", cmd);
        // status = rsystem("%s", cmd);
        if (status != 0) {
            fprintf(stderr, "Iteration#%d: status=%d\n", i, status);
            rc = 1;
            goto err;
        }
    } // End for

    rc = clock_gettime(CLOCK_REALTIME, &ts2);
    if (0 != rc) {
        fprintf(stderr, "clock_gettime(): '%m' (%d)\n", errno);
        rc = 1;
        goto err;
    }

    sub_ts(&ts2, &ts1, &ts);

    printf("Elapsed time: %ld s - %ld ns\n", (long)(ts.tv_sec), (long)(ts.tv_nsec));

    rc = 0;

err:
    if (cmd) {
        free(cmd);
    }

    return rc;
} // main
We make four versions of the preceding program to call system() in the first, vsystem() in the second, isystem() in the third and rsystem() in the fourth (for the latter, we launch rsystemd daemon prior to running it). We write a very basic shell script file to be passed as argument to the four programs to run it:
#!/bin/sh

date > /dev/null 2>&1

exit 0
The following table gives the results on a PC running a Linux Ubuntu distribution:
system():
    $ tests/system_it 2000 tests/script.sh
    Running command 'tests/script.sh' 2000 times...
    Elapsed time: 5 s - 524184688 ns

vsystem():
    $ tests/vsystem_it 2000 tests/script.sh
    Running command 'tests/script.sh' 2000 times...
    Elapsed time: 5 s - 335097864 ns

isystem():
    $ tests/isystem_it 2000 tests/script.sh
    Running command 'tests/script.sh' 2000 times...
    Elapsed time: 4 s - 165950047 ns

rsystem():
    $ sudo rsys/src/rsystemd &
    $ tests/rsystem_it 2000 tests/script.sh
    Running command 'tests/script.sh' 2000 times...
    Elapsed time: 4 s - 131501380 ns
We can see that vsystem() is slightly faster than system(): about 2.67 ms per call instead of 2.76 ms, a gain of roughly 3%. Both isystem() and rsystem() are clearly faster than system(), at about 2.08 ms and 2.07 ms per call respectively, a gain of roughly 25%. As a consequence, they are good alternatives to system(). isystem() may sometimes appear to be faster than rsystem() because the latter involves communication through a socket. But in real systems, where memory occupation is high, rsystem() may be preferred, as it saves memory and is more efficient than system() anyway.
This paper presented various ideas to replace calls to system() with new simple APIs hiding mechanisms that save memory space and CPU time. The proposed solutions based on client/server interactions introduce some serialization of shell commands in order to reduce the memory and CPU requirements in situations where numerous shell commands arrive at the same time. This may contribute to application robustness and efficiency, especially at system startup time, when several initialization shell scripts are typically triggered to set up the execution environment (server startup, log creation, network configuration, kernel module loading...). It also contributes to hardware cost reduction by keeping CPU and memory needs reasonable.
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <time.h>
#include <signal.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdarg.h>

#define SH_CMD_LEN 256
#define SH_NAME "sh"
#define SH_PATH "/bin/" SH_NAME

static int vsys_ref;
static pthread_mutex_t vsys_mutex = PTHREAD_MUTEX_INITIALIZER;

#define VSYS_LOCK() pthread_mutex_lock(&vsys_mutex)
#define VSYS_UNLOCK() pthread_mutex_unlock(&vsys_mutex)

int vsystem(
            const char *format,
            ...
           )
{
    int rc;
    char cmd[SH_CMD_LEN];
    char *av[4];
    int status;
    struct sigaction action, act_quit, act_int;
    sigset_t old_mask;
    int sig_is_blocked = 0;  // Set once SIGCHLD has been blocked
    va_list ap;

    // Build the command line to pass to "sh -c"
    // According to the manual, if the command is NULL, a non 0 value
    // is returned if a shell is available or 0 value is returned if a
    // shell is not available
    // For example a shell might not be available after a chroot()
    va_start(ap, format);
    rc = vsnprintf(cmd, sizeof(cmd), format, ap);
    va_end(ap);

    // Check if the command is not too long
    if (rc >= (int)sizeof(cmd)) {
        errno = ENOSPC;
        return -1;
    }

    // Make the parameters
    av[0] = SH_NAME;
    av[1] = "-c";
    av[2] = cmd;
    av[3] = (char *)0;

    // According to the manual, the service must block SIGCHLD and ignore
    // SIGINT and SIGQUIT
    action.sa_handler = SIG_IGN;
    action.sa_flags = 0;
    sigemptyset(&(action.sa_mask));

    // The reference counter ensures that we will not get in act_int and
    // act_quit, actions inherited from concurrent calls of systemX() which
    // would set IGNORE forever...
    VSYS_LOCK();

    // If the signals are not already ignored
    if (0 == (vsys_ref ++)) {
        if (0 != sigaction(SIGINT, &action, &act_int)) {
            // Nothing has been changed yet: nothing to restore
            vsys_ref --;
            VSYS_UNLOCK();
            return -1;
        } // End if sigaction !OK

        if (0 != sigaction(SIGQUIT, &action, &act_quit)) {
            vsys_ref --;
            status = -1;
            goto out_restore_sigint;
        } // End if sigaction !OK
    } // End if not already ignored (former ref_count != 0)

    VSYS_UNLOCK();

    // For some reasons, GLIBC's system() blocks SIGCHLD outside of the critical
    // section...
    sigaddset(&(action.sa_mask), SIGCHLD);
    if (0 != pthread_sigmask(SIG_BLOCK, &(action.sa_mask), &old_mask)) {
        VSYS_LOCK();
        if (0 == (-- vsys_ref)) {
            // Restore the former handler for SIGQUIT
            (void)sigaction(SIGQUIT, &act_quit, 0);
            // Restore the former handler for SIGINT
            (void)sigaction(SIGINT, &act_int, 0);
        }
        VSYS_UNLOCK();
        return -1;
    } else {
        sig_is_blocked = 1;
    } // End if pthread_sigmask() failed

    // Use the light fork
    rc = vfork();
    switch(rc) {
        case -1 : {
            // Error
            status = -1;
        }
        break;

        case 0 : {
            // Child process
            extern char **environ;

            // Restore the former handlers for SIGQUIT, SIGINT and signal mask
            (void)sigaction(SIGQUIT, &act_quit, 0);
            (void)sigaction(SIGINT, &act_int, 0);
            (void)pthread_sigmask(SIG_SETMASK, &old_mask, 0);

            (void)execve(SH_PATH, av, environ);
            _exit(127);
        }
        break;

        default: {
            // Father process
            pid_t pid;

            // Wait for the termination of the child
            pid = waitpid(rc, &status, 0);
            if (pid != rc) {
                status = -1;
            }
        }
        break;
    } // End switch

    VSYS_LOCK();

    if (0 == (-- vsys_ref)) {
        // Restore the former handler for SIGQUIT
        if (sigaction(SIGQUIT, &act_quit, 0) != 0) {
            status = -1;
        }

out_restore_sigint:
        // Restore the former handler for SIGINT
        if (sigaction(SIGINT, &act_int, 0) != 0) {
            status = -1;
        }
    } // End if ref_count is 0

    // Restore the signal mask (For some reasons the GLIBC restores the
    // mask under the LOCK whereas the blocking was done outside
    // the lock)
    if (sig_is_blocked) {
        if (pthread_sigmask(SIG_SETMASK, &old_mask, 0) != 0) {
            status = -1;
        }
    }

    VSYS_UNLOCK();

    return status;
} // vsystem
#define _GNU_SOURCE #include <errno.h> #include <stdio.h> #include <sys/types.h> #include <sys/socket.h> #include <string.h> #include <stdlib.h> #include <unistd.h> #include <sys/un.h> #include <sys/stat.h> #include <sys/select.h> #include <sys/wait.h> #include <stddef.h> #include <libgen.h> #include "srv_system.h" // // Message sent to the server // #define SRV_SYSTEM_CMDLINE_SZ 256 typedef struct { // Command line to execute char cmdline[SRV_SYSTEM_CMDLINE_SZ]; // stdin, stdout and stderr of the client int fds[3]; } srv_system_t; // // Open a local abstract socket in server or client mode // int srv_system_open( const char *name, int srv // 0 = client mode, !0 = server mode ) { int sd; size_t len; struct sockaddr_un addr; int rc; int err_sav; // Create a local socket sd = socket(PF_UNIX, SOCK_STREAM, 0); if (sd < 0) { return -1; } // If server mode if (srv) { int sockVal; // Set some options on the socket sockVal = 1; if (0 != setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, (char *) &sockVal, sizeof (sockVal))) { err_sav = errno; close(sd); errno = err_sav; return -1; } memset(&addr, 0, sizeof(addr)); // Make an abstract socket according to "man 7 unix": An // abstract socket address is distinguished (from a pathname // socket) by the fact that sun_path[0] is a NUL byte ('\0'). // An abstract socket does not appear in the file system addr.sun_family = AF_UNIX; addr.sun_path[0] = '\0'; rc = snprintf(&(addr.sun_path[1]), sizeof(addr.sun_path) - 1, "%s", name); if (rc >= (int)sizeof(addr.sun_path)) { close(sd); errno = ENOSPC; return -1; } // Compute the length of the address. // According to man 7 unix, the addrlen argument that // describes the enclosing sockaddr_un structure should // have a value of at least: // // offsetof(struct sockaddr_un, sun_path)+strlen(addr.sun_path)+1 // // As for abstract socket, the 1st byte of sun_path is NUL, we // use the "strlen" of sun_path[1] to which we add the enclosing // NUL bytes. 
We make this for the fun as the manual also specifies // that this length could simply be sizeof(struct sockaddr_un). len = offsetof(struct sockaddr_un, sun_path) + 1 + rc + 1; rc = bind(sd, (struct sockaddr *)&addr, len); if (rc != 0) { err_sav = errno; close(sd); errno = err_sav; return -1; } // Update access rights on the socket file to make sure // that any user can connect to it (void)chmod(addr.sun_path, 0777); // Set the input connection queue length if (listen(sd, 5) == -1) { err_sav = errno; close(sd); errno = err_sav; return -1; } } else { // Client mode memset(&addr, 0, sizeof(addr)); addr.sun_family = AF_UNIX; rc = snprintf(&(addr.sun_path[1]), sizeof(addr.sun_path) - 1, "%s", name); if (rc >= (int)sizeof(addr.sun_path)) { close(sd); errno = ENOSPC; return -1; } len = offsetof(struct sockaddr_un, sun_path) + 1 + rc + 1; // Connect to the server rc = connect(sd, (const struct sockaddr *)&addr, len); if (0 != rc) { err_sav = errno; close(sd); errno = err_sav; return -1; } } return sd; } // srv_system_open // // Receive a message from the client // int srv_system_recv( int sd, srv_system_t *sys_msg ) { int rc; struct msghdr msg; struct cmsghdr *cmsg; char *cmsg_buf; size_t cmsg_buf_sz; size_t sz; int err_sav; struct iovec iov; char buf; unsigned int i; // Parameter checking if ((sd < 0) || !sys_msg) { errno = EINVAL; return -1; } // Size of the table of file descriptors sz = 3 * sizeof(int); // Invalidate the file descriptors for (i = 0; i < 3; i ++) { sys_msg->fds[i] = -1; } // End for cmsg_buf_sz = CMSG_SPACE(sz); cmsg_buf = (char *)malloc(cmsg_buf_sz); if (!cmsg_buf) { // Errno is set return -1; } memset(&msg, 0, sizeof(msg)); msg.msg_control = cmsg_buf; msg.msg_controllen = cmsg_buf_sz; // Ancillary data must accompany normal data // (it cannot be transmitted on its own) // Here, the data is the command line to run iov.iov_base = sys_msg->cmdline; iov.iov_len = sizeof(sys_msg->cmdline); msg.msg_iov = &iov; msg.msg_iovlen = 1; rc = recvmsg(sd, &msg, 0); 
if (-1 == rc) { err_sav = errno; free(cmsg_buf); errno = err_sav; return -1; } cmsg = CMSG_FIRSTHDR(&msg); if (cmsg) { if (SOL_SOCKET == cmsg->cmsg_level) { if (cmsg->cmsg_len != CMSG_LEN(sz)) { free(cmsg_buf); errno = EINVAL; return -1; } switch(cmsg->cmsg_type) { case SCM_RIGHTS: { // Table of file descriptors memcpy(sys_msg->fds, CMSG_DATA(cmsg), sz); } break; default: { // Unexpected control message type free(cmsg_buf); errno = EINVAL; return -1; } break; } // End switch } else { // Bad level free(cmsg_buf); errno = EINVAL; return -1; } } else { // No control message free(cmsg_buf); errno = EINVAL; return -1; } free(cmsg_buf); return 0; } // srv_system_recv // // Send a message to the server // int srv_system_send( int sd, srv_system_t *sys_msg ) { unsigned int i; int rc; struct msghdr msg; struct cmsghdr *cmsg; char *cmsg_buf; size_t cmsg_buf_sz; size_t sz; int err_sav; struct iovec iov; char buf; char *data; // Parameter checking if ((sd < 0) || !sys_msg) { errno = EINVAL; return -1; } // Size of the table of file descriptors sz = 3 * sizeof(int); data = (char *)(sys_msg->fds); cmsg_buf_sz = CMSG_SPACE(sz); cmsg_buf = (char *)malloc(cmsg_buf_sz); if (!cmsg_buf) { // Errno is set return -1; } memset(&msg, 0, sizeof(msg)); msg.msg_control = cmsg_buf; msg.msg_controllen = cmsg_buf_sz; // Ancillary data must accompany normal data // (it cannot be transmitted on its own) // Here, the data is the command line to run iov.iov_base = sys_msg->cmdline; iov.iov_len = sizeof(sys_msg->cmdline); msg.msg_iov = &iov; msg.msg_iovlen = 1; cmsg = CMSG_FIRSTHDR(&msg); cmsg->cmsg_level = SOL_SOCKET; cmsg->cmsg_type = SCM_RIGHTS; cmsg->cmsg_len = CMSG_LEN(sz); memcpy(CMSG_DATA(cmsg), data, sz); rc = sendmsg(sd, &msg, MSG_NOSIGNAL); if (-1 == rc) { err_sav = errno; free(cmsg_buf); errno = err_sav; return -1; } free(cmsg_buf); return 0; } // srv_system_send // // Engine of the server // int srv_system_server( const char *srv ) { int sd = -1; socklen_t laddr; int nfds; fd_set fdset; 
  int                sd1 = -1;
  int                rc;
  struct sockaddr_un addr;
  int                err_sav;

  sd = srv_system_open(srv, 1);
  if (sd < 0) {
    err_sav = errno;
    fprintf(stderr, "srv_system_open(%s): '%m' (%d)\n", srv, errno);
    goto err;
  }

  while (1) {
    // Wait for a connection
    FD_ZERO(&fdset);
    FD_SET(sd, &fdset);
    nfds = sd + 1;
    rc = select(nfds, &fdset, 0, 0, 0);
    switch(rc) {
      case -1: {
        err_sav = errno;
        fprintf(stderr, "select(%s): '%m' (%d)\n", srv, errno);
        goto err;
      }
      break;
      default: {
        // If it is a command to run
        if (FD_ISSET(sd, &fdset)) {
          srv_system_t sys_msg;
          pid_t        pid;

          laddr = sizeof(addr);
          sd1 = accept(sd, (struct sockaddr *)&addr, &laddr);
          if (sd1 < 0) {
            err_sav = errno;
            fprintf(stderr, "accept(%s): '%m' (%d)\n", srv, errno);
            goto err;
          }

          // Receive the command to run
          rc = srv_system_recv(sd1, &sys_msg);
          if (rc != 0) {
            err_sav = errno;
            fprintf(stderr, "srv_system_recv(%s): '%m' (%d)\n", srv, errno);
            goto err;
          }

          // The reception of the message triggered the creation
          // of 3 open file descriptors

          // Fork a child process to run the command
          pid = fork();
          switch(pid) {
            case -1 : {
              // Error: save errno as the error path restores it
              err_sav = errno;
              goto err;
            }
            break;
            case 0 : {
              // Child process
              char *av[4];

              // Redirect stdin, stdout and stderr to the file
              // descriptors of the client process
              dup2(sys_msg.fds[0], 0);
              dup2(sys_msg.fds[1], 1);
              dup2(sys_msg.fds[2], 2);

              // As they are duplicated, close them
              close(sys_msg.fds[0]);
              close(sys_msg.fds[1]);
              close(sys_msg.fds[2]);

              // Execute a "sh -c cmdline" as system() would do
              av[0] = "/bin/sh";
              av[1] = "-c";
              av[2] = sys_msg.cmdline;
              av[3] = (char *)0;
              execv("/bin/sh", av);
              _exit(1);
            }
            break;
            default: {
              // Parent
              int status;

              // The parent does not need the client's file descriptors
              // as they are used by the child
              close(sys_msg.fds[0]);
              close(sys_msg.fds[1]);
              close(sys_msg.fds[2]);

              // Wait for the end of the child and get its status
              rc = waitpid(pid, &status, 0);
              if (rc != pid) {
                err_sav = errno;
                goto err;
              }

              // Send the status to the client
              rc = write(sd1, &status, sizeof(status));
              if (rc != (int)sizeof(status)) {
                fprintf(stderr,
                        "Server '%s': unable to send status for command '%s'\n",
                        srv, sys_msg.cmdline);
              }

              close(sd1);
              sd1 = -1; // In case of error, don't close it twice
            }
            break;
          } // End switch
        }
      }
      break;
    } // End switch
  } // End while

err:

  if (sd >= 0) {
    close(sd);
  }
  if (sd1 >= 0) {
    close(sd1);
  }

  errno = err_sav;
  return -1;
} // srv_system_server

//
// API which emulates system()
//
int srv_system(
               const char *srv,
               const char *cmdline
              )
{
  int          sd;
  int          rc;
  int          err_sav;
  int          status;
  srv_system_t msg;

  // Connect to the server
  sd = srv_system_open(srv, 0);
  if (sd < 0) {
    fprintf(stderr, "srv_system_open(%s): '%m' (%d)\n", srv, errno);
    return -1;
  }

  // Send the NUL terminated command and file descriptors to the server
  rc = snprintf(msg.cmdline, sizeof(msg.cmdline), "%s", cmdline);
  if ((rc < 0) || ((size_t)rc >= sizeof(msg.cmdline))) {
    fprintf(stderr, "Command line too long: '%s' (max is %zu chars)\n",
            cmdline, sizeof(msg.cmdline) - 1);
    close(sd); // Do not leak the connection
    errno = ENOSPC;
    return -1;
  }
  msg.fds[0] = 0;
  msg.fds[1] = 1;
  msg.fds[2] = 2;
  rc = srv_system_send(sd, &msg);
  if (rc != 0) {
    err_sav = errno;
    fprintf(stderr, "srv_system_send(%s, %s): '%m' (%d)\n", srv, cmdline, errno);
    close(sd);
    errno = err_sav;
    return -1;
  }

  // The server will execute the command and redirect its
  // input/outputs to our stdin, stdout and stderr

  // Wait for the status of the command
  rc = read(sd, &status, sizeof(status));
  if (rc != (int)sizeof(status)) {
    if (rc != -1) {
      err_sav = EIO; // Short read
    } else {
      err_sav = errno;
    }
    errno = err_sav; // So that '%m' reports the right error
    fprintf(stderr, "read(status) = %d: '%m' (%d)\n", rc, errno);
    close(sd);
    errno = err_sav;
    return -1;
  }

  // End the connection
  rc = close(sd);
  if (0 != rc) {
    err_sav = errno;
    fprintf(stderr, "close(%d, %s, %s): '%m' (%d)\n", sd, srv, cmdline, errno);
    errno = err_sav;
    return -1;
  }

  return status;
} // srv_system

int main(
         int   ac,
         char *av[]
        )
{
  int rc;

  // If server mode
  if ((3 == ac) && !strcmp(av[1], "srv")) {
    rc = srv_system_server(av[2]);
    if (0 != rc) {
      return 1;
    }
  } else if ((3 == ac) && strcmp(av[1], "srv")) {
    // Client mode
    int status;

    printf("Connecting to '%s'\n", av[1]);
    status = srv_system(av[1], av[2]);
    printf("Command status is 0x%x\n", status);
    if (0 != status) {
      return 1;
    }
  } else {
    fprintf(stderr,
            "Usage: %s [srv server_socket_name |"
            " server_socket_name \"command line\"]\n",
            basename(av[0]));
    return 1;
  }

  return 0;
} // main
$ gcc src_system.c -o srv_sys
$ ./srv_sys
Usage: srv_sys [srv server_socket_name | server_socket_name "command line"]
$ ./srv_sys srv /tmp/srv_sys
In another terminal:
$ ./srv_sys /tmp/srv_sys "ls -l /"
Connecting to '/tmp/srv_sys'
total 112
drwxr-xr-x   2 root root  4096 aout 12 22:16 bin
drwxr-xr-x   4 root root  4096 aout  7 21:05 boot
drwxrwxr-x   2 root root  4096 mars 23  2018 cdrom
[...]
dr-xr-xr-x  13 root root     0 dec. 10 11:15 sys
drwxrwxrwt  14 root root  4096 dec. 10 12:41 tmp
drwxr-xr-x  10 root root  4096 janv. 5  2018 usr
drwxr-xr-x  15 root root  4096 nov. 21 09:13 var
lrwxrwxrwx   1 root root    30 aout  7 21:00 vmlinuz -> boot/vmlinuz-4.13.0-46-generic
lrwxrwxrwx   1 root root    30 juin  5  2018 vmlinuz.old -> boot/vmlinuz-4.13.0-43-generic
Command status is 0x0
$
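The server above hands the client's stdin, stdout and stderr to the shell by passing file descriptors as SCM_RIGHTS ancillary data. The mechanism is easier to see in isolation, so here is a minimal sketch that passes a single descriptor over a UNIX socket. The names send_fd() and recv_fd() are hypothetical helpers introduced for illustration only; they are not part of the program above, which transfers three descriptors together with the command line.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper: send descriptor 'fd' over the UNIX socket 'sock'.
// One dummy data byte is sent because ancillary data cannot travel alone.
int send_fd(int sock, int fd)
{
  char byte = 0;
  struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
  // The union guarantees a correctly aligned control buffer
  union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
  struct msghdr msg;
  struct cmsghdr *cmsg;

  memset(&msg, 0, sizeof(msg));
  memset(&u, 0, sizeof(u));
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = u.buf;
  msg.msg_controllen = sizeof(u.buf);

  cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

  return (sendmsg(sock, &msg, 0) == 1) ? 0 : -1;
}

// Hypothetical helper: receive one descriptor from 'sock'.
// Returns the new descriptor (valid in the receiving process) or -1.
int recv_fd(int sock)
{
  char byte;
  struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
  union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
  struct msghdr msg;
  struct cmsghdr *cmsg;
  int fd;

  memset(&msg, 0, sizeof(msg));
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = u.buf;
  msg.msg_controllen = sizeof(u.buf);

  if (recvmsg(sock, &msg, 0) != 1) {
    return -1;
  }

  cmsg = CMSG_FIRSTHDR(&msg);
  if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
      cmsg->cmsg_type != SCM_RIGHTS ||
      cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
    return -1;
  }

  memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
  return fd;
}
```

The descriptor number seen by the receiver generally differs from the one the sender used: the kernel installs a new descriptor referring to the same open file, which is why the server can dup2() the received descriptors onto 0, 1 and 2 of the shell.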
Unpack the source code package:
$ tar xvfz pdip-2.1.0.tgz
Go into the top level directory of the sources and trigger the build of the DEB packages:
$ cd pdip-2.1.0
$ ./pdip_install -P DEB
ISYS depends on PDIP, so PDIP must be installed before ISYS. Otherwise, you get the following error:
$ sudo dpkg -i isys_2.1.0_amd64.deb
Selecting previously unselected package isys.
(Reading database ... 218983 files and directories currently installed.)
Preparing to unpack isys_2.1.0_amd64.deb ...
Unpacking isys (2.1.0) ...
dpkg: dependency problems prevent configuration of isys:
 isys depends on pdip (>= 2.0.4); however:
  Package pdip is not installed.
dpkg: error processing package isys (--install):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 isys
First install the PDIP package:
$ sudo dpkg -i pdip_2.1.0_amd64.deb
Selecting previously unselected package pdip.
(Reading database ... 218988 files and directories currently installed.)
Preparing to unpack pdip_2.1.0_amd64.deb ...
Unpacking pdip (2.1.0) ...
Setting up pdip (2.1.0) ...
Processing triggers for man-db (2.7.5-1) ...
Then install the ISYS package:
$ sudo dpkg -i isys_2.1.0_amd64.deb
(Reading database ... 219040 files and directories currently installed.)
Preparing to unpack isys_2.1.0_amd64.deb ...
Unpacking isys (2.1.0) over (2.1.0) ...
Setting up isys (2.1.0)
Installing from the packages is the preferred way, as the software and everything it installed can then be removed cleanly by calling:
$ sudo dpkg -r isys
(Reading database ... 219043 files and directories currently installed.)
Removing isys (2.1.0)
To display the list of files installed by the package:
$ dpkg -L isys
/.
/usr
/usr/local
/usr/local/include
/usr/local/include/isys.h
/usr/local/lib
/usr/local/lib/libisys.so
/usr/local/share
/usr/local/share/man
/usr/local/share/man/man3
/usr/local/share/man/man3/isys.3.gz
/usr/local/share/man/man3/isystem.3.gz
It is also possible to trigger the installation from cmake:
$ tar xvfz pdip-2.1.0.tgz
$ cd pdip-2.1.0
$ cmake .
-- The C compiler identification is GNU 6.2.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Building PDIP version 2.1.0
The user id is 1000
-- Configuring done
-- Generating done
-- Build files have been written to: /home/rachid/DEVS/PDIP
$ sudo make install
Scanning dependencies of target man
[  2%] Building pdip_en.1.gz
[  4%] Building pdip_fr.1.gz
[  6%] Building pdip_configure.
[...]
-- Installing: /usr/local/lib/librsys.so
-- Installing: /usr/local/sbin/rsystemd
-- Set runtime path of "/usr/local/sbin/rsystemd" to ""
When the ISYS package is installed, online manuals are available in section 3 (API).
$ man 3 isystem

NAME
       isys - Interactive system() service

SYNOPSIS
       #include "isys.h"

       int isystem(const char *fmt, ...);

       int isys_lib_initialize(void);

DESCRIPTION
       The ISYS API provides a system(3)-like service based on a remanent
       background shell to save memory and CPU time in applications where
       system(3) is heavily used.

       isystem() executes the shell command line formatted with fmt. The
       behaviour of the format is compliant with printf(3). Internally, the
       command is run by a remanent shell created by the libisys.so library
       in a child of the current process.

       isys_lib_initialize() is to be called in child processes using the
       ISYS API. By default, ISYS API is deactivated upon fork(2).

ENVIRONMENT VARIABLE
       The ISYS_TIMEOUT environment variable specifies the maximum time in
       seconds to wait for data from the shell (by default, it is 10 seconds).

RETURN VALUE
       isystem() returns the status of the executed command line (i.e. the
       last executed command). The returned value is a "wait status" that can
       be examined using the macros described in waitpid(2) (i.e. WIFEXITED(),
       WEXITSTATUS(), and so on).

       isys_lib_initialize() returns 0 when there is no error or -1 upon
       error (errno is set).

MUTUAL EXCLUSION
       The service does not support concurrent calls to isystem() by multiple
       threads. If this behaviour is needed, the application is responsible
       for managing the mutual exclusion on its side.

EXAMPLE
       The following program receives a shell command as argument and
       executes it via a call to isystem().

       #include |
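The RETURN VALUE section above states that the result is a "wait status" to be examined with the waitpid(2) macros. As a rough illustration of what that means for the caller, the sketch below decodes such a status into a shell-like exit code. It uses the standard system(3) as a stand-in because it returns the same kind of value; wait_status_to_code() and run_command() are hypothetical names introduced here, not part of the ISYS API.

```c
#include <stdlib.h>
#include <sys/wait.h>

// Hypothetical helper: turn a wait status into a shell-like exit code.
// Returns the exit code (0..255) for a normal termination,
// 128 + signal number if the command was killed by a signal,
// or -1 if the status cannot be interpreted (e.g. the call failed).
int wait_status_to_code(int status)
{
  if (-1 == status) {
    return -1;                      // The call itself failed
  }
  if (WIFEXITED(status)) {
    return WEXITSTATUS(status);     // Normal termination
  }
  if (WIFSIGNALED(status)) {
    return 128 + WTERMSIG(status);  // Same convention as the shell
  }
  return -1;
}

// Hypothetical wrapper: run a command line with system(3)
// and return the decoded exit code
int run_command(const char *cmdline)
{
  return wait_status_to_code(system(cmdline));
}
```

Comparing the raw status with 0 is enough to detect a failure, but the raw value is not the exit code itself, so it should go through the macros before being reported to a user.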
To help auto-detect the location of the ISYS files (libraries, include files...), the ISYS package installs a configuration file named isys.pc for the pkg-config tool. Moreover, for cmake-based packages, a FindIsys.cmake file is provided at the top level of the isys sub-tree.
Unpack the source code package:
$ tar xvfz pdip-2.1.0.tgz
Go into the top level directory of the sources and trigger the build of the DEB packages:
$ cd pdip-2.1.0
$ ./pdip_install -P DEB
RSYS depends on PDIP, so PDIP must be installed before RSYS. Otherwise, you get the following error:
$ sudo dpkg -i rsys_2.1.0_amd64.deb
Selecting previously unselected package rsys.
(Reading database ... 218983 files and directories currently installed.)
Preparing to unpack rsys_2.1.0_amd64.deb ...
Unpacking rsys (2.1.0) ...
dpkg: dependency problems prevent configuration of rsys:
 rsys depends on pdip (>= 2.0.4); however:
  Package pdip is not installed.
dpkg: error processing package rsys (--install):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 rsys
First install the PDIP package:
$ sudo dpkg -i pdip_2.1.0_amd64.deb
Selecting previously unselected package pdip.
(Reading database ... 218988 files and directories currently installed.)
Preparing to unpack pdip_2.1.0_amd64.deb ...
Unpacking pdip (2.1.0) ...
Setting up pdip (2.1.0) ...
Processing triggers for man-db (2.7.5-1) ...
Then install the RSYS package:
$ sudo dpkg -i rsys_2.1.0_amd64.deb
(Reading database ... 219042 files and directories currently installed.)
Preparing to unpack rsys_2.1.0_amd64.deb ...
Unpacking rsys (2.1.0) over (2.1.0) ...
Setting up rsys (2.1.0)
Installing from the packages is the preferred way, as the software and everything it installed can then be removed cleanly by calling:
$ sudo dpkg -r rsys
(Reading database ... 219043 files and directories currently installed.)
Removing rsys (2.1.0)
To display the list of files installed by the package:
$ dpkg -L rsys
/.
/usr
/usr/local
/usr/local/include
/usr/local/include/rsys.h
/usr/local/lib
/usr/local/lib/librsys.so
/usr/local/sbin
/usr/local/sbin/rsystemd
/usr/local/share
/usr/local/share/man
/usr/local/share/man/man3
/usr/local/share/man/man3/rsys.3.gz
/usr/local/share/man/man3/rsystem.3.gz
/usr/local/share/man/man8
/usr/local/share/man/man8/rsystemd.8.gz
It is also possible to trigger the installation from cmake:
$ tar xvfz pdip-2.1.0.tgz
$ cd pdip-2.1.0
$ cmake .
-- The C compiler identification is GNU 6.2.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Building PDIP version 2.1.0
The user id is 1000
-- Configuring done
-- Generating done
-- Build files have been written to: /home/rachid/DEVS/PDIP
$ sudo make install
Scanning dependencies of target man
[  2%] Building pdip_en.1.gz
[  4%] Building pdip_fr.1.gz
[  6%] Building pdip_configure.
[...]
-- Installing: /usr/local/lib/librsys.so
-- Installing: /usr/local/sbin/rsystemd
-- Set runtime path of "/usr/local/sbin/rsystemd" to ""
When the RSYS package is installed, online manuals are available in sections 3 (API) and 8 (rsystemd daemon).
$ man 3 rsystem

NAME
       rsys - Remote system() service

SYNOPSIS
       #include "rsys.h"

       int rsystem(const char *fmt, ...);

       int rsys_lib_initialize(void);

DESCRIPTION
       The RSYS API provides a system(3)-like service based on shared
       remanent background shells managed by rsystemd(8) daemon. This saves
       memory and CPU time in applications where system(3) is heavily used.

       rsystem() executes the shell command line formatted with fmt. The
       behaviour of the format is compliant with printf(3). Internally, the
       command is run by one of the remanent shells managed by rsystemd(8).

       rsys_lib_initialize() is to be called in child processes using the
       RSYS API. By default, RSYS API is deactivated upon fork(2).

ENVIRONMENT VARIABLE
       By default, the server socket pathname used for the client/server
       dialog is /var/run/rsys.socket. The RSYS_SOCKET_PATH environment
       variable is available to specify an alternate socket pathname if one
       needs to change it for access rights or any test purposes.

RETURN VALUE
       rsystem() returns the status of the executed command line (i.e. the
       last executed command). The returned value is a "wait status" that can
       be examined using the macros described in waitpid(2) (i.e. WIFEXITED(),
       WEXITSTATUS(), and so on).

       rsys_lib_initialize() returns 0 when there is no error or -1 upon
       error (errno is set).

MUTUAL EXCLUSION
       The service does not support concurrent calls to rsystem() by multiple
       threads. If this behaviour is needed, the application is responsible
       for managing the mutual exclusion on its side.

EXAMPLE
       The following program receives a shell command as argument and
       executes it via a call to rsystem().
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <libgen.h>
       #include <assert.h>
       #include "rsys.h"

       int main(int ac, char *av[])
       {
         int     status;
         int     i;
         char   *cmdline;
         size_t  len;
         size_t  offset;

         if (ac < 2) {
           fprintf(stderr, "Usage: %s cmd params...\n", basename(av[0]));
           return 1;
         }

         // Build the command line
         cmdline = 0;
         len = 1; // Terminating NUL
         offset = 0;
         for (i = 1; i < ac; i ++) {
           len += strlen(av[i]) + 1; // word + space
           cmdline = (char *)realloc(cmdline, len);
           assert(cmdline);
           offset += sprintf(cmdline + offset, "%s ", av[i]);
         } // End for

         printf("Running '%s'...\n", cmdline);

         status = rsystem(cmdline);
         if (status != 0) {
           fprintf(stderr, "Error from program (0x%x = %d)!\n", status, status);
           free(cmdline);
           return(1);
         } // End if

         free(cmdline);
         return(0);
       } // main

       Build the program:

       $ gcc trsys.c -o trsys -lrsys -lpdip -lpthread

       Make sure that rsystemd(8) is running. Then, run something like the
       following:

       $ ./trsys echo example
       Running 'echo example '...
       example

AUTHOR
       Rachid Koucha

SEE ALSO
       rsystemd(8), system(3).
$ man 8 rsystemd

NAME
       rsystemd - Remote system() daemon

SYNOPSIS
       rsystemd [-s shells] [-V] [-d level] [-D] [-h]

DESCRIPTION
       rsystemd is a daemon which manages several child processes running
       shells. It is a server for the rsystem(3) service.

OPTIONS
       -s | --shells shell_list
              Shells to launch along with their CPU affinity. This may be
              overridden by the RSYSD_SHELLS environment variable. The content
              is a colon delimited list of affinities for shells to launch.
              An affinity is defined as follows:

              * A comma separated list of fields
              * A field is either a CPU number or an interval of consecutive
                CPU numbers described with the first and last CPU numbers
                separated by a hyphen.
              * An empty field implicitly means all the active CPUs
              * A CPU number is from 0 to the number of active CPUs minus 1
              * If the first CPU number of an interval is empty, it is
                considered to be CPU number 0
              * If the last CPU number of an interval is empty, it is
                considered to be the biggest active CPU number

              If a CPU number is bigger than the maximum active CPU number,
              it is implicitly translated into the maximum active CPU number.

              If this option is not specified, the default behaviour is one
              shell running on all available CPUs.

       -V | --version
              Display the daemon's version

       -D | --daemon
              Activate the daemon mode (the process detaches itself from the
              current terminal and becomes a child of init, process number 1).

       -d | --debug level
              Set the debug level. The higher the value, the more traces are
              displayed.

       -h | --help
              Display the help

ENVIRONMENT VARIABLE
       By default, the server socket pathname used for the client/server
       dialog is /var/run/rsys.socket. The RSYS_SOCKET_PATH environment
       variable is available to specify an alternate socket pathname if one
       needs to change it for access rights or any test purposes.

       It is advised to specify an absolute pathname especially in daemon
       mode where the server changes its current directory to the root of
       the filesystem. Consequently, any relative pathname will be considered
       from the server's current directory.

EXAMPLES
       The following launches a shell running on CPU number 3 and CPU numbers
       6 to 8. We use sudo as rsystemd creates a named socket in /var/run.

       $ sudo rsystemd -s 3,6-8

       The following launches three shells. The first runs on CPU numbers 0
       to 3, CPU number 5 and CPU number 6. The second runs on CPU number 0
       and CPU numbers 3 to the latest active CPU. The third runs on all the
       active CPUs.

       $ sudo rsystemd -s -3,5,6:0,3-:

       The following launches one shell through the RSYSD_SHELLS environment
       variable. We pass the -E option to sudo to preserve the environment,
       otherwise RSYSD_SHELLS would not be taken into account. The
       environment variable overrides the parameter passed to rsystemd. The
       affinity of the shell is CPU numbers 1 and 3.

       $ export RSYSD_SHELLS=1,3
       $ sudo -E rsystemd -s -3,5,6:0,3-:

AUTHOR
       Rachid Koucha

SEE ALSO
       rsystem(3).
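The affinity syntax accepted by the -s option is compact, so a small parser may help to check how a given field is read. The following is only a sketch written from the rules listed in the manual above; it is not the code of rsystemd. The function name parse_affinity(), the 64-CPU limit and the bitmask representation are assumptions made for the illustration. It handles a single affinity (one element of the colon delimited list).

```c
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper (not rsystemd's implementation): parse one affinity
// such as "3,6-8" into a bitmask where bit N stands for CPU number N.
// 'ncpus' is the number of active CPUs (1..64 in this sketch).
// Returns 0 on success, -1 on bad parameter or syntax error.
int parse_affinity(const char *spec, unsigned int ncpus, uint64_t *mask)
{
  const char    *p = spec;
  unsigned long  max;

  if (!spec || !mask || (0 == ncpus) || (ncpus > 64)) {
    return -1;
  }
  max = ncpus - 1;
  *mask = 0;

  for (;;) {
    unsigned long first = 0;   // Default when the first number is empty
    unsigned long last = max;  // Default when the last number is empty
    unsigned long cpu;
    char *end;
    int has_first = 0;
    int has_dash = 0;

    // Optional first CPU number
    if ((*p >= '0') && (*p <= '9')) {
      first = strtoul(p, &end, 10);
      p = end;
      has_first = 1;
    }
    // Optional interval with an optional last CPU number
    if ('-' == *p) {
      has_dash = 1;
      p++;
      if ((*p >= '0') && (*p <= '9')) {
        last = strtoul(p, &end, 10);
        p = end;
      }
    }
    // A field ends with a comma or with the end of the string
    if ((*p != ',') && (*p != '\0')) {
      return -1;
    }
    // A single number is an interval of one CPU;
    // an empty field keeps the defaults, that is all the CPUs
    if (has_first && !has_dash) {
      last = first;
    }
    // Numbers above the biggest CPU number are translated into it
    if (first > max) { first = max; }
    if (last > max)  { last = max; }
    if (first > last) {
      return -1;
    }
    for (cpu = first; cpu <= last; cpu++) {
      *mask |= (uint64_t)1 << cpu;
    }
    if ('\0' == *p) {
      break;
    }
    p++; // Skip the comma
  }
  return 0;
}
```

With 16 active CPUs, the field list "3,6-8" of the first example yields the CPUs 3, 6, 7 and 8. Rejecting an interval whose first number is greater than its last one is a choice of this sketch; the manual does not say how the daemon treats that case.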
The finite state machine describing the main engine of rsystemd is depicted in Figure 10.
Figure 10: FSM of rsystemd
To help auto-detect the location of the RSYS files (libraries, include files...), the RSYS package installs a configuration file named rsys.pc for the pkg-config tool. Moreover, for cmake-based packages, a FindRsys.cmake file is provided at the top level of the rsys sub-tree.