21 April 2009

Boost.Process: Process management in C++

Boost.Process is a modern C++ library to ease process management. It allows developers to write platform-independent code to create, control and communicate with processes. As POSIX and Windows operating systems share only a minimum set of concepts and differ in everything else, Boost.Process provides generic and platform-specific classes to make the features of each platform available. As of today Boost.Process is not complete though. It is a library under construction and has not yet become an official Boost C++ library. This article is an introduction to Boost.Process which explains to developers how to get started.

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License.

1. Introduction
2. Concepts
3. Accessing processes
4. Creating processes
5. Communicating with processes
6. Terminating processes
7. Waiting for termination
8. Managing pipelines
9. POSIX-only features
10. Windows-only features
11. Running test cases

1. Introduction

A short history of Boost.Process

The initial version of Boost.Process which is now known as version 0.1 was created by Julio M. Merino Vidal in 2006. It was created during the Google Summer of Code program - an initiative by Google to support and sponsor the development of open source libraries. Julio continued to work on Boost.Process afterwards but lost interest shortly before a new version was ready to be released. The snapshot of Boost.Process which he stopped working on in 2007 is known as version 0.2.

In 2008 several developers picked up Boost.Process and continued where Julio had left off. As those efforts to improve the library were not organized, there were several snapshots of Boost.Process floating around. At the end of 2008 I merged all the snapshots I was aware of, updated all test cases and made sure that the merged version of Boost.Process worked on all major platforms. The code was also checked into the sandbox which is the Boost repository for libraries under construction. This version is known as 0.3.

The latest version of Boost.Process is 0.31. It was released on 21 April 2009 together with this article. It fixes a few bugs and can also be found in the sandbox.

This article talks about the concepts and the architecture of Boost.Process. It explains how the library can be used and which POSIX and Windows features are currently supported. As Boost.Process is not yet finished it is also a request for comments to figure out what is missing before the library can be reviewed and eventually become an official Boost C++ library. Other developers are invited and encouraged to share their opinions and provide feedback!

The latest version of Boost.Process is available as a ZIP file. You'll find two directories in the ZIP file: boost contains header files, lib contains the documentation, samples and test cases. As Boost.Process is a header-only library you only need to copy the directory boost to wherever you keep your Boost C++ libraries. There is no need to build a library.

The documentation of Boost.Process can also be found online. Just like the C++ code it is not complete and needs to be overhauled. But then again you are currently reading an introduction to Boost.Process anyway and don't need the documentation. As the documentation contains a reference it might be useful later to look up details though.

All classes and functions defined by Boost.Process belong to the namespace boost::process. While the classes and functions are defined in many different header files, there is one header file which includes all the others. It is boost/process.hpp. I recommend including this header file in your projects - that's what all samples in this article do, too.

2. Concepts

Generic and platform-specific classes

Creating a platform-independent C++ library for process management turns out to be rather difficult even before any code is written. The problem is that operating systems don't necessarily use the same concepts when it comes to process management. While all operating systems support processes, POSIX systems for example know something called process groups and sessions. Windows in turn has concepts like window stations and desktops. Concentrating only on concepts which are supported by all platforms could be an option but isn't a good one. Boost.Process would support such a limited set of concepts that developers would probably turn rather quickly to other libraries which support the platform-specific features they are interested in.

Boost.Process' solution is to provide generic and platform-specific classes. Developers who want to restrict themselves to use only features supported by all platforms work with generic classes. Developers who want to benefit from certain features provided by their operating system work with platform-specific classes.

A good example is the class boost::process::process which not surprisingly represents a process. It is a generic class which provides a method get_id(). This method returns an identifier for a process. As process identifiers are supported on all operating systems get_id() has been added to the generic class boost::process::process.

If a process spawns child processes they are represented by the generic class boost::process::child. This class is derived from boost::process::process as child processes are after all processes, too. While child processes can have a standard input, standard output and standard error stream on all platforms, they can have arbitrarily many streams on POSIX systems. That's why Boost.Process provides another class boost::process::posix_child which is derived from boost::process::child. On Windows in turn child processes have a few more identifiers, e.g. a primary thread handle. Developers who want to access those identifiers can use the class boost::process::win32_child on Windows.

After having been introduced to boost::process::process, boost::process::child, boost::process::posix_child and boost::process::win32_child you know already two very important concepts of Boost.Process. Another important concept is implemented by the generic class boost::process::context and the platform-specific classes boost::process::posix_context and boost::process::win32_context.

A context is a description of the environment of a process. It can affect both the startup procedure of a process and its runtime behavior. This description for example includes the working directory and environment variables. It also specifies how the standard input, standard output and standard error streams should be treated when a process is started, e.g. if they should be closed, inherited or redirected.

Boost.Process defines a few more concepts, though it's questionable whether their definitions make sense. There is for example a concept for the executable name. As the executable name is a string, it's not clear whether there is any benefit in defining a concept for it. The original idea was to allow developers to use different string types. The current version of Boost.Process however requires the executable name to be passed as a std::string. Neither std::wstring nor other string types like CString from the Microsoft Foundation Classes will work. As other Boost C++ libraries like Boost.Interprocess stick to one and only one string type - typically std::string - Boost.Process should probably do the same.

3. Accessing processes

Detecting and examining other processes

Today's computers typically run many processes at once. For example a user might work with a spreadsheet application while listening to music played by a media player. At the same time the computer might be protected by an anti-virus program running in the background. It would be nice if a library for process management offered a way to detect processes and access them somehow.

As of today Boost.Process provides a way to access only one process: the current process itself. It's for example neither possible to find out how many processes there are nor what their identifiers are. There is only a class boost::process::self which is used to access the current process.

The class boost::process::self is a singleton and can't be instantiated. A static method get_instance() must be called which returns an object of type boost::process::self. This object represents the current process and can be used to access environment variables.

#include <boost/process.hpp> 
#include <iostream> 

using namespace boost::process; 

int main() 
{ 
  self &s = self::get_instance(); 
  environment env = s.get_environment(); 
  for (environment::iterator it = env.begin(); it != env.end(); ++it) 
    std::cout << it->first << "=" << it->second << std::endl; 
}

The method get_environment() returns an object of type boost::process::environment which is a typedef for std::map<std::string, std::string>. The environment variables can then be processed using the well-known methods of std::map. In the sample code above they are written to the standard output stream.

As the environment variables returned by get_environment() are stored in an object of type boost::process::environment which is only a typedef for std::map adding, removing or changing values in the map does not affect the process' environment variables. Any changes to the process' environment variables for example by calling setenv() on POSIX systems and SetEnvironmentVariable() on Windows systems are not reflected in the map either. An object of type boost::process::environment is a snapshot of a process' environment variables - nothing more.
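
The snapshot behavior can be reproduced with nothing but the C++ standard library. The following sketch is not taken from Boost.Process: take_snapshot() and snapshot_is_detached() are hypothetical helpers which copy variables with std::getenv() into a map of the same shape as boost::process::environment and then check that changing the copy leaves the real variable alone.

```cpp
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Same shape as the typedef described in the article.
typedef std::map<std::string, std::string> environment;

// Hypothetical helper: copies the listed variables out of the process
// environment. Variables which are not set are left out of the snapshot.
environment take_snapshot(const std::vector<std::string> &names)
{
  environment snapshot;
  for (std::size_t i = 0; i < names.size(); ++i)
  {
    const char *value = std::getenv(names[i].c_str());
    if (value)
      snapshot[names[i]] = value;
  }
  return snapshot;
}

// Hypothetical helper: returns true if editing the snapshot left the
// real environment variable untouched.
bool snapshot_is_detached(const std::string &name)
{
  environment snapshot = take_snapshot(std::vector<std::string>(1, name));

  const char *before = std::getenv(name.c_str());
  const std::string before_value = before ? before : "";

  snapshot[name] = before_value + "-edited";   // changes the copy only

  const char *after = std::getenv(name.c_str());
  const std::string after_value = after ? after : "";

  return (before == 0) == (after == 0) && before_value == after_value;
}
```

Calling snapshot_is_detached("PATH") is expected to return true on any platform, because the map never writes back to the process environment.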

It could make sense to use another container type than std::map, e.g. std::unordered_map. On Windows systems however the environment variables are always sorted. As boost::process::environment will be used later to pass environment variables to a new process, std::map makes sure that they really are sorted. As sorting is not required on POSIX systems, Boost.Process could either use a different definition on each platform or - if this is too confusing - let developers define a macro to opt in to platform-specific definitions. By default, Boost.Process would then stick to std::map on all platforms.
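
To illustrate why a sorted container is convenient, here is a sketch of how a std::map could be flattened into the block of environment strings which CreateProcess() accepts: a sequence of NAME=value entries, each terminated by a null character, followed by one more null character. make_environment_block() is a hypothetical helper and not the Boost.Process implementation. As std::map compares names case-sensitively by default, the resulting order only approximates a case-insensitive sort.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> environment;

// Hypothetical helper: builds "NAME=value\0NAME=value\0\0".
// Iterating over a std::map visits the names in sorted order, so the
// entries end up sorted without any extra work.
// An empty map is not treated specially in this sketch.
std::string make_environment_block(const environment &env)
{
  std::string block;
  for (environment::const_iterator it = env.begin(); it != env.end(); ++it)
  {
    block += it->first;
    block += '=';
    block += it->second;
    block += '\0';          // terminates one entry
  }
  block += '\0';            // terminates the whole block
  return block;
}
```

With std::unordered_map the loop would look the same, but the entries would have to be sorted in a separate step before the block is built.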

As Boost.Process doesn't provide classes or functions to detect other processes, developers need to use platform-specific functions. Windows for example provides CreateToolhelp32Snapshot() which can be used to iterate over processes. Each process is described by a PROCESSENTRY32 structure which contains among other things a process identifier. The process identifier could be used to instantiate boost::process::process as its constructor accepts a process identifier as its sole argument.

#include <windows.h> 
#include <tlhelp32.h> 
#include <boost/process.hpp> 
#include <iostream> 

using namespace boost::process; 

int main() 
{ 
  HANDLE handle = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0); 
  if (handle == INVALID_HANDLE_VALUE) 
    return 1; 

  PROCESSENTRY32 entry; 
  entry.dwSize = sizeof(entry); 
  if (!Process32First(handle, &entry))
  {
    CloseHandle(handle); // don't leak the snapshot handle on failure
    return 2;
  }

  do 
  { 
    process p(entry.th32ProcessID); 
    std::cout << p.get_id() << std::endl; 
  } 
  while (Process32Next(handle, &entry)); 

  CloseHandle(handle); 
}

While boost::process::process can be initialized with th32ProcessID, it doesn't really make much sense. There is more information available in a PROCESSENTRY32 structure which would be lost if a program continued to work with boost::process::process only. It would be better if Boost.Process provided a platform-independent function to detect and iterate over processes.

4. Creating processes

Launching child processes

While Boost.Process doesn't really offer much when it comes to accessing other processes, it is much better at creating processes. There are three functions provided which are used to launch child processes:

boost::process::launch() expects an executable name (as std::string), arguments to pass to the executable (as std::vector<std::string>) and a context the executable should be started with and run in (as boost::process::context).
boost::process::launch_shell() is used if a string contains the executable name and arguments. Instead of separating the executable name and arguments they are passed as one std::string. boost::process::launch_shell() also expects a context as another parameter (again as boost::process::context).
boost::process::launch_pipeline() is used to start two or more processes and connect their input and output streams with pipes. This function expects a collection of entries which describe the processes.

All three functions are really templates - after all Boost.Process tries to support concepts which can be implemented by different types. The idea is that for example a string of another type can be passed to boost::process::launch() to specify the executable name. In practice it won't be easy to use other types though. The current implementation of boost::process::launch() for example invokes size() and c_str(). If the string type doesn't provide these methods, compilation fails. Boost.Process either has to specify the concept of an executable name precisely or give it up. Developers shouldn't be required to study the implementation of Boost.Process. As the implementation can also change at any time, code which worked before might break.

While C++ standard types are used to specify the executable name and arguments the context parameter is based on a class boost::process::context. The class deserves some more explanation.

An object of type boost::process::context contains data which describes how an executable is started and what its runtime environment will look like. The class boost::process::context defines several public properties:

work_directory is the working directory the executable is started in. By default, it is set to the current directory.
environment contains the environment variables for the new process. By default, no environment variables are defined. For executables linked against shared libraries it is typically required to define at least the environment variable PATH. Otherwise locating and loading the required shared libraries fails.
stdin_behavior specifies the behavior of the standard input stream.
stdout_behavior specifies the behavior of the standard output stream.
stderr_behavior specifies the behavior of the standard error stream.

While the working directory and environment variables are well-known concepts supported by all operating systems, the three settings which specify the behavior of the standard streams are specific to Boost.Process. By default, when a new process is created all standard streams are closed. Thus the new process can't read data from stdin and can't write data to stdout and stderr.

Please note that this behavior is different from the default behavior on POSIX systems. When fork() is used to create a new process the child process inherits all file descriptors from the parent process. The standard streams are not automatically closed.

On Windows when CreateProcess() is used the behavior of the standard streams is either explicitly specified by the parent process or the streams are automatically bound to the keyboard buffer and the console's window buffer. There is no default behavior.

If a program doesn't use the standard streams - e.g. because it is an application with a graphical user interface - the stream behaviors can be ignored. This makes it rather easy to launch such a program.

#include <boost/process.hpp> 
#include <boost/assign/list_of.hpp> 
#include <string> 
#include <vector> 

using namespace boost::process; 

int main() 
{ 
  std::string exec = find_executable_in_path("notepad.exe"); 
  std::vector<std::string> args = boost::assign::list_of("notepad.exe"); 
  context ctx; 
  ctx.environment = self::get_environment(); 
  launch(exec, args, ctx); 
}

In the code sample above the standard Windows editor Notepad is launched. The executable is called notepad.exe. It is very important to refer to the executable with an absolute path - on all platforms including POSIX systems. The absolute path for Notepad is typically C:\Windows\System32\notepad.exe. Boost.Process provides a helper function boost::process::find_executable_in_path() though which searches the directories listed in the environment variable PATH for an executable and returns its absolute path. If PATH hasn't been set or the executable isn't found in any of the directories PATH refers to, boost::process::find_executable_in_path() throws boost::filesystem::filesystem_error.
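
How such a search might work can be sketched in a few lines. The code below is an assumption about the general technique, not the implementation used by Boost.Process: split_search_path() and candidate_paths() are hypothetical helpers which split a PATH-style string and build the absolute paths which would have to be probed one after another.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits a PATH-style string into its directories.
// The separator is ':' on POSIX systems and ';' on Windows.
// Empty entries are skipped in this sketch.
std::vector<std::string> split_search_path(const std::string &path, char separator)
{
  std::vector<std::string> directories;
  std::string::size_type begin = 0;
  while (begin <= path.size())
  {
    std::string::size_type end = path.find(separator, begin);
    if (end == std::string::npos)
      end = path.size();
    if (end > begin)
      directories.push_back(path.substr(begin, end - begin));
    begin = end + 1;
  }
  return directories;
}

// Hypothetical helper: lists the absolute paths a search would try, in order.
// A real search would return the first candidate which exists and is executable.
std::vector<std::string> candidate_paths(const std::string &name,
  const std::string &path, char separator, char slash)
{
  std::vector<std::string> candidates;
  const std::vector<std::string> directories = split_search_path(path, separator);
  for (std::size_t i = 0; i < directories.size(); ++i)
    candidates.push_back(directories[i] + slash + name);
  return candidates;
}
```

For example candidate_paths("hostname", "/usr/bin:/bin", ':', '/') yields /usr/bin/hostname followed by /bin/hostname.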

Another important requirement is to always pass at least one argument to the executable. This must be the name of the executable itself. Whether it contains an extension like .exe doesn't really matter - not to Boost.Process at least. You can also pass a string other than the executable name. The string is forwarded to the program, so it really depends on the program if and how it is processed. For Boost.Process it's only important that at least one argument has been set.

By the way, there is another helper function boost::process::executable_to_progname() which extracts the executable name from an absolute (or relative) path. For example it returns notepad for C:\Windows\System32\notepad.exe. The helper function makes it easy to specify the first argument. The current implementation of boost::process::executable_to_progname() always strips the file extension on Windows though.
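
The transformation itself is simple string handling. The following sketch shows one way to do it. progname_of() is a hypothetical stand-in, not the code of boost::process::executable_to_progname(), and it makes two simplifying assumptions: both / and \ are treated as directory separators on every platform, and the extension is stripped everywhere, not only on Windows.

```cpp
#include <string>

// Hypothetical helper: extracts a program name from a path.
std::string progname_of(const std::string &path)
{
  // Keep only what follows the last directory separator, if there is one.
  std::string::size_type slash = path.find_last_of("/\\");
  std::string name = (slash == std::string::npos) ? path : path.substr(slash + 1);

  // Drop the extension, but leave names which start with a dot alone.
  std::string::size_type dot = name.rfind('.');
  if (dot != std::string::npos && dot != 0)
    name.erase(dot);
  return name;
}
```

The result can be used directly as the first element of the argument vector passed to boost::process::launch().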

There are no rules when it comes to specifying the context - anything goes. However many applications are linked against shared libraries which must be located and loaded by the operating system. This is also true for Notepad which depends on DLLs in C:\Windows\System32 among others. That's why it is often a good idea to set at least the environment variable PATH. If all environment variables of the current process should be used, boost::process::self provides a static method get_environment() which comes in handy here.

For programs which make use of standard streams it can be very important to configure the context correctly. All properties stdin_behavior, stdout_behavior and stderr_behavior are based on boost::process::stream_behavior. With the following helper functions instances of this class can be created and assigned:

boost::process::capture_stream() is used to make the other end of a standard stream available in the current process. It is then possible for the current and the new process to communicate with each other and exchange data. On POSIX and Windows systems this connection is known as a pipe.
boost::process::close_stream() is used to close a stream. As this is the default behavior, there is no real need to call this function.
boost::process::inherit_stream() makes the child process inherit the standard streams of the current process. No matter if the child or current process writes data for example to the standard output stream - it is transmitted to the very same sink.
boost::process::redirect_stream_to_stdout() can be used to redirect the standard error stream to the standard output stream. It's a helper function which is not very important but might be useful for some developers.
boost::process::silence_stream() redirects the standard input stream to a source which delivers only meaningless data, and the standard output and standard error streams to a sink which discards everything written to it. You use this function if you don't care about data read from stdin and written to stdout and stderr but don't want to close the standard streams. This makes sense if you know a program expects to be able to use the standard streams and would break if they were closed.

When an object of type boost::process::context is passed to a function like boost::process::launch(), the standard streams of the new process are configured according to the behavior settings. If a stream is closed - the default behavior - nothing happens. If a stream is silenced it is redirected to /dev/null or /dev/zero on POSIX systems, depending on the stream, and to NUL on Windows systems.

#include <boost/process.hpp> 
#include <boost/assign/list_of.hpp> 
#include <string> 
#include <vector> 

using namespace boost::process; 

int main() 
{ 
  std::string exec = find_executable_in_path("hostname"); 
  std::vector<std::string> args = boost::assign::list_of("hostname"); 
  context ctx; 
  ctx.environment = self::get_environment(); 
  ctx.stdout_behavior = inherit_stream(); 
  launch(exec, args, ctx); 
}

The sample code above uses an inherited standard output stream. It starts a small utility which is available on POSIX and Windows systems and writes the hostname to the standard output stream. As the child process inherits the standard output stream the hostname is written to the terminal window when the program is run. There are no further steps required to make the output of the child process visible.

By the way, the sample code above also shows that boost::process::find_executable_in_path() can find a program on Windows even if no file extension is given. It's not required to pass "hostname.exe" to boost::process::find_executable_in_path().

A more interesting and also more complex behavior is a captured stream. If for example the standard input stream of a new process is captured the current process can write data which will be received by the child process through its standard input stream. The question how the current process accesses and uses the captured stream is answered in the next section.

5. Communicating with processes

Exchanging data - even asynchronously

Boost.Process enables processes to communicate with each other only if they are related: One process must have created the other. If you want two processes which are not related to exchange data you should use Boost.Interprocess.

The functions boost::process::launch() and boost::process::launch_shell() which are used to spawn a child process return an object of type boost::process::child. This class is derived from boost::process::process. It defines three methods get_stdin(), get_stdout() and get_stderr(). While get_stdin() returns an object of type boost::process::postream the methods get_stdout() and get_stderr() return an object of type boost::process::pistream.

These objects of type boost::process::postream and boost::process::pistream represent the standard streams of another process. It is important to understand that the class names describe the view of the current process: boost::process::postream represents the standard input stream of the other process, boost::process::pistream its standard output or standard error stream. That's why get_stdin() returns an object of type boost::process::postream. From the view of the current process the standard input stream of the other process is an output stream: The current process writes data to it which the other process then receives through its standard input stream.

It should also be noted that objects of type boost::process::postream and boost::process::pistream can only be used if the stream behaviors have been set correctly: Streams must be captured with boost::process::capture_stream().
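
What capturing a stream amounts to on a POSIX system can be sketched with pipe(), fork() and dup2(). This is only an illustration of the underlying mechanism under the assumption of a POSIX platform, not the code Boost.Process uses. capture_child_output() is a hypothetical helper, and to keep the sketch self-contained the child doesn't execute another program but simply writes a message to its standard output stream.

```cpp
#include <cstddef>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child whose standard output stream is
// connected to a pipe and returns everything the child wrote.
// Returns an empty string if the pipe or the child can't be created.
std::string capture_child_output(const std::string &message)
{
  int fds[2];                       // fds[0]: read end, fds[1]: write end
  if (pipe(fds) == -1)
    return std::string();

  pid_t pid = fork();
  if (pid == -1)
  {
    close(fds[0]);
    close(fds[1]);
    return std::string();
  }

  if (pid == 0)
  {
    // Child: make the write end of the pipe its standard output stream.
    close(fds[0]);
    dup2(fds[1], STDOUT_FILENO);
    close(fds[1]);
    ssize_t ignored = write(STDOUT_FILENO, message.data(), message.size());
    (void)ignored;
    _exit(0);
  }

  // Parent: the child's output stream is an input stream from this side.
  close(fds[1]);
  std::string output;
  char buffer[256];
  ssize_t n;
  while ((n = read(fds[0], buffer, sizeof(buffer))) > 0)
    output.append(buffer, static_cast<std::size_t>(n));
  close(fds[0]);
  waitpid(pid, 0, 0);               // reap the child
  return output;
}
```

The parent reads from the end of the pipe the child writes to. This is the same change of perspective which is expressed by get_stdout() returning a boost::process::pistream.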

#include <boost/process.hpp> 
#include <boost/assign/list_of.hpp> 
#include <string> 
#include <vector> 
#include <iostream> 

using namespace boost::process; 

int main() 
{ 
  std::string exec = find_executable_in_path("hostname"); 
  std::vector<std::string> args = boost::assign::list_of("hostname"); 
  context ctx; 
  ctx.environment = self::get_environment(); 
  ctx.stdout_behavior = capture_stream(); 
  child c = launch(exec, args, ctx); 
  pistream &is = c.get_stdout(); 
  std::cout << is.rdbuf(); 
}

The classes boost::process::postream and boost::process::pistream behave like other streams from the C++ standard. In fact boost::process::postream is derived from std::ostream and boost::process::pistream from std::istream. The only additional methods defined by boost::process::postream and boost::process::pistream are handle() and close(). While close() obviously closes the stream, handle() returns an object of type boost::process::detail::file_handle.

boost::process::detail::file_handle implements another concept of Boost.Process called handle. It hasn't been introduced yet in this article as it's not clear if this concept is really useful. As you can see this class is defined within the detail namespace of boost::process. Classes defined in this namespace are not really meant to be used by other developers.

The method handle() was added in Boost.Process 0.3. It is the only way to get the underlying native handle of a stream. On POSIX systems this handle is called a file descriptor while on Windows systems it's simply called a handle. Getting the native handle is required for asynchronous operations.

Boost.Asio is the Boost C++ library to support asynchronous input/output operations. It defines I/O objects which represent devices to write and read data asynchronously. There is for example a socket I/O object to transmit data over a network. There is also a generic I/O object which can be initialized with a handle. As Boost.Process provides the method handle() to get a handle we can use the generic I/O object of Boost.Asio for asynchronous read and write operations.

#include <boost/asio.hpp> 
#define BOOST_PROCESS_WINDOWS_USE_NAMED_PIPE 
#include <boost/process.hpp> 
#include <boost/array.hpp> 
#include <boost/bind.hpp> 
#include <boost/assign/list_of.hpp> 
#include <string> 
#include <vector> 
#include <iostream> 

using namespace boost::process; 
using namespace boost::asio; 

io_service ioservice; 
boost::array<char, 4096> buf; 

#if defined(BOOST_POSIX_API) 
posix::stream_descriptor in(ioservice); 
#elif defined(BOOST_WINDOWS_API) 
windows::stream_handle in(ioservice); 
#endif 

void begin_read(); 
void end_read(const boost::system::error_code &ec, std::size_t bytes_transferred); 

int main() 
{ 
  std::string exec = find_executable_in_path("hostname"); 
  std::vector<std::string> args = boost::assign::list_of("hostname"); 
  context ctx; 
  ctx.environment = self::get_environment(); 
  ctx.stdout_behavior = capture_stream(); 
  child c = launch(exec, args, ctx); 
  pistream &is = c.get_stdout(); 
  in.assign(is.handle().release()); 
  begin_read(); 
  ioservice.run(); 
} 

void begin_read() 
{ 
  in.async_read_some(boost::asio::buffer(buf), 
    boost::bind(&end_read, placeholders::error, placeholders::bytes_transferred)); 
} 

void end_read(const boost::system::error_code &ec, std::size_t bytes_transferred) 
{ 
  if (!ec) 
  { 
    std::cout << std::string(buf.data(), bytes_transferred) << std::flush; 
    begin_read(); 
  } 
}

If you look at the sample code above you'll probably agree that it should be easier to use asynchronous operations without passing native handles around. Getting the native file handle and initializing the generic I/O object of Boost.Asio doesn't seem to be a good solution. As there are in fact two different implementations of the generic I/O object - one for POSIX and one for Windows systems - it gets even worse due to preprocessor directives which have to be used.

Furthermore asynchronous operations only work on Windows if the macro BOOST_PROCESS_WINDOWS_USE_NAMED_PIPE is defined. By default, Boost.Process creates an anonymous pipe if a stream is captured. However anonymous pipes don't support asynchronous operations on Windows. On Windows a named pipe must be created for asynchronous operations to work.

Defining the macro BOOST_PROCESS_WINDOWS_USE_NAMED_PIPE makes Boost.Process create named pipes for all streams on Windows even if a developer wants to use asynchronous operations only for some streams. As named pipes can be useful in other scenarios it would be better if Boost.Process supported named pipes explicitly. The library should offer developers a choice between anonymous and named pipes. Of course anonymous pipes still wouldn't support asynchronous operations on Windows. But developers could use named pipes in other use cases if only they were supported explicitly. As named pipes are called FIFOs on POSIX systems, two classes boost::process::pipe and boost::process::fifo could be defined. The Windows documentation doesn't refer to named pipes as FIFOs. But then again on POSIX systems a pipe is always anonymous - FIFO is the name of the other interprocess communication mechanism.

6. Terminating processes

Killing a process

It is possible to terminate another process by invoking terminate() which is defined by boost::process::process. Of course it's not possible to terminate an arbitrary process. Operating systems check access rights before they allow a process to terminate another one. As Boost.Process currently doesn't provide any means to access processes other than child processes, terminate() will always work for now.

On POSIX systems terminate() is implemented by sending the signal SIGTERM. As processes are allowed to ignore SIGTERM it is possible to make terminate() send SIGKILL which can't be ignored. If the sole argument of terminate() is set to true - the default value is false - SIGKILL is sent.
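
The choice between the two signals can be sketched directly with the POSIX function kill(). The code below assumes a POSIX system and is not the implementation of Boost.Process: terminate_process() and signal_that_ends_child() are hypothetical helpers. The second one forks a child which only sleeps, terminates it and reports which signal ended it.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper which mirrors the semantics described above:
// SIGTERM by default, SIGKILL if force is true.
// Returns true if the signal could be sent.
bool terminate_process(pid_t pid, bool force = false)
{
  return kill(pid, force ? SIGKILL : SIGTERM) == 0;
}

// Hypothetical helper: starts an idle child, terminates it and returns
// the number of the signal which ended it, or -1 on failure.
int signal_that_ends_child(bool force)
{
  pid_t pid = fork();
  if (pid == -1)
    return -1;
  if (pid == 0)
  {
    sleep(30);                      // child: idle until a signal arrives
    _exit(0);
  }

  bool sent = terminate_process(pid, force);
  int status = 0;
  if (waitpid(pid, &status, 0) != pid || !sent)
    return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

As the child doesn't install a handler for SIGTERM, both variants end it. A child which ignores SIGTERM would only be ended by the forced variant.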

#include <boost/process.hpp> 
#include <boost/thread.hpp> 
#include <boost/assign/list_of.hpp> 
#include <string> 
#include <vector> 

using namespace boost::process; 

int main() 
{ 
  std::string exec = find_executable_in_path("notepad.exe"); 
  std::vector<std::string> args = boost::assign::list_of("notepad.exe"); 
  context ctx; 
  ctx.environment = self::get_environment(); 
  child c = launch(exec, args, ctx); 
  boost::this_thread::sleep(boost::posix_time::seconds(3)); 
  c.terminate(); 
}

On Windows terminate() calls TerminateProcess(). The parameter is ignored as TerminateProcess() always causes another process to exit.

As a side note, it is not really recommended to use terminate(). It forces another process to exit no matter what state it is in. Calling terminate() should really be a last resort if nothing else works and the other process must exit as soon as possible.

7. Waiting for termination

Spare a minute

When terminate() is called the respective process is terminated immediately. If a process instead wants to wait for another process to exit on its own, wait() can be called. While terminate() is defined by boost::process::process, wait() belongs to boost::process::child. That means a process can only wait for child processes to exit. This restriction is not really reasonable and should be lifted.

Another problem with wait() is that it is a blocking function: wait() only returns once the process has exited. There is currently no support to wait asynchronously. Again this is something which should be changed.

#include <boost/process.hpp> 
#include <boost/assign/list_of.hpp> 
#include <string> 
#include <vector> 
#include <iostream> 

using namespace boost::process; 

int main() 
{ 
  std::string exec = find_executable_in_path("hostname"); 
  std::vector<std::string> args = boost::assign::list_of("hostname"); 
  context ctx; 
  ctx.environment = self::get_environment(); 
  child c = launch(exec, args, ctx); 
  status s = c.wait(); 
  if (s.exited()) 
    std::cout << s.exit_status() << std::endl; 
}

When a process exits, wait() returns an object of type boost::process::status. This class provides methods to find out how a process exited. The method exited() can be called to detect if the process exited gracefully. exit_status(), which returns the exit code, may only be called if exited() returns true.

#include <boost/process.hpp> 
#include <boost/assign/list_of.hpp> 
#include <string> 
#include <vector> 
#include <iostream> 

using namespace boost::process; 

int main() 
{ 
  std::string exec = find_executable_in_path("hostname"); 
  std::vector<std::string> args = boost::assign::list_of("hostname"); 
  context ctx; 
  ctx.environment = self::get_environment(); 
  child c = launch(exec, args, ctx); 
  posix_status s = c.wait(); 
  if (s.exited()) 
    std::cout << s.exit_status() << std::endl; 
  if (s.signaled()) 
    std::cout << s.term_signal() << std::endl; 
}

POSIX systems provide more data when a process has exited. That's why Boost.Process defines another class, boost::process::posix_status, which can be instantiated with an object of type boost::process::status. For example, it defines a method signaled() which returns true if the process was terminated by a signal. The signal can be fetched with term_signal().
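To make the four methods more tangible, here is a rough sketch of how such a class could decode the value reported by waitpid() on a POSIX system. This is not the implementation used by Boost.Process. The class raw_status and the helper run_child() are hypothetical names, and the sketch simply assumes that the standard macros WIFEXITED, WEXITSTATUS, WIFSIGNALED and WTERMSIG do the work.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a status class on POSIX systems.
class raw_status
{
public:
  explicit raw_status(int flags) : flags_(flags) {}

  bool exited() const { return WIFEXITED(flags_); }
  int exit_status() const { return WEXITSTATUS(flags_); }  // only valid if exited()
  bool signaled() const { return WIFSIGNALED(flags_); }
  int term_signal() const { return WTERMSIG(flags_); }     // only valid if signaled()

private:
  int flags_;
};

// Hypothetical helper: fork a child which either sends itself the signal
// sig (if sig is not 0) or exits with exit_code, then wait for it.
raw_status run_child(int exit_code, int sig)
{
  pid_t pid = fork();
  assert(pid != -1);
  if (pid == 0)
  {
    if (sig != 0)
      kill(getpid(), sig);
    _exit(exit_code);
  }
  int flags = 0;
  waitpid(pid, &flags, 0);
  return raw_status(flags);
}
```

Calling run_child(3, 0) yields a status for which exited() is true, while run_child(0, SIGKILL) yields one for which signaled() is true. As with the Boost.Process classes, exit_status() and term_signal() are only meaningful after the corresponding check has succeeded.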

If a process spawns a child process and doesn't care about it, it doesn't need to wait for the child process to exit. However, on POSIX systems, if the child process exits while the parent process is still running, the child process becomes a zombie process. The operating system still saves data about the child process like its return code. This data is saved until the parent process collects it with wait() or exits itself. That's why it is recommended to always reap child processes by calling wait(). That's not easy if several child processes have been created, as wait() blocks. This is another area where Boost.Process deserves to be improved. As SIGCHLD is sent to a parent process whenever a child process exits and becomes a zombie process, Boost.Process could set up an I/O object for handling SIGCHLD signals asynchronously.
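As long as Boost.Process offers no such I/O object, zombie processes can be collected with the POSIX API directly. The sketch below is one possible approach and does not use Boost.Process: it calls waitpid() with -1 and WNOHANG in a loop, which reaps every child that has already exited and never blocks. The function names are invented for this example.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork a child which exits right away.
pid_t spawn_short_lived_child()
{
  pid_t pid = fork();
  if (pid == 0)
    _exit(0);  // child process
  return pid;  // parent process (-1 if fork() failed)
}

// Reap all children which have exited so far without blocking.
// Returns the number of children reaped by this call.
int reap_exited_children()
{
  int reaped = 0;
  // waitpid() returns 0 if children exist but none has exited yet and
  // -1 if there are no children left. Both cases end the loop.
  while (waitpid(-1, 0, WNOHANG) > 0)
    ++reaped;
  return reaped;
}
```

A program could call reap_exited_children() periodically or whenever it has been notified about SIGCHLD. Note that the exit codes are thrown away in this sketch. A real program would probably pass a status variable to waitpid() and store the result per process ID.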

There is a similar problem on Windows. Windows cleans up resources only when all handles are closed. As objects of type boost::process::child store a handle to another process Windows can't clean up resources even if the other process has exited - and even if wait() has been called and returned! The object must be destroyed as only the destructor of boost::process::child closes the handle.

8. Managing pipelines

Processes in a row

A pipeline is a special case in which two or more processes are created and their standard input and output streams are connected. In command shells on POSIX and Windows systems the pipe operator | is used to create a pipeline. Data written to the standard output stream of one program is received through the standard input stream of the next program.

With Boost.Process it's possible to create a pipeline by calling boost::process::launch_pipeline(). This function expects a container with entries. An entry describes a process in the pipeline. The description contains the executable name, arguments and a context.

The class which is used to define an entry is boost::process::pipeline_entry. A pipeline is created by instantiating boost::process::pipeline_entry for every process, storing the entries in a container and passing it to boost::process::launch_pipeline().

#include <boost/process.hpp> 
#include <boost/assign/list_of.hpp> 
#include <string> 
#include <vector> 
#include <iostream> 

using namespace boost::process; 

int main() 
{ 
  context ctx; 
  ctx.environment = self::get_environment(); 
  std::vector<pipeline_entry> entries; 
  std::vector<std::string> args = boost::assign::list_of("ls")("-l")("/"); 
  entries.push_back(pipeline_entry(find_executable_in_path("ls"), args, ctx)); 
  ctx.stdout_behavior = inherit_stream(); 
  args = boost::assign::list_of("grep")("bin"); 
  entries.push_back(pipeline_entry(find_executable_in_path("grep"), args, ctx)); 
  launch_pipeline(entries); 
}

The sample code above does the same as if a user entered ls -l / | grep bin on a POSIX system.
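For readers who wonder what has to happen underneath on a POSIX system, the following sketch connects two programs by hand with pipe(), fork(), dup2() and execvp(). It is not how boost::process::launch_pipeline() is implemented, and the function run_two_stage_pipeline() is a made-up name. It only illustrates the idea that the standard output of the first process is wired to the standard input of the second.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run "left | right" and return the exit code of
// the right-hand program, or -1 if something went wrong.
// Both arrays must be terminated with a null pointer.
int run_two_stage_pipeline(char *const left[], char *const right[])
{
  int fds[2];
  if (pipe(fds) == -1)
    return -1;

  pid_t first = fork();
  if (first == 0)
  {
    // First child: standard output goes into the pipe.
    dup2(fds[1], STDOUT_FILENO);
    close(fds[0]);
    close(fds[1]);
    execvp(left[0], left);
    _exit(127);  // only reached if execvp() failed
  }

  pid_t second = (first == -1) ? -1 : fork();
  if (second == 0)
  {
    // Second child: standard input comes from the pipe.
    dup2(fds[0], STDIN_FILENO);
    close(fds[0]);
    close(fds[1]);
    execvp(right[0], right);
    _exit(127);
  }

  // Parent: close both ends, otherwise the reader never sees end of file.
  close(fds[0]);
  close(fds[1]);

  int status = 0;
  if (first != -1)
    waitpid(first, 0, 0);
  if (second == -1)
    return -1;
  waitpid(second, &status, 0);
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Passing the argument lists of echo bin and grep -q bin would be comparable to entering echo bin | grep -q bin in a shell. Both children are reaped before the function returns.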

The return type of boost::process::launch_pipeline() is boost::process::children which is a typedef for std::vector<boost::process::child>. That means it's possible to access every child process individually and, for example, wait until it exits. Instead of waiting for every child process individually, Boost.Process provides the helper function boost::process::wait_children(). Unlike wait(), which is a method of boost::process::child, boost::process::wait_children() is a free-standing function.

#include <boost/process.hpp> 
#include <boost/assign/list_of.hpp> 
#include <string> 
#include <vector> 
#include <iostream> 

using namespace boost::process; 

int main() 
{ 
  context ctx; 
  ctx.environment = self::get_environment(); 
  std::vector<pipeline_entry> entries; 
  std::vector<std::string> args = boost::assign::list_of("ls")("-l")("/"); 
  entries.push_back(pipeline_entry(find_executable_in_path("ls"), args, ctx)); 
  ctx.stdout_behavior = inherit_stream(); 
  args = boost::assign::list_of("grep")("bin"); 
  entries.push_back(pipeline_entry(find_executable_in_path("grep"), args, ctx)); 
  children cs = launch_pipeline(entries); 
  status s = wait_children(cs); 
  if (s.exited()) 
    std::cout << s.exit_status() << std::endl; 
}

When an object of type boost::process::children is returned by boost::process::launch_pipeline() and forwarded to boost::process::wait_children(), the function waits for every child process to exit and evaluates each exit code. It returns the first exit code which is not equal to EXIT_SUCCESS. The remaining child processes are still reaped, but their exit codes are ignored. The exit code is not returned directly as an int but within an object of type boost::process::status.
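The selection rule can be modelled on plain exit codes. The sketch below is not taken from Boost.Process. The function first_failure() is a hypothetical name, and it assumes that "first" refers to the order of the processes in the container and that EXIT_SUCCESS is reported if no process failed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical model of the rule: return the first exit code which
// differs from EXIT_SUCCESS, or EXIT_SUCCESS if there is none.
int first_failure(const std::vector<int> &exit_codes)
{
  for (std::size_t i = 0; i < exit_codes.size(); ++i)
  {
    if (exit_codes[i] != EXIT_SUCCESS)
      return exit_codes[i];  // later exit codes are ignored
  }
  return EXIT_SUCCESS;
}
```

For the exit codes 0, 2 and 5 this function returns 2. The exit code 5 is lost, which is why a program that needs every exit code has to wait for the child processes individually.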

9. POSIX-only features

Better support of Unix platforms

You have already been introduced to the class boost::process::posix_status. Another POSIX class was mentioned at the beginning of this article: boost::process::posix_child.

The class boost::process::posix_child supports features which are only available on POSIX systems. To create a POSIX child process you don't use boost::process::launch() but boost::process::posix_launch(). This function has the same signature as boost::process::launch() but returns an object of type boost::process::posix_child. The context which is passed to boost::process::posix_launch() must be based on boost::process::posix_context though.

#include <boost/process.hpp> 
#include <boost/assign/list_of.hpp> 
#include <string> 
#include <vector> 

using namespace boost::process; 

int main() 
{ 
  std::string exec = find_executable_in_path("hostname"); 
  std::vector<std::string> args = boost::assign::list_of("hostname"); 
  posix_context ctx; 
  ctx.environment = self::get_environment(); 
  posix_child c = posix_launch(exec, args, ctx); 
}

If you look up the reference of boost::process::posix_child you'll see that only two additional methods are defined: get_input() and get_output(). Both expect a file descriptor as an argument and return an input or output stream corresponding to the file descriptor. The reason is that on POSIX systems a child process can have more input and output streams than the three standard ones. Of course they need to be set up first. Thus in the sample code above it doesn't make any sense to use boost::process::posix_launch() because no additional streams have been set up.

#include <boost/process.hpp> 
#include <boost/assign/list_of.hpp> 
#include <string> 
#include <vector> 
#include <iostream> 
#include <unistd.h> 
#include <sys/types.h> 

using namespace boost::process; 

int main() 
{ 
  std::string exec = find_executable_in_path("dbus-daemon"); 
  std::vector<std::string> args = boost::assign::list_of("dbus-daemon") 
    ("--session")("--fork")("--print-address=3")("--print-pid=4"); 
  posix_context ctx; 
  ctx.output_behavior.insert(behavior_map::value_type(STDOUT_FILENO, inherit_stream())); 
  ctx.output_behavior.insert(behavior_map::value_type(STDERR_FILENO, inherit_stream())); 
  ctx.output_behavior.insert(behavior_map::value_type(3, capture_stream())); 
  ctx.output_behavior.insert(behavior_map::value_type(4, capture_stream())); 
  ctx.environment = self::get_environment(); 
  posix_child c = posix_launch(exec, args, ctx); 
  std::string address; 
  pid_t pid; 
  c.get_output(3) >> address; 
  c.get_output(4) >> pid; 
  std::cout << "Address: " << address << "\nPID: " << pid << std::endl; 
}

In the sample above a program called dbus-daemon is used. What D-Bus is and what this program does is not important. The program is used because it supports command line arguments which make the program write data to certain streams. As the file descriptors of those streams can be specified on the command line, too, dbus-daemon makes it easy to play with the boost::process::posix_child class.

First the context has to be configured. The class boost::process::posix_context provides two public properties, input_behavior and output_behavior. Both properties are based on a type behavior_map which is a typedef for std::map<int, stream_behavior>. As POSIX systems support more streams than the three standard ones, a behavior setting can be assigned to every stream represented by a file descriptor. In the code sample above the streams based on the file descriptors 3 and 4 are captured as these are the streams dbus-daemon will write data to. The streams 1 and 2 - the sample uses STDOUT_FILENO and STDERR_FILENO from unistd.h - are inherited.

Once the child process has been created, the respective streams are accessed by passing the file descriptors as sole arguments to get_output(). As the parent process reads from these streams, this method returns an object of type boost::process::pistream which works as usual.
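What capturing an additional stream means at the level of the operating system can be sketched with a few POSIX calls. The code below does not use Boost.Process and is not its implementation. The function capture_child_fd() is a hypothetical name. It creates a pipe, makes the write end available to the child process under a chosen file descriptor such as 3, and lets the parent process read whatever the child writes there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: start a child which writes message to the file
// descriptor child_fd and return what the parent reads from it.
std::string capture_child_fd(int child_fd, const char *message)
{
  int fds[2];
  int rc = pipe(fds);
  assert(rc == 0);
  (void)rc;

  pid_t pid = fork();
  assert(pid != -1);
  if (pid == 0)
  {
    // Child: make the write end of the pipe available as child_fd.
    if (fds[1] != child_fd)
    {
      dup2(fds[1], child_fd);
      close(fds[1]);
    }
    if (fds[0] != child_fd)
      close(fds[0]);
    ssize_t written = write(child_fd, message, std::strlen(message));
    (void)written;
    _exit(0);
  }

  // Parent: close the unused write end, then read until end of file.
  close(fds[1]);
  std::string captured;
  char buffer[256];
  ssize_t n;
  while ((n = read(fds[0], buffer, sizeof(buffer))) > 0)
    captured.append(buffer, static_cast<std::size_t>(n));
  close(fds[0]);
  waitpid(pid, 0, 0);
  return captured;
}
```

In this sketch the child writes the message itself. In the dbus-daemon sample the same role is played by the command line arguments --print-address=3 and --print-pid=4 which tell the program which file descriptors to write to.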

Apart from the properties input_behavior and output_behavior the class boost::process::posix_context also provides uid, euid, gid, egid and chroot. Developers familiar with POSIX operating systems should understand the purpose of those properties immediately.

10. Windows-only features

Better support of Windows platforms

It has already been mentioned in this article that on Windows systems the class boost::process::win32_child can be used. To obtain an object of type boost::process::win32_child, a process must be created with boost::process::win32_launch() though. In order to call boost::process::win32_launch() the context argument must be based on boost::process::win32_context.

#include <boost/process.hpp> 
#include <boost/assign/list_of.hpp> 
#include <string> 
#include <vector> 
#include <iostream> 

using namespace boost::process; 

int main() 
{ 
  std::string exec = find_executable_in_path("notepad.exe"); 
  std::vector<std::string> args = boost::assign::list_of("notepad.exe"); 
  win32_context ctx; 
  ctx.environment = self::get_environment(); 
  win32_child c = win32_launch(exec, args, ctx); 
  std::cout << c.get_handle() << std::endl; 
}

boost::process::win32_context is derived from boost::process::context. The only difference is that boost::process::win32_context provides another public property called startupinfo. It can be used to specify so-called startup information. Windows defines the structure STARTUPINFOA which this property is based on. If no startup information is given this property can be ignored and boost::process::win32_context can be treated just like boost::process::context.

Once a Windows child process has been created a few more methods can be invoked. While boost::process::child defines only get_id() the class boost::process::win32_child defines get_handle(), get_primary_thread_handle() and get_primary_thread_id(). And that's all - there are no really spectacular extensions for Windows.

11. Running test cases

Making sure Boost.Process works on your platform

Boost.Process ships numerous test cases which can be run to verify that the library works correctly on a platform. The test cases can be found in the directory libs/process/test.

As Boost.Process is a library to manage processes nearly all test cases create at least one child process. All test cases access a helper program to do so. For this to work test cases must know where to find this helper program. The problem is though that when test cases are run compilers create different directories to build the test cases. As directory names are different depending on the compiler used it's difficult for test cases to find the helper program.

Currently the path of the helper program is hardcoded in libs/process/test/misc.hpp. Before you run the test cases you must check the hardcoded path and update it if necessary. If you don't, you might see lots of failures when you run the test cases, as they simply might not be able to find the helper program.

When you have verified that the path to the helper program is set correctly you can run the test cases. Change to the directory libs/process/test and run bjam. You don't need to pass any command line arguments. The helper program and all test cases will be automatically built and run.

Boost.Process 0.3x has been tested successfully with the following compilers and platforms so far:

Visual C++ 2008, Windows XP SP 3
Visual C++ 2008, Windows Vista SP 1
GCC 3.4.4, Cygwin
GCC 3.4.6, Solaris 10 (SPARC)
Sun C++ 5.9, Solaris 10 (SPARC)
GCC 4.1.2, Linux 32-bit
GCC 4.2.1, Linux 32-bit

On POSIX systems (except Cygwin) Boost.Test must be patched though. There is a problem with the way Boost.Test treats the SIGCHLD signal. A macro has been proposed but hasn't been added yet to Boost.Test. For the test cases of Boost.Process to work one line in a Boost.Test file has to be changed manually. What has to be done exactly has been explained in a message to the Boost mailing list. There are also a few more explanations about how to run the test cases on the various platforms. Although the message is from September 2008 the explanations are still up-to-date.