plushy

Developer Documentation

Introduction

Extensions and incremental improvements are vital to Plush's future success. Towards that end, this document describes the structure and implementation details of Plush. We will start with an overview of Plush's objects. Then, we'll dive into the modules you will be most likely to see while working on Plush extensions. After discussing the utilities that are available to you in the Plush environment, we will discuss extending Plush through analysis of a few case studies.

Plush Objects

The Plush Class (plush.H)

The Plush class is the superclass of both the Main and the Client classes. It contains all the global data structures and the functions to start most of Plush's internal tasks. There is exactly one Plush object per Plush instance, and you can access it via the global pointer g_plush. Let's say you wanted to have Plush connect to a host for you. You would call g_plush->connectTo(hostname). Like everything else in Plush, this function returns immediately, but an event of type (Event::MAJOR_NODE, Node::MINOR_JOINED) will fire when the connection is established.

Data Structures

There are a couple of tables maintained by Plush that can be useful.

  1. The node table maps hostnames to Node objects.
  2. The file transfer and installation tables track outstanding file transfers and software installations; these are keyed by the file transfer or installation ID, which is an integer. After a lookup, these return pointers to FileTransfer or Installation objects (see file_transfer.H and software_installer.H).
  3. The project map holds Project objects, keyed by name.
  4. There are two process tables, one each for local and remote processes. The local process table tracks processes running on the local machine, keyed by name or process id. The remote process table tracks processes running on remote machines, and it is keyed by remote host and process id.
  5. The matching table keeps a list of all the known Matching objects. You can iterate through them, look them up by project and component name, or compute the reverse map, grabbing the appropriate Project, Experiment, Component, and Execution objects for a Matching.

Each of these tables is locked independently. If you use the normal functions to access the tables, then the locks are handled for you. If you decide to use the iterators, though, you must lock and unlock the structures yourself. (When in doubt, see plush.H.) If you wanted to iterate through all the nodes, printing their names, you would call nodesLock(), then iterate from nodesBegin() through nodesEnd(), and then call nodesUnlock(). Try to use the non-iterator functions, though, because the exposed iterators are ugly.
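The lock, iterate, unlock pattern might look like the following sketch. NodeTable and Node here are hypothetical stand-ins for the node-table portion of the Plush class; only the names nodesLock(), nodesBegin(), nodesEnd(), and nodesUnlock() come from the text, and their real signatures in plush.H may differ.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a Plush Node object.
struct Node {
    std::string hostname;
};

// Hypothetical stand-in for the node-table part of the Plush class.
class NodeTable {
public:
    typedef std::map<std::string, Node>::const_iterator NodeIterator;

    // A "normal" accessor: it takes and releases the lock for the caller.
    void addNode(const std::string &hostname) {
        std::lock_guard<std::mutex> guard(lock_);
        nodes_[hostname] = Node{hostname};
    }

    // Iterator access: the caller is responsible for locking.
    void nodesLock() { lock_.lock(); }
    void nodesUnlock() { lock_.unlock(); }
    NodeIterator nodesBegin() const { return nodes_.begin(); }
    NodeIterator nodesEnd() const { return nodes_.end(); }

private:
    std::mutex lock_;
    std::map<std::string, Node> nodes_;
};

// Collect every hostname while holding the table lock.
std::vector<std::string> listNodeNames(NodeTable &table) {
    std::vector<std::string> names;
    table.nodesLock();
    for (NodeTable::NodeIterator it = table.nodesBegin(); it != table.nodesEnd(); ++it) {
        names.push_back(it->first);
    }
    table.nodesUnlock();
    return names;
}
```

Copying the names out and releasing the lock before doing any slow work (such as printing) keeps the table locked for as short a time as possible.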

Object Factories

To support extensibility, there is a "type" attribute in most Plush XML and software interfaces that maps data to code. For example, the software installer type "rpm" has to map to a piece of code that can deal with Red Hat packages, whereas an installer type "tar" has to map to a different chunk of code. Plush provides functions that associate strings to objects. They are listed below.

  1. SoftwareInstaller *getSoftwareInstaller(const string &method)
  2. FileTransferMethod *getFileTransferMethod(const string &method)
  3. ConfigurationMatcher *getConfigurationMatcher(const string &method)
  4. ResourceAllocator *getResourceAllocator(const string &type)
  5. LogManager *getLogManager(const string &type)
  6. ProcessMonitor *getProcessMonitor(const string &type)
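
The string-to-object mapping behind these functions can be pictured as a small registry. The sketch below is only an illustration: InstallerRegistry, RpmInstaller, TarInstaller, and describe() are hypothetical names, and returning a null pointer for an unknown type is an assumption rather than documented Plush behavior.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical minimal interface; the real one is in software_installer.H.
class SoftwareInstaller {
public:
    virtual ~SoftwareInstaller() {}
    virtual std::string describe() const = 0;
};

class RpmInstaller : public SoftwareInstaller {
public:
    std::string describe() const override { return "handles Red Hat packages"; }
};

class TarInstaller : public SoftwareInstaller {
public:
    std::string describe() const override { return "handles tar archives"; }
};

// Hypothetical registry mapping "type" strings to installer objects.
class InstallerRegistry {
public:
    void registerInstaller(const std::string &method, SoftwareInstaller *installer) {
        installers_[method] = installer;
    }

    // Returns nullptr for an unknown type (an assumption of this sketch).
    SoftwareInstaller *getSoftwareInstaller(const std::string &method) const {
        std::map<std::string, SoftwareInstaller *>::const_iterator it =
            installers_.find(method);
        return it == installers_.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, SoftwareInstaller *> installers_;  // pointers are not owned
};
```

Adding a new installer type then amounts to writing one subclass and registering it under a new string.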

Actions

The Plush class also provides methods for starting actions, such as connecting or disconnecting from hosts. You can control connections with the connectTo(hostname), connectTo(Slice *), and disconnectFrom(hostname) functions. You can send a project to another Plush process by calling one of the sendExperimentControllerProject functions. You can force Plush clients to send updated data by calling getClientUpdates, and you can force client software upgrades by calling upgradeClient or upgradeClients. You can force Host Monitor updates by calling getHostMonitorUpdates.

Experiment Controllers

Experiment controllers implement the "state transitions" within Plush. They are ultimately responsible for deciding and controlling what happens on every host, how failures are detected and resolved, and all other matters regarding an experiment run.

Projects and Related Objects, with Major Operations

Project

The Project class has no special methods, and it only stores data. You can add, remove, and find Components, Experiments, and Software. The Process and Slice functions are deprecated and should not be used in future code. The Component objects that are referenced by a Project are templates -- they are not executed. Components that have been executed are ancestors of the appropriate Experiment.

Component

Each Component object has pointers to its Resources, ResourceSpecification, and ResourceAllocation objects. It also includes a vector of Process objects to run on each node and a list of Software objects to install on each node.

Experiment

The Experiment objects contain a mix of static and dynamic data. The static data is the Execution template. When you run an Experiment, it duplicates the template and fills it in with real data, such as the process ids which were used, where to find the log files, and so forth. The Experiment also contains pointers to all the Executions it knows about, and it can look them up by originating host and time.

Execution

An Execution contains a sequence of Components, Processes, Barriers, and other executable elements to run. Whenever Plush is going to execute a block of code according to an experiment description, it consults the appropriate Execution block. Any run-time state, such as process IDs, paths, errors, and so on, is stored in the Execution blocks.

Matching

A Matching is a set of hosts which satisfied a request to a Configuration Matcher. As part of an experiment description, a user specifies an abstract resource specification, such as "give me 100 machines with a load average less than 10." This abstract description is passed to a Configuration Matcher, such as SWORD, which kicks out a list of hosts that satisfy the request. These host sets are stored in Matching objects.

Plush Utilities

Debugging (debug.H)

Plush creates a file called logfile.txt that contains the output of debug() statements encountered in the code. The debug() macro outputs the given string or ostream& rvalue (expression) to the logfile. For example, debug("The sky is falling with error code " << x); is a valid statement. When encountered, that line would produce a string like "file.cc:128:The sky is falling with error code -1\n" inside logfile.txt. Note that the file, line, and trailing newline are automatically added.

More frequently in the code you will encounter the debugn() macro, which takes an integer as its first argument. Plush maintains a debugging level, and it only prints statements where the number given to debugn() is less than or equal to the level. If the level were 3, then debugn() calls with 1, 2, or 3 would be printed. By convention, only the natural numbers are used for debug levels -- a debug level of 0 implies that no output is printed. By definition, debug(foo) is the same as debugn(1, foo).
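
To make the level rule concrete, here is a minimal sketch of how such macros could be written. It is not the code in debug.H: SKETCH_DEBUGN, SKETCH_DEBUG, g_debugLevel, and g_debugLog are hypothetical names, and the output goes to an in-memory stream rather than logfile.txt.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-ins for Plush's debug level and log sink.
static int g_debugLevel = 1;
static std::ostringstream g_debugLog;

// Print only when n is at or below the current level. The file name, line
// number, and trailing newline are added automatically.
#define SKETCH_DEBUGN(n, expr)                                        \
    do {                                                              \
        if ((n) <= g_debugLevel) {                                    \
            g_debugLog << __FILE__ << ":" << __LINE__ << ":" << expr  \
                       << "\n";                                       \
        }                                                             \
    } while (0)

// debug(foo) is the same as debugn(1, foo).
#define SKETCH_DEBUG(expr) SKETCH_DEBUGN(1, expr)

// Emit one message at each of levels 1 through 5 and report how many
// lines were actually written at the given debug level.
int countLoggedLines(int level) {
    g_debugLevel = level;
    g_debugLog.str("");
    for (int n = 1; n <= 5; ++n) {
        SKETCH_DEBUGN(n, "message at level " << n);
    }
    const std::string text = g_debugLog.str();
    int lines = 0;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\n') ++lines;
    }
    return lines;
}
```

With the level set to 3, the calls at levels 1, 2, and 3 are written and the calls at levels 4 and 5 are dropped; with the level set to 0, nothing is written.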

You can set the debugging level on the command line via the -d switch (e.g. "./plush -d 5"). Within the code, you can call Debug::setLevel() or Debug::getLevel(). If you're having trouble with threads, you can also call Debug::setShowThreadIDs(true) to have the value of pthread_self() placed at the beginning of every debugging line. You can also set the "DebugLevel" preference, and that level will be honored by any clients that you connect to from then on. Thus far, most important internal events or errors have levels of 1 or 2, user errors and informational messages have been at levels 3 and 4, and gratuitous output is at level 5.

Also, the debugging macros are thread-safe.

Logging (logger.H)

Unlike the Debug functions, which show data useful for debugging Plush, the Logger functions provide data useful for debugging the deployment and/or profiling Plush. The Logger emits entries for major events (connecting to/from nodes, getting updates, and so on) and for important errors, like those encountered with malformed XML input. This log is available to the user for viewing online, and it is stored to the plush-logfile.txt file.

Unlike the debug() macros, which use C++ streams, the Logger functions use printf()-style output. To facilitate online viewing, the Logger functions keep a buffer of the most recent messages in memory; this was much easier to write with C-style I/O than with C++ streams. Also, much of the benefit of C++ streams, particularly the ability to write out class objects by overloading operator<<, was unnecessary here because these are simple progress and error messages, not internal debugging statements.

The two most frequent functions you will use are Logger::logError and Logger::logEvent. Both take the same arguments as printf(). For example, Logger::logError("Whoops, you wrote the wrong thing on line %d.\n", 10); writes "Tue Dec 28 15:34:53 PST 2004 E Whoops, you wrote the wrong thing on line 10.\n" to the logfile. The "E" indicates that the line represents an error, and is there for easy grepping. Logger::logEvent works similarly, but outputs an "M" instead.
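
The line format in that example can be reproduced with a few lines of C-style varargs code. The sketch below is an assumption about the shape of the output only: formatLogLine is a hypothetical helper, not a function in logger.H, and it takes the timestamp as an argument so the result is predictable.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Build a log line as it appears in the example: timestamp, a one-letter tag
// ("E" for errors, "M" for events), then the printf-style message.
// Messages longer than the local buffer are truncated in this sketch.
std::string formatLogLine(const std::string &timestamp, char tag,
                          const char *format, ...) {
    char message[1024];
    va_list args;
    va_start(args, format);
    std::vsnprintf(message, sizeof(message), format, args);
    va_end(args);
    return timestamp + " " + tag + " " + message;
}
```

Because every error line carries the " E " tag in a fixed position, a plain grep over the logfile is enough to pull out the errors.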

Logger::logError and Logger::logEvent are specific cases of Logger::log(), which takes a message priority as the first argument. See the header file for more information.

If you want to access the recent log messages which are stored in memory, call Logger::getBuffer(). It will return a string object.

Working with XML Documents (xml_util.H)

Plush uses libxml2 for XML manipulation. Almost all objects have readXMLDocument() and writeXMLDocument() functions for serialization. To help with reading and writing XML documents with C++ data types, the XMLUtil class provides static methods for reading and writing XML attributes and content.

Consider the function bool XMLUtil::readXMLAttr(xmlNode *node, const char *name, int &value);. It looks for the attribute name under the given node, parses it as an integer, and puts the result into value. If the attribute could not be found, or if it could not be parsed as an integer, then false is returned. Otherwise the function returns true. All of the readXMLAttr and readXMLContent functions work like this, with different data types.
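
The calling convention (a bool result plus an output parameter) is easiest to see in a small example. To keep the sketch free of libxml2, a std::map stands in for the attributes of an xmlNode; AttrMap and readAttr are hypothetical names, and leaving value untouched on failure is an assumption of this sketch.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical stand-in for an XML node's attributes (the real code uses xmlNode *).
typedef std::map<std::string, std::string> AttrMap;

// Returns true and sets value on success. Returns false, leaving value
// untouched, if the attribute is missing or is not a valid integer.
bool readAttr(const AttrMap &attrs, const char *name, int &value) {
    AttrMap::const_iterator it = attrs.find(name);
    if (it == attrs.end()) return false;              // attribute not found

    const char *text = it->second.c_str();
    char *end = nullptr;
    errno = 0;
    long parsed = std::strtol(text, &end, 10);
    if (end == text || *end != '\0') return false;    // empty or trailing junk
    if (errno == ERANGE || parsed < INT_MIN || parsed > INT_MAX) return false;

    value = static_cast<int>(parsed);
    return true;
}
```

A caller would typically write something like if (!readAttr(attrs, "port", port)) and then report the malformed input, rather than checking the output value itself.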

The functions to write attributes and content are quite similar, though their third argument (value) is not modified, and they return void (instead of bool).

The xml_util.H header also includes utilities to write XML nodes into files or memory buffers. It also overloads operator<< for the type xmlNode*, so statements like cout << (xmlNode *)node << endl; will print the contents of node as a string.

Last, XMLUtil contains a few out-of-place functions of general use. It has itoa() and ftoa() functions that convert integers and doubles to strings, and it has a simple string hash function, unsigned int hash(const string &).

Events (event.H)

Control flow in Plush is usually asynchronous. Often one will do a little bit of work, start a task, and then wait until the task completes. The output of the task is then read and used. For example, to update the Host Monitoring data from Ganglia, Plush starts a perl script which reads the current data and converts it to Plush XML format. Once the perl script exits, a notification fires within Plush, and the output of that script is read. This notification is called an Event in Plush.

Events are the general notification tool for Plush. Every event has a major number (int), a minor number (int), and an opaque data field (void *). The major number indicates the class of event; its values are defined in an enumeration in the file event.H, and include things like Event::MAJOR_PROCESS and Event::MAJOR_CONFIGURATION_MATCHER. The minor numbers indicate what happened, and they are defined at the class level. So for Event::MAJOR_PROCESS, we have minor numbers in process.H like Process::MINOR_STARTED, Process::MINOR_EXITED, and Process::MINOR_KILLED. In these cases, the opaque data field would be filled with a pointer to a Process class object.

Fire an event by calling Event::fireEvent(int major, int minor, void *data). The event will be queued and handled in FIFO order. A thread handling the events will call the function named handleEvent(int major, int minor, void *data) for everyone who has registered interest in the event.

If you are interested in an event, extend from the class EventHandler, implementing a handleEvent() function, and then call Event::addHandler(EventHandler *, int major, int minor) to register. You will receive event notifications for the given major and minor numbers until the system exits or you call removeHandler.
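
The register, fire, and dispatch cycle can be sketched as follows. This is a single-threaded model, not the code in event.H: EventQueue, dispatchAll(), ExitRecorder, the numeric enum values, and the Process fields are all hypothetical, and the real system dispatches from its own thread.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the EventHandler base class.
class EventHandler {
public:
    virtual ~EventHandler() {}
    virtual void handleEvent(int majorId, int minorId, void *data) = 0;
};

// Hypothetical single-threaded model of the event system.
class EventQueue {
public:
    void addHandler(EventHandler *handler, int majorId, int minorId) {
        handlers_[std::make_pair(majorId, minorId)].push_back(handler);
    }

    // Queue the event; nothing is delivered until dispatchAll() runs.
    void fireEvent(int majorId, int minorId, void *data) {
        Pending p = {majorId, minorId, data};
        pending_.push(p);
    }

    // Deliver queued events in FIFO order to every registered handler.
    void dispatchAll() {
        while (!pending_.empty()) {
            Pending p = pending_.front();
            pending_.pop();
            std::vector<EventHandler *> &hs =
                handlers_[std::make_pair(p.majorId, p.minorId)];
            for (std::size_t i = 0; i < hs.size(); ++i) {
                hs[i]->handleEvent(p.majorId, p.minorId, p.data);
            }
        }
    }

private:
    struct Pending { int majorId; int minorId; void *data; };
    std::queue<Pending> pending_;
    std::map<std::pair<int, int>, std::vector<EventHandler *> > handlers_;
};

// Hypothetical numeric values; the real ones live in event.H and process.H.
enum { MAJOR_PROCESS = 1 };
enum { MINOR_STARTED = 1, MINOR_EXITED = 2 };

// Hypothetical stand-in for the Process class.
struct Process { std::string path; int status; };

// Records the path of every process that exits.
class ExitRecorder : public EventHandler {
public:
    void handleEvent(int, int, void *data) override {
        // For process events the opaque field points at a Process.
        Process *p = static_cast<Process *>(data);
        exited.push_back(p->path);
    }
    std::vector<std::string> exited;
};
```

Note that the cast from void * is unchecked, so a handler must only be registered for events whose data field really has the type it expects.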

Timers (timer.H)

The other source of asynchrony in Plush is Timers. To run a piece of code in the future, extend the TimerHandler class from timer.H, implementing the expiredHandler function. Then call Timer::setTimer(handler, delay), where delay is in seconds. Your function will be called after that time has expired. Beware, though, that the timer code is optimized for longer-running timers (on the order of minutes to days), and so the first firing of a timer with a very short delay may be late.

Timer handlers are unregistered after they fire, so if you want your code to run periodically, call Timer::setTimer(this, delay) from within the expiredHandler.
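
The re-arming idiom can be sketched with a simulated clock. SimTimer, advanceTo(), and Heartbeat are hypothetical names invented for this example, and the expiredHandler signature is assumed to take no arguments; check timer.H for the real declarations.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical stand-in for the TimerHandler base class.
class TimerHandler {
public:
    virtual ~TimerHandler() {}
    virtual void expiredHandler() = 0;
};

// Hypothetical timer service with one-shot timers and a manually advanced clock.
class SimTimer {
public:
    SimTimer() : now_(0) {}

    // Schedule handler to fire once, delaySeconds from the current time.
    void setTimer(TimerHandler *handler, long delaySeconds) {
        timers_.insert(std::make_pair(now_ + delaySeconds, handler));
    }

    // Advance the clock to time t, firing each due timer exactly once.
    void advanceTo(long t) {
        while (!timers_.empty() && timers_.begin()->first <= t) {
            std::multimap<long, TimerHandler *>::iterator it = timers_.begin();
            now_ = it->first;
            TimerHandler *handler = it->second;
            timers_.erase(it);           // unregistered before it runs
            handler->expiredHandler();   // may call setTimer() again
        }
        now_ = t;
    }

private:
    long now_;
    std::multimap<long, TimerHandler *> timers_;
};

// Runs periodically by re-registering itself each time it fires.
// The period must be positive, or advanceTo() would never finish.
class Heartbeat : public TimerHandler {
public:
    Heartbeat(SimTimer &timer, long period)
        : beats(0), timer_(timer), period_(period) {}

    void expiredHandler() override {
        ++beats;
        timer_.setTimer(this, period_);  // re-arm for the next period
    }

    int beats;

private:
    SimTimer &timer_;
    long period_;
};
```

Leaving out the setTimer() call inside expiredHandler() turns the same class into a one-shot timer.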

Processes

Plush provides a C++ wrapper to many of the standard UNIX process system calls, such as fork, dup2, and exec. Rather than calling those directly, use the Process class instead. Generally, you create a Process object with the path and command-line arguments to a process you want to run. You then delegate the monitoring of that process to someone, usually Plush's default process monitor (g_plush->getProcessMonitor()->attach(process);). You register a handler for the Event::MAJOR_PROCESS, Process::MINOR_EXITED event, and then you call process->start(). Once the process exits, the process monitor will fire an event, and your handler will be called. It can then do things like look at the output status (process->getStatus()), read the process's output (g_plush->getProcessMonitor()->getData(process, "")), and so on.
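
For orientation, this is roughly the kind of raw system-call sequence that such a wrapper hides. The sketch is not Plush's implementation: runAndCapture is a hypothetical helper, it blocks until the child exits (unlike the asynchronous Process class), and it ignores interrupted reads for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run argv[0] with the given arguments, append its standard output to
// output, and return its exit status (or -1 on failure).
// argv must end with a null pointer.
int runAndCapture(char *const argv[], std::string &output) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        // Child: route stdout into the pipe, then replace the process image.
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execv(argv[0], argv);
        _exit(127);  // only reached if execv failed
    }

    // Parent: read until the child closes its end of the pipe.
    close(fds[1]);
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof(buf))) > 0) {
        output.append(buf, static_cast<std::size_t>(n));
    }
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Even this short version has several easy-to-miss details (closing the unused pipe ends, calling _exit after a failed exec, reaping the child), which is a good argument for going through the Process class instead.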

The Process/ProcessMonitor abstractions also work around bugs in Linux's pthreads implementation, so it is prudent to use them.

Preferences (preferences.H)

Plush provides a simple key/value pair system for storing preferences. The file plush.prefs is read when a Plush process starts. To set a preference, call g_plush->getPrefs()->setPref(key, value). To read a preference, call g_plush->getPrefs()->getPref(key). If key exists, the value assigned to it will be returned. If it does not, an exception is thrown. Unfortunately, this means you have to wrap all your preference code in a try{}catch(){} block.
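
One way to keep the try/catch noise in a single place is a small helper that supplies a fallback value. In the sketch below, the Preferences class is a hypothetical stand-in for the object returned by g_plush->getPrefs(), getPrefOr is a hypothetical helper, and std::out_of_range is an assumed exception type, since the text does not say what getPref() throws.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the Plush preferences object.
class Preferences {
public:
    void setPref(const std::string &key, const std::string &value) {
        prefs_[key] = value;
    }

    // Throws if the key does not exist (exception type assumed).
    std::string getPref(const std::string &key) const {
        std::map<std::string, std::string>::const_iterator it = prefs_.find(key);
        if (it == prefs_.end()) {
            throw std::out_of_range("no such preference: " + key);
        }
        return it->second;
    }

private:
    std::map<std::string, std::string> prefs_;
};

// Hypothetical helper that confines the try/catch to one place.
std::string getPrefOr(const Preferences &prefs, const std::string &key,
                      const std::string &fallback) {
    try {
        return prefs.getPref(key);
    } catch (const std::exception &) {
        return fallback;
    }
}
```

Call sites then shrink to a single line, such as getPrefOr(prefs, "DebugLevel", "0").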

You can set a default value for a preference key by adding a line to the Preferences::initialize(void) function in preferences.cc. Though I don't know why these would be useful, erasePref(key) and erase() (all) functions are also provided to clean your preferences.

Message Field Summary

Plush's application messages contain many fields:

Often the payload contains Plush XML data. As parsing XML is expensive, the extra integer and string arguments (3 each per message) are filled in with important and small data relating to the response. When possible, the extra fields are used to quickly determine which component should receive the message, and then the XML is passed to that component. For example, every file transfer and software installation is marked with a unique ID. This ID is always placed in the second integer header field, so Plush can look up the proper object based on the ID, and pass the message to that object.
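
The routing idea can be sketched in a few lines. The Message layout, the FileTransfer stand-in, and routeToTransfer are hypothetical; the only detail taken from the text is that the transfer or installation ID sits in the second integer header field.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical message layout: three extra integers, three extra strings,
// and a payload that usually holds Plush XML.
struct Message {
    int ints[3];
    std::string strings[3];
    std::string payload;
};

// Hypothetical stand-in for the FileTransfer class.
struct FileTransfer {
    std::string lastPayload;
    void handleMessage(const Message &m) { lastPayload = m.payload; }
};

// Route a message by the ID in the second integer field. The XML payload is
// handed over unparsed; only the receiving object needs to look inside it.
bool routeToTransfer(std::map<int, FileTransfer *> &transfers, const Message &m) {
    std::map<int, FileTransfer *>::iterator it = transfers.find(m.ints[1]);
    if (it == transfers.end()) return false;  // unknown ID
    it->second->handleMessage(m);
    return true;
}
```

The lookup costs one map search, whereas parsing the payload first would mean building an XML tree for every message, including those addressed to objects that no longer exist.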

The table found here documents the allocation of message fields. The first two columns list the message type, and whether or not it is a request (coming from the controller) or a response (from the clients). The remaining seven columns document how the header fields are interpreted.

Event Summary

Event handlers have a data field of type void*. Depending on the type of event, the expected type of the value placed in the field differs. For node events, the data field contains a pointer to the appropriate node. For configuration matcher events, the field points to a component that has been matched. The table found here documents the types and interpretations of the data field under various events.

Proxy Self-Updates

Plush proxies (or "clients") are able to replace themselves with updated versions. Included with every client is a script, called "bootstrap.pl", that is used for installing the clients the first time, and for updating them. In simple terms, this script compares the remote client version to the client version posted on http://plush.cs.williams.edu/dist/, and it installs if necessary.

Clients automatically start "./bootstrap.pl" when the clients receive an INIT_CLIENT message. They run "bootstrap.pl" with the "-c" argument to check for updates. If there are updates, or if the client has been told to update even if the version seems correct, then the client runs the installer script a second time. The bootstrap script fetches the latest binary from the web site, replacing the running one. Once the script exits, the client leaves the mesh and calls exec() to start the new version of the client.

Usually, this process works like a charm, and it allows us to do automated client updates without manually re-bootstrapping our slices. Unfortunately, there are a few potential problems:

  1. If the string in http://plush.cs.williams.edu/dist/main/VERSION doesn't match the clients in the same directory, an infinite loop of client self-updating could result.
  2. If I want to try out some new snippet of code on a few clients, and I send them out there, I don't want them to automatically update with the "stable" code versions.
  3. You can't use the auto-update feature for development code; the only solution is to manually copy the test binary to the remote hosts.

Some of these problems are solved through command-line flags that tell the client to (1) always, or (2) never, update automatically. Others, such as #3, require extra support.
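
The update decision, including the two override flags, can be modeled as below. This is a sketch under assumptions: UpdatePolicy and shouldUpdate are hypothetical names, the actual flag spellings are not given in the text, and treating any difference between the two version strings as "needs update" is a guess at what bootstrap.pl checks.

```cpp
#include <cassert>
#include <string>

// Hypothetical names for the default behavior and the two overrides.
enum UpdatePolicy { UPDATE_IF_VERSION_DIFFERS, UPDATE_ALWAYS, UPDATE_NEVER };

// Decide whether the client should fetch and exec a new binary.
bool shouldUpdate(const std::string &localVersion,
                  const std::string &postedVersion,
                  UpdatePolicy policy) {
    switch (policy) {
        case UPDATE_ALWAYS: return true;
        case UPDATE_NEVER:  return false;
        default:            return localVersion != postedVersion;
    }
}
```

The model also shows where problem 1 comes from: if the posted VERSION string never equals the version that the downloaded binary reports, the default policy answers "update" after every restart. Under UPDATE_NEVER, a test client keeps its experimental binary, which addresses problem 2.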