Appendix A. Implementation of a Scalable and Dynamic Interface for Open Sound World using Open Sound Control

Andrew W. Schmeder (andy at a2hd dot com)

Abstract:

This document describes the nature and use of Open Sound Control within Open Sound World, a complex and dynamic musical application written by Amar Chaudhary and others. We describe in detail the process of constructing Open Sound Control namespaces at runtime and performing manipulations and queries in that space via network messages.

Introduction to OSC and OSW

Open Sound World (OSW) is a visual/dataflow programming environment for real-time stream-based multimedia processing, especially audio. It is similar to PD, jMax, Max/MSP, and numerous other programs. OSW is implemented in C++ using powerful generic templates and object-oriented methods. Open Sound Control (OSC) is a general protocol for encapsulating and organizing control among multi-layered systems for musical applications. OSC is developed at the Center for New Music and Audio Technologies (CNMAT) at UC Berkeley by Matt Wright and others.

Use of Hierarchy

A "patch" or "program" in OSW consists of a collection of configured objects. The basic primitive object is a Transform. A Transform may have a number of binding sites for data I/O, which are called Inlets, Outlets and States. Some Transforms may also be Containers, in that they may contain other Transforms. The Patch is a Container and a Transform that the user interacts with (in a graphical representation, the Patch is referred to as the "canvas"). Via the GUI, a user places arbitrary Transforms into the Patch, and within that Patch may create Connections among the Inlets and Outlets. Connections are constrained and directional, such that an Outlet may only connect to an Inlet provided that the types are compatible (a discussion of the OSW type system would merit a significant document of its own). A Patch may contain another Patch (i.e. a "subpatch"), which leads us to a program in OSW represented as a hierarchical structure of Transforms.

In OSC, messages are constructed using a simple syntax denoting an address and a set of typed arguments. The address is formulated in a hierarchical fashion similar to a URL or a Unix-style path. There is a natural coupling between the hierarchy of objects which the user sets up in OSW and an OSC address space. This map is constructed automatically at runtime as the user "builds a patch", and makes it possible for OSC messages to interact with this space via UDP networking.
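
To make the "address plus typed arguments" syntax concrete, the following Python sketch encodes a message in the OSC 1.0 binary layout: null-terminated strings padded to a multiple of four bytes, a type tag string beginning with a comma, and big-endian 32-bit numbers. The function name encode_message is hypothetical and not part of OSW or osw-cli, and only integer, float and string arguments are handled.

```python
import struct


def _pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of four bytes, as OSC strings require."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_message(address: str, *args) -> bytes:
    """Encode an OSC message with int32 (i), float32 (f) and string (s) arguments."""
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)
        elif isinstance(arg, str):
            tags += "s"
            payload += _pad(arg.encode("ascii"))
        else:
            raise TypeError(f"unsupported argument type: {type(arg).__name__}")
    return _pad(address.encode("ascii")) + _pad(tags.encode("ascii")) + payload
```

The resulting bytes are what a client would place in a single UDP datagram addressed to the server.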

Control System Requirements and Goals

We want the OSC interface to OSW to be powerful enough to substitute for the GUI, i.e. every possible action ("patching operation") within the GUI should also be possible using only OSC messaging. This enables us to turn OSW into a networked application server. This is possible only because the design of OSW adheres to the model-view-controller pattern (the OSW server is the model). The benefits of this design are increased stability, scalability via clustering, multi-user interaction, ease of integration with external software, and better debugging and regression testing capability due to complete separation of the UI. At CNMAT, we are especially interested in researching the clustering and automation that become possible with special networked clients. Finally, it is worth noting that the jMax project uses a similar design, in that the synthesis server, Fts, is separated from the GUI via a network messaging system. However, we strongly prefer to use OSC for two reasons: it is well documented, and it is an accepted standard with support in many applications and programming languages.

Our requirements are such that we will, by necessity, test and implement nearly the full range of functionality described in the OSC specification. In addition to the basics of constructing and parsing, sending and receiving OSC messages, we require namespace searches, documentation, and type signature queries. In some cases we have chosen to go beyond what the OSC specification describes, particularly in the area of transaction semantics and type extensions. The full details of what has been done, where OSC conformance is present and where we extend it, are described in the following sections. We hope that this model may serve as a source of inspiration, and as proof that using OSC to control complex musical systems is a feasible and worthwhile goal. Furthermore, we hope that this document helps clarify effective high-level use of OSC for programmers and users of OSW.

Conventions in this Document

In this document, we give numerous examples of hypothetical message exchanges between a client and an OSW server. We use some conventions in formatting to help make these examples clear. OSC messages are enclosed with brackets [, ], and begin with an OSC address, followed by zero or more arguments, each separated by a space. The types of the arguments are implied by the use of quotes for strings, or numeric characters for various numeric types. In some cases, details will be omitted, signified by using an ellipsis (...). Some special types will be enclosed in curly braces {, }, with a prefix naming the type, e.g. blob:{...}. A message may be prefixed by > or <, which is an indicator of the transmission direction; > implies the message is transmitted from client to server, and < implies the opposite. Finally, time always progresses downwards, i.e. when multiple messages are presented, the messages further down the page are transmitted at a later time. The following shows an example:

> [ /foo/bar "hello" 1 2 3 blob:{...} ]
< [ /bar/baz "Got hello and some numbers and a blob" ]

Testing the Implementation with OSW-CLI

A simple test program is provided, which makes use of the features described in this document. This program, called osw-cli, is available as open source from http://www.a2hd.com/research . It presents a command-line interface, vaguely similar to a Unix-style shell, which is capable of controlling one or more OSW servers using OSC messages (note: "OSW server" refers to OSW running in "bin" mode, i.e. launched using the oswbin command). Refer to the above website for details.

We also note that in several cases, the examples of interaction with the OSW server are not entirely accurate, or omit important details, when compared with what the messages would look like in an actual interaction. The interested reader is encouraged to examine the above program.

The Implementation

Server Discovery

The first step of establishing a connection between client and server is the discovery of the server's presence on the local area network. To do this, the OSW server periodically sends a UDP packet to the broadcast address on port 7706, containing some basic statistical information. The client listens for these packets and uses the stream to determine several things about the server: its network address (by examining the return address), the amount of time that the server has been running (uptime), and the activity level (expressed in Activations per second). The client can then communicate with the server using OSC messages over UDP. The server listens for these messages on port 7707 and routes its reply messages to the client on port 7708.
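
The client side of this discovery step might look like the following Python sketch. The port numbers come from the text; the function names, the five-second listening window and the bookkeeping of first-seen and last-seen times are assumptions made for illustration. The payload of the announcement packet is deliberately not decoded, since its layout is not described here; only the return address is used.

```python
import socket
import time

BROADCAST_PORT = 7706  # server announcements
COMMAND_PORT = 7707    # server listens for OSC input messages here
REPLY_PORT = 7708      # server routes reply messages to the client here


def record_announcement(servers: dict, host: str, now: float) -> None:
    """Remember when each server was first and most recently heard from."""
    first_seen, _ = servers.get(host, (now, now))
    servers[host] = (first_seen, now)


def listen_for_servers(duration: float = 5.0, port: int = BROADCAST_PORT) -> dict:
    """Collect the return addresses of announcement packets for `duration` seconds."""
    servers = {}
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        deadline = time.monotonic() + duration
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                _payload, (host, _port) = sock.recvfrom(65535)
            except socket.timeout:
                break
            record_announcement(servers, host, time.monotonic())
    finally:
        sock.close()
    return servers
```

A client would then send its OSC input to (host, COMMAND_PORT) and bind a second socket to REPLY_PORT to receive the replies.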

Transaction Methodology

The control session between client and server consists of an exchange of OSC messages. The OSC specification does not define much about how this exchange should take place, however we felt that a greater formalization would be useful. In the following discussion we refer to the "input message" as a command or query originating from the client, and the "reply message" as the server response which is routed back to the client.

Every Input Has a Reply

Because we use UDP, the physical delivery of both input and reply messages is not assured. We require that every input message generate a reply message, which can be used to confirm receipt of the input. This sits somewhat uneasily with the design of OSC, in that OSC is intended to be a nearly stateless protocol ("nearly" because there is an implicit assumption that the OSC namespace does not change too quickly, although even this is subject to debate). In our experience, UDP packets are in fact never lost when sent over a relatively quiet local network. However, we note that our transactional semantics do not require the server to be stateful, and require only minimal state retention on the client side.

Glob Expansion

When an input address uses a pattern-matching glob, that pattern is compared against the internal map. For each match, the server generates a new simulated message whose address is the explicit match. In this sense, the use of a globbing target is equivalent to sending a set of explicit messages. The system also returns the glob message along with the number of matches found:

> [ */documentation ]
< [ */documentation 24 ]
< [ /main_clock/documentation 0 "The Main Clock" ]
< [ /osw-quit/documentation 0 "Force this server to exit" ]
< ... (22 additional reply messages) ...
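
The expansion step can be sketched in Python as follows. This is not the OSW server's matching code: it follows the OSC 1.0 pattern rules, under which ?, *, [...] and {...} never match the "/" separator, so the query is written here with a leading slash as /*/documentation. The function names and the list-based namespace are assumptions for illustration.

```python
import re


def osc_pattern_to_regex(pattern):
    """Translate an OSC 1.0 address pattern into a compiled regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "*":
            out.append("[^/]*")          # any run of characters within one level
        elif c == "?":
            out.append("[^/]")           # exactly one character within one level
        elif c == "[":
            end = pattern.index("]", i)
            body = pattern[i + 1:end]
            negate = body.startswith("!")
            if negate:
                body = body[1:]
            body = re.escape(body).replace("\\-", "-")  # keep ranges such as 0-9
            out.append(("[^/" if negate else "[") + body + "]")
            i = end
        elif c == "{":
            end = pattern.index("}", i)
            options = pattern[i + 1:end].split(",")
            out.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end
        else:
            out.append(re.escape(c))
        i += 1
    return re.compile("".join(out))


def expand(pattern, namespace):
    """Return (match count, explicit addresses), mirroring the reply described above."""
    regex = osc_pattern_to_regex(pattern)
    matches = [address for address in namespace if regex.fullmatch(address)]
    return len(matches), matches
```

Each address in the returned list would then be dispatched as if it had arrived in its own explicit message.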

Reply Address matches Input Address

We use the convention that the address of reply messages matches the address of the input.

> [ /hello/osw "How are you?" ]
< [ /hello/osw "Okay." ]

Most significantly, consistent use of the full input address means that the client does not need to engage in a stateful wait cycle after each transmit in order to correlate the reply message.
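
The small amount of client-side state this requires can be sketched as a table of outstanding inputs keyed by address. The class below is a hypothetical illustration in Python, not code from osw-cli; the half-second timeout is an arbitrary assumption, and the caller supplies the current time so that the logic stays independent of any particular clock.

```python
class PendingInputs:
    """Track unacknowledged input messages, keyed by their OSC address."""

    def __init__(self, timeout=0.5):
        self.timeout = timeout
        self._sent_at = {}  # address -> time the input was last sent

    def sent(self, address, now):
        """Record that an input message was transmitted to `address`."""
        self._sent_at[address] = now

    def reply_received(self, address):
        """Return True if the reply matches an outstanding input."""
        return self._sent_at.pop(address, None) is not None

    def overdue(self, now):
        """Addresses whose reply has not arrived within the timeout."""
        return sorted(a for a, t in self._sent_at.items() if now - t >= self.timeout)
```

The client can keep transmitting without blocking, and periodically resend whatever overdue() reports. One limitation of keying on the address alone is that two outstanding inputs to the same address cannot be told apart.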

We note that the OSC specification describes a different reply format for builtin query types, such as:

> [ /hello/osw/documentation ]
< [ /documentation "/hello/osw" "Returns a greeting" ]

However, in our implementation, this would be:

> [ /hello/osw/documentation ]
< [ /hello/osw/documentation 0 "Returns a greeting" ]

The difference is fairly trivial, but we feel that our system is more elegant. The initial zero is explained next.

Return Codes

We introduce the convention that each reply message contains, as its first argument, a return code. The return code is an integer, and similar to the use of return codes in the C language, we define that if the integer is less than zero, it signifies an error response. Therefore, a return code of zero or greater indicates a successful processing of the message. The numeric semantics enable the client to quickly sort errors from non-errors. The following shows the results of a query against an invalid address.

> [ /hello/foo/documentation ]
< [ /hello/foo/documentation -1 "Invalid address" ]

It may be worth considering the use of predefined error codes to indicate common errors, for example, -1 for a general error, -2 for invalid address, -4 for incompatible type, -8 for insufficient arguments, etc. For now, we always use -1 as an error code and include a short message to describe the problem.
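
On the client side, the return-code convention reduces to a single check on the first reply argument. A minimal Python sketch follows; the names OSWError and check_reply are hypothetical, and the reply is assumed to have already been decoded into a list of arguments.

```python
class OSWError(Exception):
    """Raised when a reply carries a negative return code."""


def check_reply(address, args):
    """Return (return_code, remaining_args); raise OSWError on a negative code."""
    if not args or not isinstance(args[0], int):
        raise OSWError(f"{address}: reply has no integer return code")
    code, rest = args[0], list(args[1:])
    if code < 0:
        detail = rest[0] if rest else "no description"
        raise OSWError(f"{address}: error {code}: {detail}")
    return code, rest
```

For the invalid query above, check_reply("/hello/foo/documentation", [-1, "Invalid address"]) raises OSWError, while a successful reply yields its return code and the remaining arguments.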

The OSC specification does not define how to handle error conditions.

Documentation

Documentation is important for making the interface clear to the human user, even if that user is only the occasional programmer. We are careful to provide some documentation for every part of the system. We also include a documentation string for the root address (/), which gives the program name and version number:

> [ /documentation ]
< [ /documentation 0 "Open Sound World 1.1beta2" ]

The present OSC specification does not describe the role of root documentation.

Namespace Exploration

Because we expect OSW to have a namespace which changes dynamically, we must implement namespace queries.

> [ / ]
< [ / 25 "advance-clock" "alsa" "casts" ... "thread" "transform-types" ... ]

In this case, we use the return code to indicate the number of sub-addresses found immediately below the query address. The client may use this information to recursively map out the OSW namespace. In the following example, the client finds out how many threads are running on the OSW server in question:

> [ /thread/ ]
< [ /thread/ 4 "set-name" "task0" "task1" "watchdog" ]

This shows that there are three threads running, task0, task1 and a watchdog (the set-name entry is explained later). The client might guess that this means OSW is running on a dual-CPU machine (which is correct for this example).
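
The recursive mapping mentioned above might be sketched as follows in Python. The query argument is a hypothetical helper that sends one namespace query (an address ending in "/") and returns the reply arguments as a list beginning with the return code. How the server answers a namespace query on a leaf address is not specified here, so the sketch simply stops descending on any return code that is not positive.

```python
def map_namespace(query, address=""):
    """Return every address found below `address`, depth first.

    `query(path)` performs one namespace query such as "/thread/" and returns
    the reply arguments, e.g. [4, "set-name", "task0", "task1", "watchdog"].
    """
    reply = query(address + "/")
    code, children = reply[0], reply[1:]
    found = []
    if code > 0:
        for child in children[:code]:
            child_address = address + "/" + child
            found.append(child_address)
            found.extend(map_namespace(query, child_address))
    return found
```

Because the namespace can change while it is being walked, a client would probably want to repeat this periodically rather than treat the result as permanent.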

Type Signatures

Type signature queries enable the client to determine what argument types the server expects as inputs to particular target addresses. We implement this in a fairly obvious way:

> [ /osc-address/type-signature ]
< [ /osc-address/type-signature 0 string:{...an OSC type tag here...} ]

Container / Non-readable Types

In several cases, there are addresses which do not have a type which can be set or queried (see discussion of set-value and current-value queries elsewhere). Essentially, these are non-readable addresses. These cases are also analogous to directories in a file system. For these cases, a type-signature query will return the Nil type (N).

The OSC specification does not describe how to handle non-readable or container addresses with respect to type signatures. Returning Nil is the option we currently use; another reasonable option would be to return an error message.

Extended Type Information

In many cases, an address also corresponds to an object at the C++ level (e.g. Transform, Patch, Container, State, etc). An object may have more than one of these properties due to multiple inheritance or compositing. Here we examine the main clock, which is a State<Time> object.

> [ /main_clock/type-signature ]
< [ /main_clock/type-signature 0 "t" "State" "Time" "Activator" ]

Next we examine the OSC message dispatcher, which is a Transform:

> [ /osc_server/dispatcher/type-signature ]
< [ /osc_server/dispatcher/type-signature 0 "N" "Transform" "N3osw11OSCDispatchE" ]

First, we note that a C++ class name symbol is visible. Second, we note that a Transform is actually a kind of container for States and Activations; therefore a Transform is a non-readable address (see above) of Nil type.
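
A client might unpack such a reply as in the following Python sketch. The function name and the dictionary layout are assumptions; the only facts taken from the text are that the first argument is the return code, the second is the OSC type tag string, any further strings are extended type information, and "N" marks a non-readable address.

```python
def parse_type_signature(args):
    """Split the arguments of a type-signature reply into its parts.

    `args` are the reply arguments after the address, for example
    [0, "t", "State", "Time", "Activator"].
    """
    code, rest = args[0], list(args[1:])
    if code < 0 or not rest:
        raise ValueError(f"type-signature query failed: {rest}")
    osc_tags, extended = rest[0], rest[1:]
    return {
        "osc_tags": osc_tags,         # standard OSC type tag string
        "extended": extended,         # C++-level descriptors, if any
        "readable": osc_tags != "N",  # Nil marks a container / non-readable address
    }
```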

Extra Types

In a few cases, we needed to extend the basic type system, but did not want to use the blob/string method described below. We added the following non-standard types to our system:

A complex number: (C), as two floats packed together.

A vector of samples: (v), in floating point format. The correct vector length (e.g. 64) can be determined by queries against AudioOutput transforms or the AudioDriver objects.

> [ /patch/output0/mix/type-signature ]
< [ /patch/output0/mix/type-signature 0 "v" "State" "Samples" ]

In some cases the type of an address is a mutable type but has no simple mapping into the OSC type framework, and is not given an extended type tag either. A raw data transport framework is implemented whereby the OSC type is "bs" (blob, string) where the blob is expected to contain the raw binary (machine dependent) data, and the string the name of the type. For example, we have lookup tables for Sine, Sinc and Cosine functions:

> [ /sinc/type-signature ]
< [ /sinc/type-signature 0 "bs" "State" "FloatTable" ]

Current Value Queries

This is a rather simple matter of determining the current value of an address.

> [ /patch/output0/sample_rate/current-value ]
< [ /patch/output0/sample_rate/current-value 0 44100.0 ]

In some cases, a target will have a mutable type but no current value. For example, the input to an AudioOutput before audio is turned on. In this case the returned type will be Nil.

Non-readable (container) types (which have type-signature Nil) cannot give any sane response to a current-value query and therefore return an error.

> [ /current-value ]
< [ /current-value -1 "Address does not have a readable type" ]

Set Value Commands

A set-value command is terminology which we have invented to describe normal use of OSC messages to transmit input to a target. This can be thought of as assignment, e.g.:

> [ /patch/sine0/freq 440.0 ]
< [ /patch/sine0/freq 0 ]

or as a function invocation (see also discussion of /advance-clock below), e.g.:

> [ /advance-clock ]
< [ /advance-clock 0 ]

The type-tags of the input must match the expected value (which is defined by the type-signature).

> [ /patch/sine0/freq "440.0" ]
< [ /patch/sine0/freq -1 "Cannot set incompatible data type" ]

Similar to current-value queries, set-value commands cannot be used on non-readable types, typically non-terminal containers:

> [ /patch ]
< [ /patch -1 "Target object is not an OSW state." ]

Interfacing with OSW

This section describes the basics of using the interface for productive tasks, primarily for building patches, manipulating transforms, loading transforms, etc. Using this information it should be possible to build a new GUI or other experimental client type for OSW via the OSC interface.

Builtin Objects and Global Commands

A number of objects are created at runtime in the OSW server. None of these objects can be removed. Renaming is possible but not recommended (see above). There are also a few addresses at the root level which give access to global commands.

/patch

The base patch for user constructed patches. The root of the namespace (/) is not addressable as an object because it is nameless, therefore it is necessary to create a singleton Patch object as a base for loading and running user programs.

/advance-clock

A global command which advances the clock by one frame. All TimeDomainTransforms will be triggered by this action, causing them to process a sample block (even if your AudioInput/Output transforms are off). This is provided mainly for debugging purposes, and secondly it may be used to slave systems together which are cooperating on a network.

/osw-quit and /osw-restart

Quit will force the server to exit. Restart will restart the server (not implemented yet?).

/osc_server

A patch containing the OSCCommunicator and OSCDispatch transforms necessary to engage in OSC communication. This is actually another Patch at the root level; however, the client should not interact with this area.

/load-osx

This global command will load a .osx file on the server. Presently you must specify the complete path to the file.

> [ /load-osx "/usr/local/osw/externals/cnmat/Resonators.osx" ]
< [ /load-osx 0 ]

/transform-types

Returns a list of all builtin and loaded transforms which can be created.

> [ /transform-types ]
< [ /transform-types 118 "!=" "%" ... "Abs" "Acos" ... "FanOut" ... "Gamma" "Gate" ... ]

Audio Drivers

There may be one or more AudioDrivers present, e.g. /alsa, /oss, etc. It is not usually necessary to interact with these.

Threads

It is possible to see how many threads are running on the server by doing a namespace query in /thread. The task threads are the ones which are capable of doing processing work in the OSW system. The watchdog thread is always present and is necessary to terminate the program in the event of a scheduler FIFO lockup.

Miscellaneous

There are a number of root level objects which may be of interest to OSW's core C++ developers. They are /casts, /implicit_time_manager, /scheduler, /sinc, /sine, /cosine, /exp, /overhead, /ui_granularity, /net_granularity, /is-auto-nameable (?).

Transform Methods

Every instance of a transform will have the following methods, in addition to containing zero or more States, Inlets and Outlets.

set-name

Renames the transform (alters the namespace).

get-inlets

Returns a list of inlets for this transform. Each inlet also has an activation order property, which is an integer. Inlets with a lower activation order trigger Activations earlier than those with higher numbers. The reply message consists of a return code equal to twice the number of inlets, followed by a list of (activation order, inlet name) pairs.

> [ /patch/output0/get-inlets ]
< [ /patch/output0/get-inlets 4 0 "/patch/output0/onoff" 1000 "/patch/output0/mix" ]

Here we see that onoff has activation order 0 and mix has activation order 1000.
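
Decoding this reply on the client is a matter of walking the argument list two items at a time. The Python sketch below uses a hypothetical function name and assumes the reply has already been decoded into a list; it returns the inlets sorted by activation order.

```python
def parse_inlets(args):
    """Turn get-inlets reply arguments into [(activation_order, address), ...]."""
    code, rest = args[0], args[1:]
    if code < 0:
        raise ValueError(f"get-inlets failed: {rest}")
    if code % 2 or len(rest) < code:
        raise ValueError("return code does not match the number of arguments")
    pairs = [(rest[i], rest[i + 1]) for i in range(0, code, 2)]
    return sorted(pairs, key=lambda pair: pair[0])
```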

get-outlets

Same as get-inlets. Outlets also have an activation order associated (?).

persistent-states

The OSW server does not save patches on its own; it is up to the client to save and reconstruct the patch for later use. Some Transforms have States which are so-called persistent. These are states that the client should remember and restore when the patch is reloaded. Currently this returns a string naming all of the persistent states. (this should probably be a list instead...)

> [ /osc_server/communicator/persistent-states ]
< [ /osc_server/communicator/persistent-states 0 "basePort" ]

The client is expected to use a current-value query to get the value of each state prior to saving, and to use a set-value command to restore the state when restoring the patch.
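
The save and restore cycle could be sketched as below. Everything here is illustrative: query and send are hypothetical helpers that transmit one OSC message and return the reply arguments as a list starting with the return code, and the splitting of the reply string on whitespace is an assumption, since the text only shows a reply containing a single state name.

```python
def save_states(query, transform):
    """Return {state_address: value} for every persistent state of `transform`."""
    reply = query(transform + "/persistent-states")
    if reply[0] < 0:
        raise RuntimeError(f"persistent-states failed: {reply[1:]}")
    # Assumption: several names would be whitespace-separated within the string.
    names = " ".join(reply[1:]).split()
    saved = {}
    for name in names:
        address = f"{transform}/{name}"
        code, *value = query(address + "/current-value")
        if code >= 0 and value:
            saved[address] = value[0]
    return saved


def restore_states(send, saved):
    """Re-apply saved values with set-value commands; return addresses that failed."""
    return [address for address, value in saved.items() if send(address, value)[0] < 0]
```

The dictionary returned by save_states is what a client would write into its own patch file format alongside the list of transforms and connections.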

Patch Methods

A Patch has methods to deal with creating and destroying Transforms which it contains.

add-transform

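Adds a transform to the current patch. In the example below, the first argument is the transform type and the second is the name of the new instance. The third argument is optional and specifies the arguments to be passed to the transform (always as a single string). This action creates a new address in the namespace immediately below the Patch.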

> [ /patch/add-transform "AudioOutput" "aout" "1 2 3 4" ]
< [ /patch/add-transform 0 ]
> [ /patch/ ]
< [ /patch/ 12 "aout" "add-transform" "delete-transform" ... ]

delete-transform

Deletes a transform from the current patch, and removes its address.

> [ /patch/delete-transform "aout" ]
< [ /patch/delete-transform 0 ]
> [ /patch/ ]
< [ /patch/ 11 "add-transform" ... ]

State Methods

A State is a composite template object, usually a wrapper around a basic type such as a string, number, list, etc. Outlet and Inlet are also types of State.

get-coactivators

A State may be connected to trigger an Activation (a special method of its Transform). In some cases, more than one State must change before an Activation is triggered. This set is the co-activation set. In most cases, there is only one member in the set. The return code is the number of members and is followed by the addresses of each member.

get-effects

When a State (or set of States) triggers an Activation which generates output to an Outlet, we say that State "effects" the Outlet. This returns the set of Outlets which the State may cause to change. Return code is the size of this set.

set-name

Changes the name of the state.

Miscellaneous

This section contains some extra notes regarding the author's interpretation of OSC, OSW, missing features, implementation details and theoretical discussion. These viewpoints are presented as opinions only and are open to debate.

Extended Types

We should consider making the (v) type use a machine-independent floating point format.

We have a type which is a vector of 2-byte integer samples (3), but do not currently use it.

We need an unsigned integer type, or a way to let unsigned integer targets use an integer (perhaps by throwing an error if the input is negative). Currently this uses blob/string but is rather annoying.

The Boolean type is a little bit awkward. There is a type for True (T) and a type for False (F), but no type for Boolean (i.e. True -or- False). This means that boolean typed targets will have mutable type signatures, which is a little bit weird (however not entirely out of place since some OSW types are mutable also via automatic type casting). For example, when we look at the onoff switch of an AudioOutput when it is off:

> [ /patch/output0/onoff/type-signature ]
< [ /patch/output0/onoff/type-signature 0 "F" "State" "Boolean" "Activator" ]

However, when it is on, the type changes to "T".

Large Messages

In some cases the size of a message may exceed what can be transmitted in a single UDP packet (about 64 KB). There may be several ways to work around this restriction. One suggestion is to create a special address #split whose arguments are a fragment number, a fragment count and a blob containing the corresponding chunk of the binary encoded message.

> [ #split 0 3 blob:{ [ /message ... ] } ]
> [ #split 1 3 blob:{ ... } ]
> [ #split 2 3 blob:{ ... } ]
< [ /message 0 ]
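
A minimal Python sketch of this suggestion follows, covering only the fragment bookkeeping: the sender cuts the encoded message into numbered chunks, and the receiver returns the original bytes once every fragment has arrived. The names are hypothetical, and nothing here reflects an existing OSW feature. Because the proposed #split arguments carry no message identifier, the sketch assumes that only one split message is in flight at a time.

```python
def split_message(encoded, max_chunk):
    """Return [(fragment_number, fragment_count, chunk), ...] for one message."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    chunks = [encoded[i:i + max_chunk] for i in range(0, len(encoded), max_chunk)] or [b""]
    return [(number, len(chunks), chunk) for number, chunk in enumerate(chunks)]


class Reassembler:
    """Collect #split fragments, in any order, until the message is complete."""

    def __init__(self):
        self._chunks = {}  # fragment number -> chunk bytes

    def add(self, number, count, chunk):
        """Store one fragment; return the full message once all have arrived."""
        self._chunks[number] = chunk
        if any(n not in self._chunks for n in range(count)):
            return None
        message = b"".join(self._chunks[n] for n in range(count))
        self._chunks.clear()
        return message
```

The reassembled bytes would then be parsed and dispatched exactly like a message received in a single packet, which is why the reply in the example above carries the address of the inner message.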

Conclusion

We are very pleased with Open Sound Control's capability and scalability, which enable it to encapsulate nearly all of our interface requirements. Our implementation is powerful and nearly complete, yet a few items remain to be done before we can claim full support of the OSC protocol. Mainly, we have yet to deal with time tags on input messages; this can be implemented with a priority queue that releases future messages at their scheduled time. Furthermore, there are a few rough edges in our implementation which lead to minor inconsistencies in the reply messages. We are looking forward to new possibilities which may open up as a result of the increased flexibility of a functional network interface.

Thanks to Amar Chaudhary for making the OSW architecture integrate so well with OSC, to Matt Wright, David Wessel, Adrian Freed and the CNMAT community for creating and supporting OSC, and to all the members of the OSC mailing list and greater community.

References

Open Sound World, software by Amar Chaudhary and others. http://osw.sourceforge.net/
Open Sound Control, Matt Wright, CNMAT. http://cnmat.berkeley.edu/OSC
Andrew W. Schmeder, personal website, download for osw-cli. http://www.a2hd.com/

andy at a2hd dot com 2003-05-27