Projects/Building Manager/capabilities

Target developer: professional software developer, minimal background in robots
Target platform: Turtlebot, which is an iRobot Create with a dual-core CPU and a Kinect

Design Overview

Bootstrap layer

A robot is assumed to always be running a minimal bootstrap layer. This bootstrap layer:

provides status information on the robot: battery level, connectivity, location (if available)
enumerates applications installed
enables launching/killing of applications
enables installation of applications
enables applications to access capabilities (defined below)
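
The responsibilities listed above could be sketched roughly as follows. This is only an illustration: the class names (Bootstrap, RobotStatus), the method names, and the field layout are all assumptions made for this example and are not part of any existing API. Capability access is left out of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class RobotStatus:
    """Hypothetical status record reported by the bootstrap layer."""
    battery_level: float                     # fraction in [0.0, 1.0]
    connected: bool
    location: Optional[Tuple[float, float]]  # None when no location is available


@dataclass
class Bootstrap:
    """Hypothetical bootstrap layer; every name here is illustrative."""
    status: RobotStatus
    installed: Dict[str, str] = field(default_factory=dict)  # app name -> version
    running: Set[str] = field(default_factory=set)

    def get_status(self) -> RobotStatus:
        return self.status

    def list_apps(self) -> List[str]:
        return sorted(self.installed)

    def install(self, name: str, version: str) -> None:
        self.installed[name] = version

    def launch(self, name: str) -> None:
        if name not in self.installed:
            raise KeyError(f"app not installed: {name}")
        self.running.add(name)

    def kill(self, name: str) -> None:
        # Killing an app that is not running is treated as a no-op.
        self.running.discard(name)
```

A remote tool would only ever talk to an object like this one; anything the robot runs in support of capabilities stays hidden behind it.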

The robot may also be running processes related to capabilities, though this is considered opaque to external entities.

Capabilities layer

Capabilities are also known as 'services' in OS terms. They are common APIs that a robot platform exposes, such as localization, navigation, sensor drivers, and more. These capabilities are expected to be well-defined, limited in number, and conform to standards or best practices within the community.

For example, on a smartphone, there are capabilities for accessing cameras, user input, user location, dialing the phone, and more. A smartphone has limited battery and computation resources, so these capabilities are not always on. Instead, an application must request these resources on demand. A capability may also have multiple instances, e.g. a front and a back camera.

The target user is not expected to have much control over the implementation of the capabilities layer. Instead, the capabilities layer is provided by the platform provider, i.e. the manufacturer of the robot. It is up to the manufacturer to ensure the capabilities work well. The user is not expected to know or care about the underlying implementation, e.g. whether the location is provided by wifi or GPS. More advanced users can 'jailbreak' a phone to access this layer, but the general user does not.

These motivations carry over into our robot platform. Turtlebot is also battery and CPU constrained. It is also expected to support several well-defined capabilities, such as robot localization, sensor access, and base control. Like a smartphone, these capabilities need to be dynamically started and stopped to limit CPU usage.
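
One way to start and stop capabilities on demand is reference counting: a capability starts when the first application requests it and stops when the last one releases it. The sketch below assumes this scheme purely for illustration. Capability, CapabilityRegistry, and the capability name used in the usage note are hypothetical, and the start/stop hooks are placeholders for whatever the platform provider actually runs.

```python
class Capability:
    """A hypothetical capability that runs only while some application holds it."""

    def __init__(self, name):
        self.name = name
        self.users = 0        # number of outstanding requests
        self.running = False

    def _start(self):
        # Placeholder: a real platform would start drivers or processes here.
        self.running = True

    def _stop(self):
        # Placeholder: a real platform would shut those processes down here.
        self.running = False

    def acquire(self):
        if self.users == 0:
            self._start()
        self.users += 1

    def release(self):
        if self.users == 0:
            raise RuntimeError(f"{self.name} released more often than acquired")
        self.users -= 1
        if self.users == 0:
            self._stop()


class CapabilityRegistry:
    """Hypothetical lookup table from capability name to capability."""

    def __init__(self):
        self._caps = {}

    def register(self, name):
        self._caps[name] = Capability(name)

    def request(self, name):
        cap = self._caps[name]
        cap.acquire()
        return cap
```

For example, after registry.register("localization"), an application would call registry.request("localization") when it needs a pose estimate and release() on the returned object when it is done, so the capability consumes CPU only while it is in use.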

Application layer

An application is software targeted at an end user, e.g. the owner of the robot. The set of potential applications is large and the installed software can run on the robot, on the Building Manager, and possibly also in the "cloud".

Applications can be downloaded by the user or by a supervisory application. For example, a user can go to an "app store" and select a program to install. As another example, a "Delivery" app that is installed on a central task manager can deploy a client app to a robot.

We hope for applications to be relatively small. If the capabilities layer is strong, then applications will mostly be orchestration and user interface. Smaller applications can be deployed more quickly and easily, and they are also an indicator that the platform has sufficient capabilities.

In order to ensure proper operation of an application, we assume a single-tasking model. It is not possible to test an application in all possible multi-tasking scenarios, and the consequences of bad scheduling are much worse on a physical robotic platform.
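
A minimal sketch of what the single-tasking rule might look like at launch time is shown below. SingleTaskLauncher and AppAlreadyRunning are hypothetical names, and refusing the second launch (rather than, say, stopping the current application first) is an assumption made for this example.

```python
class AppAlreadyRunning(Exception):
    """Raised when a launch is requested while another application is active."""


class SingleTaskLauncher:
    """Hypothetical launcher that allows at most one running application."""

    def __init__(self):
        self.current = None   # name of the running application, if any

    def launch(self, app_name):
        if self.current is not None:
            raise AppAlreadyRunning(
                f"cannot launch {app_name}: {self.current} is still running"
            )
        self.current = app_name

    def kill(self):
        # Returns the name of the application that was stopped, or None.
        stopped = self.current
        self.current = None
        return stopped
```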

In the longer term, applications will be installed in a separate space from the rest of the robot software and will be subject to stricter permissions. We also wish to restrict an application's access to robot resources, whether they be ROS topics or the filesystem. These restrictions will ensure that applications properly declare the resources they need and that the consumer has control over what is enabled.
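
The declare-then-approve idea could look something like the sketch below. No manifest format has been defined, so AppManifest, ResourceGuard, and the topic names in the usage note are assumptions made only for illustration. The sketch covers topics; filesystem paths would presumably follow the same pattern.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class AppManifest:
    """Hypothetical declaration of the resources an application needs."""
    name: str
    topics: Set[str] = field(default_factory=set)  # declared ROS topics
    paths: Set[str] = field(default_factory=set)   # declared filesystem paths


@dataclass
class ResourceGuard:
    """Hypothetical gatekeeper: access requires declaration AND user approval."""
    manifest: AppManifest
    approved_topics: Set[str] = field(default_factory=set)

    def approve(self, topic: str) -> None:
        # The consumer can only enable what the application declared up front.
        if topic not in self.manifest.topics:
            raise ValueError(f"{self.manifest.name} did not declare {topic}")
        self.approved_topics.add(topic)

    def may_use_topic(self, topic: str) -> bool:
        return topic in self.approved_topics
```

With a manifest that declares only "/cmd_vel", the guard denies "/cmd_vel" until the user approves it, and it rejects any attempt to approve an undeclared topic such as "/scan".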

Plugins

We are considering the potential to have well-defined plugins that applications can define. In the smartphone analogy, an application can install a hook into the camera application to send captured photos to Twitter or e-mail.

The need and design in the robotic scenario are more complex. Imagine that you wish to design a user interface that allows you to define 'tasks' to execute at different points on a map. The application would command the robot to these locations on the map, then hand over control of the robot to the 'task' to execute. When the task completes or is pre-empted, the main application resumes control.

In the pure application formalism, this cannot be done in an extensible way. This is like iOS, where the camera application cannot share photos with other user-installed applications, and, similarly, user-installed applications must define their own photo capture UIs.

If we adopt the iOS model, then each application must design and develop its own set of tasks to execute. This model is simpler, but it results in duplication of user interfaces and underlying libraries. It is possible to draw comparisons to the single-tasking vs. multi-tasking issue. Allowing these plugin capabilities could become a backdoor for enabling multi-tasking, and can run into the same issues of testing and performance present there.

To again draw comparison to smartphones, multi-tasking, when present, is usually not "true" multitasking. For example, on Android, the OS may choose to kill any task at any time. Also, the APIs that a background task may call are more limited.

A plugin 'task' API could be similarly limited in order to meet our needs. For example, in the "Continuous Ops" infrastructure, a task must be pre-emptable, and the robot's arms must return to a set position when the task is completed/pre-empted. In the navigation task example given above, we could dictate that the task must be safe to pre-empt at any time, must complete within a set time limit, and so on.
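
The constraints above (safe to pre-empt at any time, bounded run time, return to a known state on exit) could be enforced by the host application along the following lines. This is a sketch under stated assumptions: PluginTask, CountingTask, and run_task are hypothetical names, and the task is assumed to do its work in small cooperative steps so that the host can check for pre-emption between them.

```python
import time


class PluginTask:
    """Hypothetical base class for a plugin task."""

    def step(self):
        """Do one small unit of work; return True when the task is finished."""
        raise NotImplementedError

    def cleanup(self):
        """Return the robot to a safe state, e.g. arms to a set position."""


class CountingTask(PluginTask):
    """Toy task that finishes after a fixed number of steps."""

    def __init__(self, steps):
        self.remaining = steps
        self.done = 0
        self.cleaned_up = False

    def step(self):
        self.remaining -= 1
        self.done += 1
        return self.remaining <= 0

    def cleanup(self):
        self.cleaned_up = True


def run_task(task, time_limit_s, preempt_requested, clock=time.monotonic):
    """Run a task until it completes, is pre-empted, or exceeds its time limit.

    preempt_requested is a zero-argument callable polled between steps.
    cleanup() always runs, whichever way the task ends.
    """
    deadline = clock() + time_limit_s
    try:
        while True:
            if preempt_requested():
                return "preempted"
            if clock() >= deadline:
                return "timed_out"
            if task.step():
                return "completed"
    finally:
        task.cleanup()
```

In the navigation example, the main application would drive the robot to a map location, call run_task with the task defined for that location, and resume control once it returns.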

iOS has taken a similar approach for its multitasking APIs. Apple surveyed common multi-tasking needs for phone applications. Based on this survey, they came up with an ontology of tasks and created a limited set of APIs for exposing multitasking in these cases.