Benchmarking

Over the last 20 years, robot competitions have emerged as a powerful means to foster progress in robotics Research and Development (R&D). By combining the scientific rigor of experiments with the real-world relevance and spectacle of competitive events, they offer a highly complementary approach to traditional lab-based R&D. Central to furthering the scientific aspects of competitions is benchmarking, one of the cornerstones of the RoCKIn project.

Benchmarking means measuring performance using a methodology that is repeatable, reproducible, accurate and as objective as possible. In this way, different approaches to solving a particular problem can be compared according to objective criteria, both within a competition or project and outside of it. Benchmarking in RoCKIn has been developed to recognize the quality of individual sub-system components, through the Functionality Benchmarks, and to assess the performance of complete systems in the overall task, through the Task Benchmarks.

The datasets collected through the benchmarking activities can be downloaded here.

Task Benchmarks

For the Task Benchmarks, teams are evaluated on task-specific criteria that look at the success and quality of task execution. These criteria include the time needed to complete the task, quality of perception (e.g. the extent to which objects are correctly identified), quality of navigation (e.g. does the robot bump into obstacles when navigating the arena) and quality of manipulation (e.g. is the grip pressure calibrated so that the item is held firmly but isn't broken). How well teams perform on each of these criteria is determined objectively by comparing data captured directly from the robots - such as the commands registered, where the robot thinks it is in the arena throughout the task, the location of obstacles detected etc. - against what actually occurred (e.g. actual commands given) and/or ground truth data such as the actual robot position, precisely captured by a specially designed system.

Functionality Benchmarks

For the Functionality Benchmarks, teams are evaluated on functionality-specific criteria closely linked to individual capabilities of the robot. These criteria include the number and percentage of correctly identified objects, pose error for all correctly identified objects (i.e. errors in the estimated position or rotation of the object) and the word error rate (i.e. the percentage of words understood incorrectly from commands given to the robot) in speech understanding. The team's performance is again determined by data captured directly from the robots - such as detection, recognition and localization data associated with the objects - which is then compared against what actually occurred (e.g. actual commands given) and/or ground truth data.
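As an illustration of one of the criteria above, the following is a minimal sketch of a word error rate calculation. It assumes the common edit-distance definition (substitutions, deletions and insertions divided by the number of words in the reference command), which may differ in detail from the scoring used in the RoCKIn rulebooks. The function name `word_error_rate` and the example commands are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: case-insensitive, whitespace-tokenized comparison.
    This is an illustrative definition, not the official RoCKIn scoring.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the command actually given vs. what the robot understood.
given = "bring me the cup"
understood = "bring me a cup"
print(f"WER: {word_error_rate(given, understood):.2f}")
```

Because insertions count as errors, this definition can exceed 1.0 when the robot's transcription is much longer than the command given.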

About the RoCKIn Ground Truth System


The system is based on a commercial Motion Capture (MoCap) system composed of a set of specialized infrared cameras plus a proprietary software package. Such a system is capable of accurately tracking the movements of special markers within a volume of space. As this type of system is usually installed in controlled environments (controlled lighting, no physical obstructions etc.), its application to robot competitions requires special attention to be given to system design and installation. During the benchmarks, robots are fitted with RoCKIn “marker sets” which are tracked by the MoCap system. The tracking data is processed by RoCKIn to produce benchmark-related information. This setup provides very accurate ground truth pose data (i.e. the robot's true position and orientation), which allows teams to check their robot's own pose estimate against the ground truth, get detailed feedback on their performance and isolate areas that need to be improved.
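The comparison between a robot's own pose estimate and the ground truth can be sketched as follows. This is a simplified illustration, not the RoCKIn processing pipeline: it assumes planar poses (x, y in metres, heading in radians) and that the estimated and ground truth samples have already been matched by timestamp. The `Pose2D` type and the function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    x: float      # metres
    y: float      # metres
    theta: float  # heading in radians


def position_error(estimate: Pose2D, truth: Pose2D) -> float:
    """Euclidean distance between estimated and true position."""
    return math.hypot(estimate.x - truth.x, estimate.y - truth.y)


def heading_error(estimate: Pose2D, truth: Pose2D) -> float:
    """Absolute heading difference, wrapped into [0, pi]."""
    diff = estimate.theta - truth.theta
    return abs(math.atan2(math.sin(diff), math.cos(diff)))


def mean_errors(estimates: list[Pose2D], truths: list[Pose2D]) -> tuple[float, float]:
    """Mean position and heading error over time-aligned pose pairs."""
    if not estimates or len(estimates) != len(truths):
        raise ValueError("need equally long, non-empty pose sequences")
    n = len(estimates)
    mean_pos = sum(position_error(e, t) for e, t in zip(estimates, truths)) / n
    mean_head = sum(heading_error(e, t) for e, t in zip(estimates, truths)) / n
    return mean_pos, mean_head


# Hypothetical samples: robot self-localization vs. MoCap ground truth.
robot = [Pose2D(0.0, 0.0, 0.0), Pose2D(1.1, 0.0, 0.1)]
mocap = [Pose2D(0.0, 0.0, 0.0), Pose2D(1.0, 0.0, 0.0)]
pos_err, head_err = mean_errors(robot, mocap)
print(f"mean position error: {pos_err:.3f} m, mean heading error: {head_err:.3f} rad")
```

Wrapping the heading difference with `atan2` avoids reporting a near-full-turn error when the two headings lie on either side of the ±π boundary.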

Further information on the RoCKIn Ground Truth System can be found on the wiki and in this deliverable.

Resources


Extended summaries of the specific performance criteria and the benchmarking data captured for each task and functionality can be found here:
The full details can be found in the rulebooks:

Datasets

The datasets collected during RoCKIn competitions and events have been made available to the robotics community here. This allows for further analysis and understanding of the task-level and functionality-level performance of robotic systems.