tabletop_object_perception: active_realtime_segmentation | fast_plane_detection | object_recognition_gui | object_segmentation_gui | tabletop_collision_map_processing | tabletop_object_detector

Package Summary

Performs object segmentation and simple object recognition for constrained scenes.

Object Perception

There are two main components of object perception:

  • object segmentation: which part of the sensed data corresponds to the object we want to pick up?
  • object recognition: which object is it?

Our manipulation pipeline always requires segmentation. If recognition is performed and is successful, this further informs the grasp point selection mechanism. If not, we can still grasp the unknown object based only on perceived data.

Note that it is possible to avoid segmentation as well and select grasp points without knowing object boundaries. The current pipeline does not have this functionality, but it might gain it in the future.

To perform these tasks we rely the following assumptions:

  • the objects are resting on a table, which is the dominant plane in the scene
  • the minimum distance between two objects exceeds a given threshold (3cm in our demo)
  • in addition, object recognition can only handle 2 degrees of freedom: translation along the X and Y axes (with the Z axis assumed to be pointing "up" relative to the table). Therefore, in order to recognize an object:
    • it must be rotationally symmetric
    • it must have a known orientation, such as a glass or a bowl sitting "upright" on the table.

The sensor data that we use consists of a point cloud from the narrow stereo cameras. We perform the following steps:

  • the table is detected by finding the dominant plane in the point cloud using RANSAC;
  • points above the table are considered to belong to graspable objects. We use a clustering algorithm to identify individual objects. We refer to the points corresponding to an individual object as clusters.

  • for each cluster, a simple iterative fitting technique (a distant cousin of ICP) is used to see how well it corresponds to each mesh in our database of models. If a good fit is found, the database id of the model is returned along with the cluster.
    • note that our current fitting method operates in 2D, since based on our assumptions (object resting upright on the known surface of the table) the other 4 dimensions are fixed

The output from this components consists of the location of the table, the identified point clusters, and the corresponding database object id and fitting pose for those clusters found to be similar to database objects.

detect_1.png

Narrow Stereo image of a table and three objects

detect_2.png

Detection result: table plane has been detected (note the yellow contour). Objects have been segmented (note the different color point clouds superimposed on them). The bottle and the glass have also been recognized (as shown by the cyan mesh further superimposed on them)

ROS API

The tabletop object detector provides its results in the form of a TabletopDetectionResultMessage. It responds to requests using the TabletopDetection service.

Note the following:

  • database connection:
    • the detector needs a connection to the database of known models to perform recognition. This database is documented in household_objects_database

    • if no recognition is desired, then the database can be absent. The detector must be informed (through a parameter below) that no database is to be used.
  • returned information:
    • the detector can return any combination of
      • the result of the segmentation process - "raw" clusters
      • the result of the recognition process - recognized objects id
    • this information is returned based on the recognition request:
      • if recognition results are not requested, the recognition step is bypassed altogether
      • if raw cluster results are not requested, segmentation is still performed as it is needed for recognition, but the raw clusters are not copied in to the result message
      • if neither clusters nor recognized objects are requested, the Table is still computed and returned

The most relevant parameters (default value):

  • use_database (false): whether the database should be used. If no database is used, the detector will perform table detection and object segmentation, but no recognition.

    • if the database is in use, the detector will look for the following connection parameters on the parameter server:
      • /model_database/database_host

      • /model_database/database_port

      • /model_database/database_user

      • /model_database/database_pass

      • /model_database/database_name

  • quality_threshold (0.005): a unit-less threshold that is compared to the results of the model recognition to decide if a model recognition is good enough to be used. Lower numbers indicate a better fit. Reduce this if you want the recognition to be more strict.

  • clustering_distance (0.03): the min distance between objects, provided as input to the segmentation algorithm. Objects closer to each other than this threshold will be merged into the same cluster.

  • processing_frame (empty): if non-empty, any received point clouds are first transformed in this frame, before any processing. Useful for operating in a frame where intuitive filters can be applied (e.g. tables can usually be found between z=0.5m and z=1.5m if operating in a frame with the origin on the floor and z pointing up, such as the base_link frame of the PR2)

  • filtering parameters: all of these parameters are applied to decide what part of the point cloud the detector should focus on. They must make sense in the processing_frame. If processing_frame is unspecified, all processing takes place in the native frame of the incoming cloud, which on the PR2 robot is the camera frame. Therefore, all these parameters have default values which makes sense in the camera frame.

    • z_filter_min (0.4) and z_filter_max (1.25): incoming point cloud first filtered and only points between these values are kept

    • table_z_filter_min (0.01) and table_z_filter_max (0.5): after the table is detected, only points with height above the table between these two values are kept

    • up_direction (-1.0): whether the natural "up" direction (relative to gravity) is closer to positive or negative z

Running the Manipulation Pipeline

To launch the manipulation pipeline, complete with the sensor processing provided here, and execute pickup and place tasks using the PR2 robot, tutorials and launch files are provided in pr2_tabletop_manipulation_apps.

Wiki: tabletop_object_detector (last edited 2011-01-21 03:18:04 by TimField)