Computer Vision

Computer vision is an invaluable capability for autonomous surface vehicles (ASVs), or for any autonomous vehicle, primarily because of the large amount of unique information contained in images. The advantage of a LIDAR/Video vision system is that, in addition to typical camera information such as color, morphology, and the angular location of objects within the camera's field of view (FOV), depth information is also available. Depth is extremely important for ASVs, as it enables advanced vision algorithms such as simultaneous localization and mapping (SLAM) and object-based color analysis. The Maritime RobotX Challenge is uniquely difficult, requiring the ASV to autonomously complete a number of tasks, including visual obstacle identification, obstacle avoidance, and docking. This level of autonomy is achieved, in part, by using a LIDAR/Video vision system mounted on a rotating gimbal. The LIDAR scanner (Hokuyo UTM-30LX) scans on a plane and is used to obtain distance information, while the video camera (Logitech Pro 9000) is used to obtain color and morphological information about the objects in the vision system's FOV.

LIDAR systems capable of producing a two-dimensional depth map are quite expensive. The alternative used in this system is a lower-cost, planar LIDAR (that is, a LIDAR that collects range information from a single plane in the scene being measured), with the sensing extended to 3D by adding a secondary rotation to the sensor. Because the LIDAR only collects data on a single plane, the primary mechanical component of the vision system is a gimbal that sweeps the scan plane through the scene. As shown, the LIDAR device is mounted on a rotating cradle, which in turn is mounted on a waterproof box housing the forward-looking camera as well as a stepper motor (NEMA-17 with a 14:1 gearbox) that actuates the gimbal through a belt. Additional components include a potentiometer for measuring the gimbal angle and 3D-printed waterproof covers for the potentiometer and timing belt.
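
As a concrete illustration, a single planar scan can be lifted into 3D by rotating the measured points through the current gimbal angle. The Python sketch below assumes a particular set of axis conventions (scan plane in the sensor x-y plane, gimbal rotation about the sensor y-axis); the actual frames and offsets depend on the gimbal geometry and its calibration.

    import numpy as np

    def scan_to_points(ranges, beam_angles, gimbal_angle):
        """Convert one planar LIDAR scan plus the gimbal angle into 3D points.

        ranges       : (N,) array of ranges from one Hokuyo scan line [m]
        beam_angles  : (N,) array of in-plane beam angles [rad]
        gimbal_angle : scalar gimbal (cradle) rotation angle [rad]

        Axis conventions here are an assumption: the scan plane is taken as
        the sensor x-y plane and the gimbal tilts that plane about the y-axis.
        """
        # Points in the (planar) sensor frame
        x = ranges * np.cos(beam_angles)
        y = ranges * np.sin(beam_angles)
        z = np.zeros_like(ranges)
        pts = np.vstack((x, y, z))

        # Rotation about the y-axis by the gimbal angle lifts the planar
        # scan into 3D; the true axis and offsets depend on the gimbal geometry.
        c, s = np.cos(gimbal_angle), np.sin(gimbal_angle)
        R = np.array([[  c, 0.0,   s],
                      [0.0, 1.0, 0.0],
                      [ -s, 0.0,   c]])
        return (R @ pts).T   # (N, 3) points in the gimbal base frame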

The stand-alone vision system is designed to control the gimbal, acquire sensor data, and perform image processing. The primary computational resource for these tasks is a Pandaboard single-board computer running a Linux operating system, programmed using MathWorks' Simulink and Real-Time Workshop packages. Both the Hokuyo LIDAR device and the Logitech camera are connected directly to the Pandaboard. The stepper motor that drives the gimbal is commanded through a Phidgets 1067 control board, and the gimbal angle is measured with a potentiometer read by an Arduino UNO; both the stepper driver and the Arduino are also connected to the Pandaboard.
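
For illustration only, the sketch below shows how a raw potentiometer reading streamed by the Arduino over USB serial might be converted to a gimbal angle. The port name, baud rate, and calibration constants are assumptions; on the vehicle this conversion is handled within the Simulink model running on the Pandaboard.

    import serial  # pyserial

    # Assumed values for illustration; the actual port, baud rate, and
    # calibration depend on the Arduino sketch and how the potentiometer
    # is mounted on the gimbal.
    PORT = '/dev/ttyACM0'
    BAUD = 115200
    COUNTS_PER_DEG = 1023.0 / 300.0   # 10-bit ADC over a 300-degree pot
    ZERO_OFFSET_COUNTS = 512          # assumed reading with the gimbal level

    def read_gimbal_angle(ser):
        """Read one raw ADC count from the Arduino and convert to degrees."""
        raw = int(ser.readline().decode().strip())
        return (raw - ZERO_OFFSET_COUNTS) / COUNTS_PER_DEG

    with serial.Serial(PORT, BAUD, timeout=1.0) as ser:
        angle_deg = read_gimbal_angle(ser)
        print('gimbal angle: %.2f deg' % angle_deg)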

Typically, gimbaled LIDAR systems use raster scan patterns, in which LIDAR data are gathered one scan line at a time, with motion of the gimbal occurring between scans or at a relatively slow rate. This process can be very time intensive, depending on the desired resolution and LIDAR FOV. With this type of scan pattern, objects in the LIDAR image can become distorted as the ASV turns, because the points in the depth image are not all collected at the same instant. To overcome this issue, one of the innovations implemented in the presented gimbaled LIDAR system is the use of Lissajous-like scan patterns, which allow a trade-off between speed and resolution without constraining the FOV (a sketch of such a gimbal trajectory is given below).

The LIDAR and video sensors produce outputs that must be fused. The LIDAR provides range measurements, together with the corresponding gimbal and LIDAR beam angles, from which a depth image is produced. As an example, consider the problem of buoy identification. The LIDAR/Video fusion algorithm uses the depth image to identify objects of interest (not just buoys, but anything in the FOV of the LIDAR). It is advantageous to carry out the object identification in the depth image because floating objects are automatically isolated, both from the water (the LIDAR does not return the distance to the water surface) and from the background (LIDAR range is limited). To fuse the LIDAR and video images, the locations of the points of interest (POI) must be transformed into RGB camera pixel coordinates. This is done by converting the LIDAR points to Cartesian XYZ coordinates, translating the origin to that of the video camera, and then calculating the pixel values using an intrinsic model of the video camera. The result is typically a sparse mapping of depth points onto the pixel frame, because the camera has significantly higher resolution than the LIDAR. To better correlate an object detected in the depth image with its corresponding object in the RGB image, the depth information is thresholded over the desired range and the remaining points are joined into a continuous area using morphological operations.
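
A minimal sketch of a Lissajous-like gimbal trajectory follows. The scan period, gimbal amplitude, and gimbal frequency shown are assumed values chosen for illustration, not the trajectory used on the vehicle; the key idea is that the gimbal is driven continuously at a rate that is not an integer multiple of the LIDAR scan rate, so successive scan lines interleave and the pattern fills in over time.

    import numpy as np

    # Nominal/assumed values for illustration: the UTM-30LX completes one
    # 270-degree scan line every 25 ms, and the gimbal is driven with a
    # sinusoidal command whose frequency is not an integer multiple of the
    # scan rate, producing an interleaved, Lissajous-like coverage pattern.
    SCAN_PERIOD = 0.025                     # s per LIDAR scan line (nominal)
    GIMBAL_AMPLITUDE = np.radians(30.0)     # assumed gimbal swing
    GIMBAL_FREQ = 1.3                       # Hz, assumed

    def gimbal_command(t):
        """Sinusoidal gimbal angle command at time t [s]."""
        return GIMBAL_AMPLITUDE * np.sin(2.0 * np.pi * GIMBAL_FREQ * t)

    # Direction of every beam over a few seconds of scanning: azimuth from
    # the LIDAR's in-plane beam angle, elevation from the gimbal angle at
    # the instant that beam is measured.
    beam_angles = np.radians(np.linspace(-135.0, 135.0, 1081))
    pattern = []
    for k in range(int(4.0 / SCAN_PERIOD)):             # 4 s of scan lines
        t = k * SCAN_PERIOD + np.linspace(0.0, SCAN_PERIOD, beam_angles.size)
        pattern.append(np.column_stack((beam_angles, gimbal_command(t))))
    pattern = np.vstack(pattern)   # (azimuth, elevation) samples of the pattern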
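
The transformation from LIDAR points to camera pixel coordinates can be sketched as follows. The camera-to-LIDAR translation and the intrinsic parameters shown are placeholders; on the real system they come from calibration of the Logitech camera and the gimbal geometry.

    import numpy as np

    # Assumed extrinsic/intrinsic values for illustration only.
    T_CAM_FROM_LIDAR = np.array([0.0, -0.10, -0.05])  # translation [m], assumed
    FX, FY = 800.0, 800.0                             # focal lengths [px], assumed
    CX, CY = 320.0, 240.0                             # principal point [px], assumed

    def project_to_pixels(points_xyz):
        """Map 3D LIDAR points to camera pixel coordinates.

        points_xyz : (N, 3) Cartesian points in camera-style axes
                     (X right, Y down, Z forward), an assumed convention.
        Returns (N, 2) pixel coordinates and a mask of points in front of
        the camera; pixels for points behind the camera should be discarded.
        """
        # Translate the origin from the LIDAR to the video camera
        p = points_xyz + T_CAM_FROM_LIDAR
        in_front = p[:, 2] > 0.0
        # Pinhole intrinsic model: u = fx * X/Z + cx, v = fy * Y/Z + cy
        u = FX * p[:, 0] / p[:, 2] + CX
        v = FY * p[:, 1] / p[:, 2] + CY
        return np.column_stack((u, v)), in_front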
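
Finally, the range thresholding and morphological step can be sketched as below, here using a morphological closing from OpenCV; the structuring-element size is an assumed tuning parameter. Closing (dilation followed by erosion) joins the sparse depth hits into a continuous region without growing it far beyond its original extent.

    import numpy as np
    import cv2

    def object_mask_from_depth(depth_px, d_min, d_max, kernel_size=15):
        """Build a dense object mask from sparse depth points in pixel space.

        depth_px    : 2D array the size of the RGB image, holding the range [m]
                      at pixels that received a LIDAR return and 0 elsewhere
                      (the sparse mapping described above).
        d_min, d_max: range window of interest [m].
        kernel_size : structuring-element size [px]; an assumed tuning value.
        """
        # Keep only returns inside the range window of interest
        mask = ((depth_px > d_min) & (depth_px < d_max)).astype(np.uint8) * 255
        # Morphological closing joins the sparse hits into a continuous area
        # that can be matched against the corresponding object in the RGB image.
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                           (kernel_size, kernel_size))
        return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)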