Unsupervised and Self-mapping Category Formation and Semantic Object Recognition for Mobile Robot Vision Used in an Actual Environment

This paper presents an unsupervised learning-based object category formation and recognition method for mobile robot vision. Our method has the following features: detection of feature points and description of features using the scale-invariant feature transform (SIFT), selection of target feature points using one-class support vector machines (OC-SVMs), generation of visual words using self-organizing maps (SOMs), formation of labels using adaptive resonance theory 2 (ART-2), and creation and classification of categories on a category map of counter propagation networks (CPNs) for visualizing spatial relations between categories. Classification results for time-series images obtained using two robots of different sizes, each following its own movements, demonstrate that our method can visualize spatial relations of categories while maintaining time-series characteristics. Moreover, we emphasize the effectiveness of our method for category formation under appearance changes of objects.


Introduction
Because of advances in computer technologies and machine learning algorithms, generic object recognition has been studied actively in the field of computer vision (Yanai, 2007). Generic object recognition is defined as the capability of a computer to recognize objects or scenes in real images by their general names with no restrictions, i.e., recognition of category names from objects or scenes in images. In robotics, one method of realizing a robot that has learning functions to adapt flexibly to various environments is to create a brain-like memory: so-called world image maps (Nakano, 1995). To create world image maps, robots must classify objects and scenes in time-series images into categories and memorize them as long-term memory (LTM). Additionally, in actual environments for a robot, the number of categories is mostly unknown. Moreover, the categories are not known uniformly. Therefore, a robot must classify while generating additional categories.
This paper presents unsupervised feature selection and category formation for application to robot vision. Our method has the following four capabilities. First, our method can localize target feature points using one-class support vector machines (OC-SVMs) (Scholkopf et al., 2001) without prior setting of boundary information. Second, our method can generate labels as candidates of categories for input images while maintaining stability and plasticity together. Third, automatic labeling of category maps can be realized using labels created by adaptive resonance theory 2 (ART-2) as teaching signals for counter propagation networks (CPNs). Fourth, our method can present the diversity of appearance changes by visualizing spatial relations of each category on a two-dimensional map of CPNs. Through object classification experiments, we evaluate our method using time-series images taken by a camera on a mobile robot.

Related studies
The problem of simultaneous localization and mapping (SLAM) has attracted immense attention in mobile robotics studies (Dissanayake et al., 2000). The objective of SLAM is to place a robot at an unknown location in an environment and to have it build a map incrementally while simultaneously using this map to compute its location. Cummins et al. (2008) proposed fast appearance-based mapping (FAB-MAP) as a probabilistic approach to recognizing places based on their appearance. The objective of FAB-MAP is similar to that of SLAM: to build a map of routes using appearance changes of scene images obtained using a camera on a mobile robot. In contrast, our objective is to classify images obtained using a camera on a mobile robot into categories for recognizing objects.
Published by Copernicus Publications.
Learning-based object classification methods are roughly divisible into supervised and unsupervised methods. Supervised object classification methods require training datasets including teaching signals extracted from ground-truth labels. In contrast, unsupervised object classification methods require no teaching signals: categories are extracted automatically, so that images can be classified into respective categories even when the classification categories are unknown. Recently, studies of unsupervised object classification methods have been ongoing (Sivic et al., 2005). The subject has attracted attention because it might provide technologies to classify visual information flexibly in various environments.
In recent studies of object classification, various methods have been proposed that incorporate the process of detecting the region or position of an object as a target of classification and recognition. Barnard et al. (2003) proposed a word-image translation model as a region-based method. They automatically annotated segmented images using images to which several keywords had been attached in advance. Lampert et al. (2008) proposed an efficient subwindow search (ESS) that can quickly detect the position of an object using branch-and-bound methods and integral images. Using ESS, they realized partial generic object detection by calculating the output values of support vector machines (SVMs) at each feature point in advance and narrowing the search range gradually. Moreover, Suzuki et al. (2009) proposed a local feature selection method used in bag-of-features (BoF) with SVMs. This method classifies local features into background features and target features used for BoF. However, these methods require previously acquired training samples with teaching signals. Therefore, these methods are inapplicable to an actual environment for which a target region and a background region cannot be decided uniformly.
As unsupervised object classification methods, Sivic et al. (2005) proposed a method using pLSA and LDA, which are generative models from the statistical text literature. They modeled an image containing instances of several categories as a mixture of topics and attempted to discover topics as object categories from numerous images. Zhu et al. (2009) introduced probabilistic grammar Markov models (PGMMs), which are generative models that combine probabilistic context-free grammars (PCFGs) and Markov random fields (MRFs). They used this method to create an object category model for object detection and unsupervised object classification. Moreover, they proposed probabilistic object models (POMs) that improved their method and enabled classification, segmentation, and recognition of objects (Chen et al., 2009). Todorovic et al. (2008) proposed an unsupervised identification method using optical, geometric, and topological characteristics of multiscale regions consisting of two-dimensional objects. They represented each image as a tree structure by division of multiscale images. Moreover, Nakamura et al. (2008) proposed an unsupervised object classification method using multimodal information of vision, hearing, and touch. They achieved object classification resembling that of human senses using embodied interactions of a robot. However, these methods require the number of classification categories to be set in advance. Therefore, these methods have only limited applicability to classification problems in an actual environment for which the number of categories is unknown.

Categories in an actual environment
Numerous categories exist in an actual environment. Humans can recognize several tens of thousands of categories (Biederman, 1986). We consider that it is possible for a robot to classify categories in an actual environment provided that they are specified clearly. In this paper, we used a questionnaire investigation to find the number of categories to be classified in an actual environment. The target environment is our research room at the Neuroinformatics Laboratory, Akita Prefectural University. Figure 1 depicts photographs taken in the room. The floor space is about 90 square meters. Ten university students participated as subjects. They walked around the room for a few minutes for observation. Subsequently, they wrote the categories that they had found and recognized on the questionnaire sheet. The questionnaire sheets consisted of two classification types: rough classification and fine classification. Table 1 presents the number of categories extracted in this investigation. In the rough classification, 11 categories were extracted, with a minimum of 4 and a maximum of 22 categories. In the fine classification, 28 categories were extracted, with a minimum of 14 and a maximum of 44 categories. Table 2 contains the categories extracted by more than two subjects. In the rough classification, chairs, desks, computers, etc., which are numerous in the research room, are extracted. Moreover, large objects such as a whiteboard and a refrigerator, of which there is only one instance in the room, are extracted. In the fine classification presented in Table 3, small items such as cups and umbrellas are extracted in addition to the categories that also appear in the rough classification. Objects such as PaPeRo (a communication robot produced by NEC), Mindstorms (a self-assembled robot by LEGO), and NetTansors (a web-camera embedded robot by Bandai) are extracted as individual categories, although they can be grouped into one category as robots.

Whole architecture of our method
In generic object recognition, it is a challenging task to develop a unified model to address all steps from feature representation to creation of classifiers. The aim of our study is the realization of category formation for generic object recognition by applying theories with different characteristics to each step. Figure 2 depicts the network architecture of our method. The procedures are the following.
1. Extracting feature points and calculating descriptors using SIFT (scale-invariant feature transform)
2. Selecting SIFT features using OC-SVMs
3. Creating visual words of all SIFT descriptors and calculating histograms of selected SIFT descriptors matched with visual words using SOM
4. Generating labels using ART-2
5. Creating a category map using CPNs
Procedures (1) through (3), which correspond to preprocessing, are based on the representation of BoF. We apply OC-SVMs to select SIFT feature points for localizing target regions in an image. For producing visual words, we use SOMs, which can learn neighborhood regions while updating the cluster structure, whereas k-means must choose data for the center of each cluster. Actually, SOMs can represent visual words that minimize misclassification (Terashima et al., 1996). Furthermore, the combination of ART-2 and CPNs enables unsupervised category formation that labels a large quantity of images in each category automatically. Table 4 shows parameters of OC-SVMs, ART-2, and CPNs based on our former studies (Tsukada et al., 2010, 2011; Madokoro et al., 2011b). Herein, we compared our method (Tsukada et al., 2010) with the method proposed by Chen et al. (2009) using the Caltech-256 object category dataset (Griffin et al., 2007). We obtained a result that the performance of our method was superior to that of their method, although the target dataset was aimed at generic object recognition.

Image representation
BoF (Csurka et al., 2004), which represents an image as a histogram of visual words, i.e., typical patterns of local features extracted from numerous images, is widely used as an effective image representation method for generic object recognition. In the BoF of our method, we applied OC-SVMs for selecting SIFT (Lowe, 2004) feature points as target regions in an image. Furthermore, we applied self-organizing maps (SOMs) for creating visual words and histograms in each image from selected features.
Our target is SIFT feature points on an object for recognition. Therefore, target regions and target feature points respectively mean object regions and feature points on an object. The OC-SVMs are unsupervised-learning-based binary classifiers that enable density estimation without estimating a density function. Therefore, OC-SVMs can be applied to real-world images without boundary information.
For our method, we apply SOMs, not k-means, which is generally used in BoF, for creating visual words. In the learning step, SOMs update weights while maintaining topological structures of input data. Actually, SOMs create neighborhood regions around the burst unit, which is the unit that responds to the input data. Therefore, SOMs can classify various data whose distribution resembles the training data. In addition, Terashima et al. reported that SOMs are superior to k-means as an unsupervised classification method that is useful to minimize misrecognition (Terashima et al., 1996).
The learning algorithm of SOMs (Kohonen, 1982) is the same as the algorithm used between the input-Kohonen layers of CPNs. In this method, we used all SIFT features for creating visual words at the learning step of SOMs. We used SIFT features selected by OC-SVMs for generating histograms based on visual words. Based on our preliminary experiment, we set the number of learning iterations to 100 000. Additionally, we set the number of units of the Kohonen layer to 100. We created visual words by extracting the weights between Kohonen layer units and input layer units.
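The visual-word step described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the one-dimensional arrangement of Kohonen units, the linear decay schedules, and the function names `train_som`, `nearest_unit`, and `bof_histogram` are all assumptions introduced here. The paper reports 100 units and 100 000 iterations; the defaults below are smaller only to keep the sketch quick to run.

```python
import math
import random


def nearest_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (the burst unit)."""
    return min(range(len(weights)), key=lambda k: math.dist(weights[k], x))


def train_som(descriptors, n_units=100, iterations=1000, seed=0):
    """Train a 1-D SOM (assumed layout) on all descriptors.

    Returns the unit weight vectors, which play the role of visual words.
    """
    rng = random.Random(seed)
    dim = len(descriptors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for t in range(iterations):
        frac = 1.0 - t / iterations
        alpha = 0.5 * frac                          # assumed decaying learning rate
        radius = max(1, int(n_units / 4 * frac))    # assumed shrinking neighborhood
        x = rng.choice(descriptors)
        c = nearest_unit(weights, x)
        for k in range(max(0, c - radius), min(n_units, c + radius + 1)):
            w = weights[k]
            for i in range(dim):
                w[i] += alpha * (x[i] - w[i])
    return weights


def bof_histogram(weights, selected):
    """Normalized histogram of visual words for the selected descriptors."""
    hist = [0] * len(weights)
    for x in selected:
        hist[nearest_unit(weights, x)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

In this sketch, `train_som` would receive all SIFT descriptors, whereas `bof_histogram` would receive only the descriptors kept by the OC-SVM selection, mirroring the split described in the text.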

Unsupervised category formation
ART-2 is a theoretical model of unsupervised neural networks with incremental learning that forms categories adaptively while maintaining stability and plasticity together. Features of time-series images from the mobile robot change with time. Using ART-2, our method enables unsupervised category formation that requires no setting of the number of categories.
A type of supervised neural network, CPN, actualizes mapping and labeling together. Such networks comprise three layers: an input layer, a Kohonen layer, and a Grossberg layer. In addition, CPNs learn topological relations of input data by mapping them onto weights between units of the input-Kohonen layers. The resultant category formations are represented as a category map on the Kohonen layer. Our method can reduce the labels generated by ART-2 using the winner-takes-all competition of CPNs. In addition, our method can visualize relations between categories on the category map of CPNs. Detailed algorithms of ART-2 and CPNs are the following.
Among the various types of ART (Grossberg, 1976), we use ART-2, which accepts continuous input values (Carpenter et al., 1987). The learning algorithm of ART-2 is the following.
1. Top-down weights Z_ji and bottom-up weights Z_ij are initialized as in Eq. (1).
2. Input data I_i are presented to F1, and the signals are propagated through its sublayers.
3. The maximum active unit T_j is searched for.
4. Top-down weights Z_ji and bottom-up weights Z_ij are updated.
5. The vigilance threshold ρ is used to judge whether input data correctly belong to a category.
When Eq. (12) is true, the active units are reset and the procedure returns to (2) to search again. If Eq. (12) is not true, (3) and (4) are repeated until the rate of change of F1 is sufficiently small. In addition, a and b are coefficients of feedback loops from u to w and from q to v. Here, c is a propagation coefficient from p to r, and d is a learning rate coefficient. Furthermore, cd/(1 − d) ≤ 1 is the constraint between them, and θ is a parameter to control a noise detection level in v.
Pattern Recogn. Phys., 1, 63-74, 2013, www.pattern-recogn-phys.net/1/63/2013/
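The role of the vigilance threshold in creating categories incrementally can be illustrated with the following sketch. It is deliberately not ART-2: the F1 sublayer dynamics and the parameters a, b, c, and θ are omitted, a plain cosine similarity stands in for the match computation, and the function names are hypothetical. The value of `rho` here is a cosine threshold and is therefore not comparable to the ρ values reported for ART-2 in this paper.

```python
import math


def normalize(x):
    """Scale x to unit length (returned unchanged if it is the zero vector)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)


def art_like_labels(inputs, rho=0.9, d=0.5):
    """Assign a label to each input, creating new categories on demand.

    rho: vigilance-like threshold on cosine similarity (assumption).
    d:   learning rate used to move the matched prototype toward the input.
    """
    prototypes = []
    labels = []
    for x in inputs:
        x = normalize(x)
        best, best_sim = -1, -1.0
        for j, z in enumerate(prototypes):
            sim = sum(a * b for a, b in zip(x, z))
            if sim > best_sim:
                best, best_sim = j, sim
        if best >= 0 and best_sim >= rho:
            # Resonance: the input is accepted and the prototype is updated.
            z = prototypes[best]
            prototypes[best] = normalize(
                [(1 - d) * a + d * b for a, b in zip(z, x)]
            )
        else:
            # Reset of every candidate: a new category is generated.
            prototypes.append(x)
            best = len(prototypes) - 1
        labels.append(best)
    return labels, prototypes
```

The point of the sketch is only that the number of categories is not fixed beforehand: a larger threshold yields more labels, and a smaller one yields fewer.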
The CPNs (Nielsen, 1987) perform pattern mapping, i.e., CPNs map one pattern into another pattern in all sets of patterns. When a pattern is presented, learned networks classify patterns into specific categories using weights. Our method can automate labeling with generation of labels as teaching signals to the units of the Grossberg layer on CPNs. The CPN learning algorithm is the following.

1. $u^i_{n,m}(t)$ are weights from an input layer unit $i$ ($i = 1, ..., I$) to a Kohonen layer unit $(n, m)$ ($n = 1, ..., N$, $m = 1, ..., M$) at time $t$. Therein, $v^j_{n,m}(t)$ are weights from a Grossberg layer unit $j$ to a Kohonen layer unit $(n, m)$ at time $t$. These weights are initialized randomly. The training data $x_i(t)$ are presented to the input layer units $i$ at time $t$. The Euclidean distance $d_{n,m}$ separating $x_i(t)$ and $u^i_{n,m}(t)$ is calculated as
$$ d_{n,m} = \sqrt{\sum_{i=1}^{I} \left( x_i(t) - u^i_{n,m}(t) \right)^2}. $$
2. The unit for which $d_{n,m}$ is smallest is defined as the winner unit $c$ as
$$ c = \arg\min_{(n,m)} d_{n,m}. $$
3. Here, $N_c(t)$ is a neighborhood region around the winner unit $c$. In addition, $u^i_{n,m}(t)$ of $N_c(t)$ is updated using Kohonen's learning algorithm as
$$ u^i_{n,m}(t+1) = u^i_{n,m}(t) + \alpha(t) \left( x_i(t) - u^i_{n,m}(t) \right). $$
4. In addition, $v^j_{n,m}(t)$ of $N_c(t)$ is updated using Grossberg's outstar learning algorithm as
$$ v^j_{n,m}(t+1) = v^j_{n,m}(t) + \beta(t) \left( t_j(t) - v^j_{n,m}(t) \right). $$
In that equation, $t_j(t)$ is the teaching signal to be supplied to the Grossberg layer. Furthermore, $\alpha(t)$ and $\beta(t)$ are the learning rate coefficients that decrease with the progress of learning. The learning of CPNs repeats up to the number of learning iterations that was set previously.
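The four steps of CPN learning can be sketched as below. The square neighborhood, the linear decay of the learning rates and radius, and the names `CPN`, `train_step`, `label`, and `train` are assumptions made for illustration; the teaching signal is assumed to be a one-hot vector built from an ART-2 label.

```python
import math
import random


class CPN:
    """Minimal counter propagation network on an n_rows x n_cols Kohonen layer."""

    def __init__(self, n_rows, n_cols, n_inputs, n_labels, seed=0):
        rng = random.Random(seed)
        self.units = [(n, m) for n in range(n_rows) for m in range(n_cols)]
        # Step 1: input-Kohonen weights u and Grossberg-Kohonen weights v,
        # both initialized randomly.
        self.u = {nm: [rng.random() for _ in range(n_inputs)] for nm in self.units}
        self.v = {nm: [rng.random() for _ in range(n_labels)] for nm in self.units}

    def winner(self, x):
        """Step 2: the unit with the smallest Euclidean distance to x."""
        return min(self.units, key=lambda nm: math.dist(x, self.u[nm]))

    def train_step(self, x, teach, alpha, beta, radius):
        """Steps 3 and 4: update u and v inside the neighborhood of the winner."""
        c = self.winner(x)
        for nm in self.units:
            # Assumed square neighborhood N_c(t) of the given radius.
            if max(abs(nm[0] - c[0]), abs(nm[1] - c[1])) <= radius:
                self.u[nm] = [w + alpha * (xi - w) for w, xi in zip(self.u[nm], x)]
                self.v[nm] = [w + beta * (tj - w) for w, tj in zip(self.v[nm], teach)]
        return c

    def label(self, x):
        """Winner-takes-all readout: strongest Grossberg weight of the winner."""
        v = self.v[self.winner(x)]
        return max(range(len(v)), key=lambda j: v[j])


def train(cpn, data, iterations, alpha0=0.5, beta0=0.5, radius0=2):
    """Repeat the update with linearly decreasing rates and radius (assumed)."""
    for t in range(iterations):
        frac = 1.0 - t / iterations
        x, teach = data[t % len(data)]
        cpn.train_step(x, teach, alpha0 * frac, beta0 * frac, int(radius0 * frac))
```

After training, reading `label` for every Kohonen unit's own weight vector would give a category map of the kind discussed in the experiments.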

Experimental results
We applied our method to object recognition experiments using time-series images taken by two robots: a small mobile robot for experiment A and an actual-size robot for experiment B.
Experiment A: specific object recognition using a small mobile robot

Generation of behavior programs
For this study, we used genetic programming (GP) by Koza (1992) for generating two behavior programs to run for routes A and B. Nodes used for GP were the following.
Terminal nodes correspond to forward movement, 90° turns to the left and to the right, and 15° turns to the left and to the right. The non-terminal node runif is a condition judgment by which the first argument is executed if there is a landmark in front of the robot; the second argument is executed if no landmark exists. The non-terminal nodes progn2 and progn3 are functions that execute two arguments and three arguments sequentially. For the simulation, we used a map dividing the environment into 10 × 10 blocks. One block corresponds to 115 mm × 115 mm. The fitness value is increased when the robot finds a landmark and runs through it. We set the population size to 50 individuals and the number of generations to 100. We used the best individuals as behavior programs. We call the behaviors generated for routes A and B behavior A and behavior B, respectively.
Figure 4 shows the assignment of objects in the environment and the roughly determined goals of routes for the robot. We generated behavior programs using GP. We set landmarks on both routes. Figure 5 portrays a generated tree and its simulation result for the simple route along the walls shown in Fig. 4a. Figure 6 presents a generated tree and its simulation result for the route that acquires various appearances around each object shown in Fig. 4b. For this experiment, we created datasets consisting of time-series images for each behavior. The datasets comprise training datasets and testing datasets, for which the robot runs two rounds in the environment.
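The semantics of the node set can be illustrated with a small interpreter. Only the non-terminal names runif, progn2, and progn3 come from the text; the terminal names (`forward`, `left90`, `right90`, `left15`, `right15`), the nested-tuple tree encoding, and the `landmark_ahead` callback are hypothetical choices made for this sketch. It evaluates a tree once and lists the primitive actions in execution order; it does not model the 10 × 10 block map or the fitness computation.

```python
# Hypothetical names for the five terminal nodes described in the text.
TERMINALS = {"forward", "left90", "right90", "left15", "right15"}


def run_tree(tree, landmark_ahead, actions=None):
    """Evaluate a behavior tree and return the executed primitive actions.

    tree:           a terminal name, or a tuple (operator, arg1, arg2, ...).
    landmark_ahead: zero-argument callable that reports whether a landmark
                    is in front of the robot (assumed sensor interface).
    """
    if actions is None:
        actions = []
    if isinstance(tree, str):
        if tree not in TERMINALS:
            raise ValueError(f"unknown terminal: {tree}")
        actions.append(tree)
        return actions
    op, *args = tree
    if op == "runif":
        if len(args) != 2:
            raise ValueError("runif takes two arguments")
        # First argument if a landmark is ahead, second argument otherwise.
        branch = args[0] if landmark_ahead() else args[1]
        run_tree(branch, landmark_ahead, actions)
    elif op in ("progn2", "progn3"):
        expected = 2 if op == "progn2" else 3
        if len(args) != expected:
            raise ValueError(f"{op} takes {expected} arguments")
        for arg in args:  # execute the arguments sequentially
            run_tree(arg, landmark_ahead, actions)
    else:
        raise ValueError(f"unknown operator: {op}")
    return actions


example = (
    "progn3",
    "forward",
    ("runif", "forward", "left90"),
    ("progn2", "right15", "forward"),
)
```

For instance, `run_tree(example, lambda: False)` takes the second branch of runif, so the robot turns left by 90° instead of moving forward at that node.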

Selection of feature points
Figure 7 depicts results of feature points selected using OC-SVMs on four samples of time-series images taken by the robot. Our method can select feature points near objects despite various appearance changes. In images of object D, feature points are selected over the whole object when the robot is distant from it and over a part of the object when the robot is near it. In addition, feature points are selected not only on the object, but also around the object.
ART-2 generated 27 labels from time-series images of 220 frames (Fig. 8). The labels are more numerous than the target objects because labels are also assigned to the images taken while the robot turned 90° at the four corners of the environment. Objects A, B, C, and D respectively generated 3, 2, 6, and 8 labels. Figure 9 depicts a category map generated by CPNs. On the category map, we show mapping regions of images of each object. Each object classified with different labels by ART-2 is mapped to neighboring units on the category map of CPNs shown in Fig. 9. In addition, images taken while turning, which correspond to labels 3 and 4, are mapped around border units between categories.

Classification results
We annotated images in which more than 30 % of the object was missing as belonging to the category of backgrounds and "other". Table 6 presents the recognition accuracy for each dataset for training and testing. The target datasets consist of A-1 and A-2 for the first and second rounds with behavior A, and B-1 and B-2 for the first and second rounds with behavior B. This experiment evaluated recognition accuracies for all combinations of the four datasets for learning and testing. The respective recognition accuracies for training datasets A-1, A-2, B-1, and B-2 are 99.1, 98.8, 90.8, and 96.8 %. In behavior A, the respective recognition accuracies for testing A-2 and A-1 after learning A-1 and A-2 are 98.8 and 93.5 %. In addition, the respective recognition accuracies for testing B-1 and B-2 after learning A-1 and A-2 are 63.5, 64.3, 51.5, and 50.4 %. In behavior B, the respective recognition accuracies for testing B-2 and B-1 after learning B-1 and B-2 are 86.8 and 87.2 %. In addition, the respective recognition accuracies for testing A-1 and A-2 after learning B-1 and B-2 are 83.8, 77.1, 94.0, and 95.8 %. The respective mean recognition accuracies for testing datasets for behavior A and for behavior B are 70.3 and 87.5 %. This result means that behavior B is superior to behavior A for learning.
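The reported means can be checked with a short calculation. The sketch below assumes that each mean is taken over the six testing accuracies listed in the text for the corresponding training behavior; the variable names are illustrative.

```python
# Testing accuracies [%] after learning with behavior A (datasets A-1, A-2).
behavior_a = [98.8, 93.5, 63.5, 64.3, 51.5, 50.4]
# Testing accuracies [%] after learning with behavior B (datasets B-1, B-2).
behavior_b = [86.8, 87.2, 83.8, 77.1, 94.0, 95.8]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


mean_a = mean(behavior_a)
mean_b = mean(behavior_b)
print(f"behavior A: {mean_a:.2f} %, behavior B: {mean_b:.2f} %")
```

The sums are 422.0 and 524.7, so the means are about 70.33 % and 87.45 %, which agree with the reported 70.3 % and 87.5 % after rounding to one decimal place.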

Experiment B: generic object recognition using an actual-size mobile robot
Based on the results of the questionnaire presented in Table 1, we evaluated our method for generic object recognition in an actual environment using an actual-size mobile robot. We used PaPeRo, developed by NEC. This robot is a prototype of a personal robot used especially for child-care purposes (Fujita, 2000). Table 7 presents specifications of this robot related to its use for this experiment. The robot is 385 mm high, 282 mm long, and 251 mm wide. A comparison with NetTansor shows that PaPeRo has sufficient capabilities to move on the floor. Moreover, servomotors are equipped for the drive system to control movements with high precision.
Although two cameras are mounted for stereo vision, we used one camera for monocular vision. The specifications of the cameras are the following: imaging device, CCD; image format, JPEG; resolution, 320 × 240 pixels; and frame rate, 30 fps. Figure 10 depicts the experimental environment. This room is a vacant room used as a professor's room. It contains a desk, a table, a sofa, and a cabinet. The floor is carpeted. The room has a window and a blind. We closed the blind to avoid effects of sunlight while taking images throughout the experiment.
We selected target objects that are portable and neither too large nor too small compared with this robot from the top group of categories extracted by the questionnaire investigation presented in Table 1. Figure 11 depicts target objects of four categories: personal computers (PCs), chairs, robots, and trash bins (TBs). We selected medium-size desktop PCs to be placed under the desk. We used only office automation (OA) chairs, although chairs of numerous types exist there. Comparison with other objects shows that robots are the smallest targets for this experiment. We selected TBs that have no patterns or labels on the surface. We used different objects in the same category for testing.
The robot moves around the environment for one round clockwise using the behavior set consisting of forward movements and 90° turns. For learning, each object in the same category is placed on the routes shown in Fig. 4. After one round, the robot movement is suspended to take images. Subsequently, we changed objects to the next category; the robot resumed movement to take images. For testing, we placed four different objects, one in each corner. The robot moved using the same behavior set. We created four datasets by changing the positions of objects clockwise.
Figure 11 shows feature selection results of images with OC-SVMs. In this experiment, the range of movement of the robot is wide and the sizes of the target objects are various. Therefore, background feature points are selected. Moreover, the robots are smaller than the other classification target objects. Feature points including background regions were extracted for them because the occupancy of background regions is larger than that of other images. The PC and TB feature points are few because shapes and components of these objects are simple. Consequently, the feature points selected for these objects also include background regions.
Figure 12 portrays labels generated by ART-2. For this experiment we set ρ to 0.5 to prevent redundant categories. Results show that ART-2 generated 38 labels from 320 frames of input images. We consider that the reason for generation of numerous labels is the diversity of appearance of objects, although we set a small value of ρ. Moreover, images of the robot turning are included in training datasets. In the last part of the input frames, overlapping labels are apparent. In fact, ART-2 additionally generated categories for images of the changed objects. In this environment, four patterns of background regions are repeated. We consider that the overlapping is caused by these background patterns having been memorized.
Figure 13 portrays the category map created by the labels. The category map size is 20 × 20 units. Categories are created for each independent region. However, these categories are separated into several regions. Using CPNs, 38 labels generated by ART-2 were integrated to 29 labels.
Table 8 presents test results for datasets 1, 2, 3, and 4. Each dataset comprises 180 frames. The highest recognition accuracy is 53.3 % for robots. In contrast, the lowest recognition accuracy is 21.9 % for PCs. The recognition accuracy decreased in datasets 2 and 3. In particular, the recognition accuracy of PCs is 0 % in dataset 2 and 4.4 % in dataset 3. Our method failed to recognize PCs and TBs. The recognition accuracy is decreased by this false recognition. The robot ran the same route from the start point under similar patterns of backgrounds, although objects were replaced in the test datasets. In this environment, the complexity of backgrounds on the routes of the forward movement after the start and the forward movement after the two sets of 90° turns is higher than that of the other two routes. In the latter route of complex backgrounds, images include the door near the entrance. Our method selected these SIFT features in the background region. Results for test datasets show that the same units on the category map are burst. This false recognition occurs in cases where the distance between the robot and objects is great. We consider that these burst patterns occur in response to patterns of background regions.
Our method selected foreground regions in an unsupervised manner using OC-SVMs. However, false recognition occurred in cases with small objects shown in an image with a background of high complexity. We consider that restriction of the distance between the robot and objects is necessary instead of using all frames as a target of training and recognition. The objective of our method is to recognize one object in a scene image. We must extend our method to classify multiple objects in one image.

Conclusions
This paper presented an unsupervised method of SIFT feature point selection using OC-SVMs and category formation that combines the incremental learning of ART-2 with the self-mapping characteristic of CPNs. Our method enables feature representation that contributes to improved classification accuracy by selecting feature points that concentrate the characteristic information of an image. Moreover, our method can visualize spatial relations of labels and integrate redundant and similar labels generated by ART-2 as a category map using the self-mapping characteristics and neighborhood learning of CPNs. Therefore, our method can represent diverse categories.
Future studies must be conducted to develop methods to extract boundaries among clusters automatically and to determine a suitable number of categories from category maps of CPNs.Additionally, we will examine approaches that include generation of robot behavior for classification and recognition of objects.
Edited by: S.-A. Ouadfeul
Reviewed by: two anonymous referees

Figure 1. Photos of the target environment for the questionnaire investigation.

Figure 2. Whole architecture of our method.

Figure 3 portrays a home robot (NetTansor; Bandai Co. Ltd.) used in this experiment. Table 5 presents specifications of the robot. The robot is 190 mm high, 160 mm long, and 160 mm wide. The camera specifications are the following: imaging device, 1/4 inch CMOS; image format, JPEG; resolution, 320 × 240 pixels; and frame rate, 15 fps (frames per second). The moving environment is 1150 × 1150 mm surrounded by 300 mm high white walls.

Figure 4. Experimental environment and robot routes.

Figure 5. Generated tree and simulation result of behavior A.

Figure 6.

Figure 8 depicts labels generated by ART-2 in the experiment using time-series images of RUN1. The vertical and horizontal axes respectively represent labels of ART-2 and frames of the images. The top parts portray the ranges of the time-series images that include objects and the parts in which the robot turned 90°. In this result, 27 labels are generated from time-series images.

Figure 7. Results of selected SIFT feature points of time-series images.

Figure 8. Results of labels created using ART-2 from time-series images.

Figure 9. Mapping result of images on the category map of CPNs used in labels generated by ART-2.

Figure 10. Experimental environment and an actual-size mobile robot for generic object recognition.

Figure 11. Selecte

Figure 13. Mapping result of objects on the category map.

Table 1. Results of questionnaires administered to 10 subjects.

Table 2. Categories extracted by more than two subjects in the rough classification of the questionnaire investigation.

Table 3. Categories extracted by more than two subjects in the fine classification of the questionnaire investigation.

Table 4. Setting values of parameters used in experiments.

Table 6. Recognition accuracy in each behavior [%]. Bold numbers show the maximum accuracy in each training dataset.

Table 7. Specifications of PaPeRo by NEC.
