© Sergey Jakovlev, 2004

Commercial use of this information is prohibited. However, you may copy and adapt this article for scientific and educational purposes, provided that the result is made available under the same terms. For details, see Copyright.

The identification system of dynamic objects on the basis of artificial neural networks

S. Jakovlev, Mg.sc.comp. e-mail: sergeyk@fis.lv, tac@inbox.lv

1. Introduction

This paper first considers a model of sight: Rosenblatt’s perceptron [1] with some modifications [2]. It then considers a model of memory: the Corner Classification network CC4 [3], which complements the sight model and exhibits features of intelligence that Rosenblatt’s perceptron does not cover.

The CC4 approach gave rise to the Corner Classification in Model method, CCM [4], which focuses on speeding up the computation. This paper compares these models and their variants (CCM2, CCM3). The most important result, however, is that CC4 and Rosenblatt’s perceptron can be combined, which the author’s research has made possible. The author proposes a perception model called the “neuron” and uses it to restore and analyze the image of a moving object.

The use of CCM and the combination of CC4 with Rosenblatt’s perceptron rest on the following principles established in this paper:

1. The activity of the middle layer of the neural network is computed mathematically, without iterating over all of its elements.
2. A supporting αγ-system keeps the weights being tuned from growing without bound, which speeds up convergence during training many times over.
3. The neural network is divided into blocks, each responsible for a certain field. This division considerably reduces recognition time compared with an undivided network.
4. The number of hidden-layer elements (A-elements) is almost halved relative to the necessary minimum (one A-element per stimulus).

Together, these principles considerably raise the working speed of the neural network without a considerable loss of quality or many refusals to recognize.

2. The comparison of different method versions based on the CCM method

The main drawback of classic recognition methods such as the perceptron and CC4 is their low working speed. The author’s main task was therefore to develop methods that work at high speed while retaining the features of the classic recognition methods. The author has developed several methods (CCM2, CCM2+, CCM3) based on the CCM method, which is in turn based on the principle of angle (corner) classification. Table 1 compares the working characteristics of these methods with those of the perceptron and angle classification methods. Note that recognition quality is measured by the number of mistakes the neural network makes relative to the ideal image.

Table 1. The features of the methods elaborated

| Method | Speed after training | Quality (the quantity of mistakes) | Memory | Training | The probability of refusal |
| --- | --- | --- | --- | --- | --- |
| CC4 | - | - | N standards* | 1 iteration | 0% |
| Perceptron | - | - | N standards* | N iterations | 0% |
| CCM/CCM+ | ↑ 2 times | ↑ 10% | N standards* | 1 iteration | 0% |
| CCM2/CCM2+ | ↑ 2 times | ↑ 15%-25% | N standards* | 1 iteration | 0% |
| CCM3 | - | ↑ 55% | N standards* | 1 iteration | 0% |
| Perceptron + CCM | ↑ 7 times | ↑ 10% | N standards | N iterations | 5% |
| “Neuron” model [i] | ↑ 7 times | ↑ 10% | 60% * N standards | <2000*i | 5% |

* The growth in the quantity of the standards is not proportional to the growth of the quality.
↑ means an increase.

These results were obtained in experiments on the image restoration problem. Note that the CCM3 method gives the best quality at the lowest speed, while the combination “perceptron + CCM” provides high speed without a decrease in quality. This combination was therefore taken as the basis of the “Neuron” perception model.

The advantages and disadvantages of these methods are listed below, marked “+” and “-” respectively. Method development here means a considerable change to an algorithm [A] that produces another, modified algorithm [B]. Such a modification is denoted below as [A] → [B].

1. [Perceptron] → [CC4]

+ 1. Training time decreases to one iteration.

- 1. The number of A-elements is at its maximum and does not decrease during training.

- 2. Increasing the number of standards does not give a proportional growth in recognition quality.

2. [CC4] → [CCM]

+ 1. Computer modeling time is halved.

+ 2. The (AND/OR) relation among the input parameters can be controlled.

- 1. The generalization model does not depend on the number of input parameters: the model is built on one parameter, and a logical operator (AND/OR) is set among the parameters.

3. [CCM] → [CCM+]

+ 1. Trained stimuli are not forgotten (+10% quality).

- 1. Computer modeling time increases up to twofold.

4. [Perceptron] + [CCM+]

+ 1. Generalizing the activity level of the A-elements (r=2) makes it possible to use perceptron training methods to reduce the number of A-elements.

- 1. Computer modeling time for training increases.

- 2. Recognition quality is slightly worse.

5. [CCM/(CCM+)] → [CCM2]

+ 1. Recognition quality improves by at least 15%.

- 1. The scheme [Perceptron] + [CCM2] is impossible, because the insufficient activity level of the A-elements does not allow the training algorithms to be used.

- 2. Only the first two generalization models (from the first level) are used, in order to simplify the training algorithm. These models give the best generalization results, but the other models are not taken into account.

6. [CCM2] → [CCM2+]

+ 1. Quality improves by 10%.

- 1. Speed decreases because the algorithm is applied a second time. The model recognizes the standards after generalization (r=1).

7. [CCM2/(CCM2+)] → [CCM3] (non-random choice of standards)

+ 1. Recognition quality increases by 30-50%, evenly across the whole image.

- 1. The image must be analyzed in order to choose the standards. It is difficult to describe the standards by a single random-number coefficient for their generation.

3. The bases of the system, which restores and analyses several moving objects

3.1. The task definition

The modeled situation: a person watches an object move (at least three static images). The person distinguishes the object from the background because the object moves. The object is retained only partially, owing to the limited capacity of human memory. In this way the person watches and remembers several objects; the same object may also appear against different backgrounds. Afterwards an image is shown to the person, who has to recreate the object in visual memory as precisely as possible and identify it.

To simplify the problem, it can be split into the following 6 technical tasks:

1. Select the object image from the general background, using 3 variants of the object's position as it moves against a static background.
2. Select a random sequence and memorize the random points (specified by that random sequence) in the neural network that models the memory. The number of points is limited by the available memory size, but preferably it is no less than 10% of the total number of points.
3. Select exact points (e.g. points located on the lines of a grid laid over the image) and memorize the color of these points in the neural network, so that the reaction to this stimulus is the random sequence selected in item 2.
4. From the image shown, more precisely from the color of its points (e.g. points located on the lines of a grid laid over the image), determine whether this is the first or the second object.
5. Using the random sequence determined in item 4, recreate the image memorized in the neural network.
6. Repeat everything for the second object, starting with item 2.
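
Items 2 and 5 rely on the same random points being recoverable from a compact value. A minimal sketch of that idea follows, assuming the points are generated from an integer seed. The function names, the use of Python's random module and the dictionary that stands in for the neural network memory are illustrative assumptions, not the author's implementation.

```python
import random

def select_points(seed, width, height, fraction=0.10):
    """Return a reproducible pseudo-random sample of (x, y) points."""
    rng = random.Random(seed)  # the seed plays the role of the random coefficient
    all_points = [(x, y) for y in range(height) for x in range(width)]
    count = max(1, int(len(all_points) * fraction))  # about 10% of all points
    return rng.sample(all_points, count)

def memorize(image, seed, fraction=0.10):
    """Store the color of each selected point (a dict stands in for the network)."""
    height, width = len(image), len(image[0])
    points = select_points(seed, width, height, fraction)
    return {(x, y): image[y][x] for (x, y) in points}
```

Because the same seed always yields the same points, keeping the seed is enough to know later which coordinates were memorized.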

Let us consider these problems in detail. The first problem is solved by a deterministic algorithm, the movement detector (see Section 3.3). Problems 2 and 5 together form the recreation problem, and problems 3 and 4 form the analysis problem; both are described in the following sections.

Problem 6 would not be complicated if the knowledge stored in the neural network referred to one object only. But since knowledge of both the first and the second object is kept in the same neural network, the network has to be divided into columns. Let a memory column be the set of weights between the input and hidden layers and between the hidden and output layers that results from teaching one object to the neural network. Only one (active) memory column can be used at a time. Therefore, before recreating the image from memory, one must determine the memory column in which the relevant information is recorded. Otherwise considerable interference arises, and in response to any object a mixture of all the memorized objects is recreated. In problem 4, once the random sequence has been obtained, the activity of the A-elements (neurons) at that moment should be checked. The active memory column is the one with the highest activity.
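
The rule for choosing the active memory column can be sketched as follows. The threshold A-elements, the list-of-columns layout and all names are assumptions made for illustration; the paper does not specify how the activity is counted.

```python
def a_element_fires(weights, threshold, stimulus):
    """A threshold unit: fires when the weighted input exceeds the threshold."""
    total = sum(w * s for w, s in zip(weights, stimulus))
    return total > threshold

def column_activity(column, stimulus):
    """Number of A-elements of one memory column that fire for the stimulus."""
    return sum(1 for weights, threshold in column
               if a_element_fires(weights, threshold, stimulus))

def select_active_column(columns, stimulus):
    """Return the index of the most active column and all activity counts."""
    activities = [column_activity(column, stimulus) for column in columns]
    best = max(range(len(columns)), key=lambda i: activities[i])
    return best, activities
```

Here each column is a list of (weights, threshold) pairs learned for one object, and ties are resolved in favor of the first column, which is also an arbitrary choice.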

3.2. Basic principles of the elaborated perception model

Structurally, the perception model described in this paper is a complete artificial neural network in the usual sense, intended to solve a given part of the overall task; for simplicity it is called a “neuron”.

The complexity of the problem that a single “neuron” can solve is determined mainly by the number of A-elements, that is, the elements to be memorized. After a series of experiments, the author found that the “neuron” gives effective results with 2000 elements. With more than 2000, a more complicated task can be solved, but the speed falls exponentially.

The basic principles of “neuron” operation are the same as in the CCM method, but in addition perceptron training is applied together with the αγ support system for the weights being tuned. This allows the “neuron” to be used in three modes:

1. Precise comparison of new objects with the memorized objects;
2. Inexact comparison (defined by the M matrix) of new objects with the memorized objects;
3. Inexact comparison (using information compression) that generalizes the data.

The first two modes are supported by the CCM method, and the third by perceptron training. The modes are controlled by the generalization coefficient k. If k=1, the activity level is such that only the A-elements that coincide exactly with the object being compared are activated. If k=2, the A-elements close in value to the object being compared are activated. If k=3, the activity of the A-elements is practically random, which lets the perceptron form a non-contradictory G-matrix. This, in turn, makes perceptron training possible.
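
The difference between exact and near matching can be illustrated with the standard corner classification rule of CC4 [3], in which a hidden unit stores one binary pattern and fires for any input within Hamming distance r of it. This is a sketch of that published rule only; how the coefficient k of the “neuron” maps onto the radius r is not stated in the paper, and the function names are illustrative.

```python
def make_a_element(stored, r):
    """Build corner-classification weights for one stored binary pattern."""
    s = sum(stored)                                    # number of ones in the pattern
    weights = [1 if bit == 1 else -1 for bit in stored]
    bias = r - s + 1                                   # weight of the always-on bias input
    return weights, bias

def is_active(a_element, stimulus):
    """Fire when the weighted sum plus bias is positive (distance <= r)."""
    weights, bias = a_element
    return sum(w * x for w, x in zip(weights, stimulus)) + bias > 0
```

With r=0 the unit responds only to the stored pattern itself, and with r=1 it also responds to inputs that differ from it in one bit, since the weighted sum plus bias equals r - d + 1 for an input at Hamming distance d.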

Training makes it possible to reduce the number of A-elements. In theory, convergence of training is guaranteed only if there are at least as many A-elements as objects to be memorized. In practice the number of A-elements can be reduced to 80%, and training then converges in 95% of cases. Thus the “neuron” has 1000 A-elements and can memorize 1800 objects, and it also has three low-level comparison functions.

3.3. Operation principle of the movement detector algorithm used in the model

The first technical problem mentioned above is solved by a deterministic algorithm. Three static images of 640x480 points in 256 colors are fed to the input (see Fig. 1). The images are received from a camcorder.

Figure 1. Architecture of the “Image recognition in a dynamic sequence” system. I – “Movement detector” algorithm, II – the recreation problem, III – the problem of analysis, 16-“Neuron” – 16 perception models of the “Neuron” type

The “Movement detector” algorithm finds the image of the moving object and forwards it to the input of the system. As the figure shows, the object image may be partially distorted by interference in the static images (e.g. shadows of the object caused by its movement). A better picture can be obtained by increasing the number of static images analyzed, i.e. by observing the object for longer.
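
The paper does not describe the movement detector itself. One simple way to separate a moving object from a static background in three frames is a per-pixel median, sketched below under the assumptions that pixel values are grey-level intensities and that the object covers any given pixel in at most one frame. This is a stand-in technique with hypothetical names, not the author's algorithm.

```python
from statistics import median

def detect_moving_object(frames, tolerance=0):
    """Estimate the static background and mark the moving pixels in each frame."""
    height, width = len(frames[0]), len(frames[0][0])
    # The per-pixel median over the frames approximates the static background.
    background = [[median(frame[y][x] for frame in frames) for x in range(width)]
                  for y in range(height)]
    # A pixel belongs to the moving object where it differs from the background.
    masks = [[[abs(frame[y][x] - background[y][x]) > tolerance
               for x in range(width)]
              for y in range(height)]
             for frame in frames]
    return background, masks
```

Using more frames makes the median more robust to shadows and noise, which matches the remark that a longer observation gives a better picture.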

3.4. Recreation problem

Problems 2 and 5 are the problems of training and testing the neural network: a random sequence of point coordinates is fed to the input, and the recreated object image is produced at the output. This is shown graphically in Figure 2.

Figure 2. The graphical representation of the recreation problem

In the case modeled, a person's field of view is 256x256 points. Each “neuron” perception model controls an area of 64x64 points, so that 16 “neurons” cover the whole field of view. To solve this problem the “neurons” work in the mode of inexact comparison with information compression.
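
The division of the 256x256 field into 64x64 areas can be sketched as below. The row-major numbering of the “neurons” and the function names are assumptions made for illustration.

```python
FIELD = 256                       # side of the whole field of view, in points
BLOCK = 64                        # side of the area controlled by one "neuron"
BLOCKS_PER_SIDE = FIELD // BLOCK  # 4, which gives 4 * 4 = 16 "neurons"

def neuron_index(x, y):
    """Index (0..15, row by row) of the "neuron" that controls point (x, y)."""
    return (y // BLOCK) * BLOCKS_PER_SIDE + (x // BLOCK)

def local_coordinates(x, y):
    """Coordinates of the point inside its own 64x64 area."""
    return x % BLOCK, y % BLOCK

def split_into_blocks(image):
    """Cut a 256x256 image (list of rows) into 16 blocks of 64x64 points."""
    blocks = []
    for by in range(BLOCKS_PER_SIDE):
        for bx in range(BLOCKS_PER_SIDE):
            rows = image[by * BLOCK:(by + 1) * BLOCK]
            blocks.append([row[bx * BLOCK:(bx + 1) * BLOCK] for row in rows])
    return blocks
```

Each “neuron” then needs to handle only the 4096 points of its own area instead of all 65536 points of the field.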

Figure 3 shows 16 curves representing the convergence of the 16 “neurons”. Note that the αγ support system proposed by the author [2] shortens training by a factor of at least 2 and at most 5 compared with the α support system. The convergence can, moreover, be described by an exponential curve. This holds as long as the “neuron” has free memory (A-elements). When the memory is full, the “neuron” shows an error count below one thousand (the number of A-elements). After that, memorization becomes harder and relies on tuning the weight coefficients so that the memorized objects do not contradict one another. When a contradiction arises, the network needs additional training, which appears on the convergence graph as sharp deviations from the exponential curve.

Figure 3. Exponential convergence during training of all the “neurons”. E – the number of errors, i – the number of iterations.

Differences in the convergence of the 16 “neurons” arise from the different saturation and complexity of the corresponding parts of the image. Figure 1 shows the block of 16 “neurons”. The convergence of each “neuron” is drawn at the position of the part of the image it controls. In these pictures the object boundaries show up as curves of quick convergence. This means that these fields are simpler than the fields inside the object (judging by the length of convergence). The complexity of a field depends on how evenly the colors are distributed in the image. At the edge of the picture the color is predominantly white, which the “neuron” ignores, so fewer than 1000 points are memorized. Intensive training therefore does not take place, and the convergence is exponential over its whole length.

Thus the convergence characteristics help to evaluate the complexity of the corresponding fields of the object and its contours. This can be used to find the same objects when solving the analysis task. Figures 4-7 show the results obtained in the test. The image recreation task is thus solved. Recreation takes ≈30 sec., and the quality is satisfactory, that is, no worse than with other classic recognition methods.

Moreover, images of one object on different backgrounds look the same after recreation. The “Movement detector” algorithm excludes most of the background, and the memorizing-recreation process “blurs” the insignificant details of the background and of the object (that is, the individual distinctions of the object).

This lets us treat the object as a representative of a whole class: only the main characteristics that describe a typical class representative are memorized. An analogy is the creation of a portrait rather than a photograph. A photograph shows a person as he is; a portrait made by a talented artist is a completely different thing. Consequently, the number of recognition errors is not a useful characteristic, since it measures the difference between the recreated image and the ideal photographic image, whereas what we need first of all is an image of the most typical object, that is, an image containing the main parts of the object.

Figure 4. Result of experiment No 1

Figure 5. Result of experiment No 2

Figure 6. Result of experiment No 3

Figure 7. Result of experiment No 4

3.5. The task of analysis

Tasks 3 and 4 are the problems of training and testing the neural network. In this case the input is the color grid of the image, and the output is the object number (which object it is) and the corresponding random sequence of points, that is, the random coefficient (RND), 32 bits in size. This is shown graphically in Figure 8.

Figure 8. Graphic representation of the analysis task

Combining tasks 2, 3, 4, and 5 graphically gives the system shown in Figure 9.

Figure 9. Graphic representation of the recreation and analysis tasks

The input consists of the color value and the coordinates (x, y) of the given points. Without the point coordinates, the neural network would have information only about the color range of the image. Such information would suffice for object recognition if there were few objects in the memory. The quality of object recognition also depends on the number of points taken from the image. The maximum grid size is 16x16 for the 64x64 field controlled by one “neuron”, that is, every fourth point.
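
The relation between the grid size and the sampling step can be sketched as follows. Starting the grid at offset 0 and the function name are assumptions; the paper only fixes the field size and the grid sizes.

```python
def sample_color_grid(block, grid_size):
    """Pick grid_size x grid_size evenly spaced points from a square block."""
    size = len(block)         # 64 for the field controlled by one "neuron"
    step = size // grid_size  # 16, 8 or 4 for grids of 4x4, 8x8 and 16x16
    return [[block[y][x] for x in range(0, size, step)]
            for y in range(0, size, step)]
```

For the 16x16 grid the step is 4, which is the "every fourth point" case mentioned above.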

To solve this task the “neuron” is used in the mode of precise comparison of new objects with memorized ones. During training, the grid corresponding to the object is associated with the object code. This code is then used as the input value for the recreation task.

Figures 10-12 show the differences between several grid sizes. The probability that an object matches a memorized one is calculated from how many of the 16 “neurons” recognized the object. Every “neuron” takes into account all the coincidences found in its own field. Figure 10 shows that with a grid of 4x4 lines, the system of 16 “neurons” recognizes the image as the first one with probability ≈50%, as the third one with probability ≈30%, and as the second or the fourth one with probability less than 5%. This means that the first and third images are similar, as they in fact were in this experiment: both showed the same rabbit on different backgrounds. The same holds for the second and fourth images, which contained the same object, a “monkey”, on different backgrounds.
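
The way the answers of the 16 “neurons” turn into a probability can be sketched as a simple vote count. The encoding of one answer per “neuron” (an image index, or None for a refusal) and the function name are assumptions; the paper does not detail how each “neuron” weighs the coincidences in its field.

```python
from collections import Counter

def recognition_probabilities(votes, memorized_count):
    """Share of "neurons" that recognized each memorized image."""
    counts = Counter(v for v in votes if v is not None)  # refusals are skipped
    return [counts[i] / len(votes) for i in range(memorized_count)]
```

For example, if 8 of the 16 “neurons” answer with the first image, the first image gets a probability of 8/16 = 0.5.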

Figure 10. Recognition of the given image as a memorized image by the “neuron” (color grid 4x4)

Figure 11. Recognition of the given image as a memorized image by the “neuron” (color grid 8x8)

Figure 12. Recognition of the given image as a memorized image by the “neuron” (color grid 16x16)

Figures 11-12 show that as the grid grows to 16x16, the confidence of the system rises too. It should also be emphasized that the number of memorized objects affects the ability to recognize images: the more images are memorized, the lower the recognition ability.

4. Conclusion

To summarize, high-speed image recognition has been achieved without considerable harm to quality. The proposed method makes it possible to find and single out moving objects against a background. The foundations have been laid for a system that performs low-level operations of moving object identification. This allows more complex systems to be designed in which the proposed system serves as the base for elementary operations of recreating and analyzing objects observed by video devices.

5. References

[1] Rosenblatt F. (1965). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms (in Russian), Mir, Moscow.
[2] Jakovlev S. (2000). Efficiency analysis of a perceptron model in multiparameter classification tasks. Scientific proceedings of Riga Technical University, Issue 5, Vol. 2, RTU, Riga, P. 93-99.
[3] Kak S. C. (1998). On generalization by neural networks. Information Sciences, 111, P. 293-302.
[4] Jakovlev S. (2002). A method for the mathematical calculation of the activation level of a neural network: CCM (in Russian). Proceedings of the International Conference “Traditions and Innovations in Sustainable Development of Society. Information Technologies”, Rezekne, February 28 – March 2, P. 187-194.

S. Jakovlev. The identification system of dynamic objects on the basis of artificial neural networks

Methods that work faster than the classical “perceptron” and “angle classification” methods have been developed. The foundations of an image recognition system for dynamic image sequences have been implemented. To this end, series of experiments were carried out and a number of hypotheses were put forward and confirmed experimentally; their consequences are applied in the architecture of the implemented system. The paper is intended for those interested in the application of artificial intelligence both in engineering (video surveillance, object identification) and in the study of higher nervous activity in humans.
