The AD-AHP planner: Incremental path planning for robots incorporating decision theory

Path planning for mobile robots in dynamic environments with many objectives to consider at once is challenging and often intractable. However, recent outdoor robotic applications encounter many situations where such an approach is needed. This paper presents a path planning method for mobile robots that incorporates decision theory to guide the search. The method is able to handle an arbitrary number of objectives at the same time and also enables the incorporation of human logic into the planning process. All parts of the algorithm suit real-time implementation.


Introduction
Recent developments in field robotics and autonomous vehicles have raised the demand for advanced and intelligent navigation systems capable of planning safe and dynamically feasible trajectories while minimizing objectives like path length, energy cost and trip time. In a dynamic environment the relevance of the objectives and their relation to each other can also change rapidly: if the robot has limited time to reach its goal destination, minimizing the total trip time has priority over energy consumption, until a change in the environment, like a sudden blockade on the previously planned path, forces the robot to choose a more energy efficient yet more time consuming path.
In the case of unmanned ground vehicles, when passing through rugged terrain the robot can be exposed to risk. Consider a planetary rover with limited battery charge and limited time to reach its destination. The robot has to navigate through a canyon where the solar cells cannot recharge the robot's battery efficiently. At the same time the robot is exposed to the risk of turning over on the rocky surface or getting trapped due to falling rocks. The robot's navigation system has to be capable of planning an energy efficient but safe path avoiding dangerous areas while incorporating the risk of getting trapped within the canyon. In the case of the Mars rovers such an operation could result in losing the robot.
Robots applied in military operations can also be exposed to significant risk. Consider a military robot on a rescue mission. It has limited time to reach its destination and also a limited battery charge that it must not exhaust before it reaches the goal state. Between the robot's current state and the goal destination lies a mine field. The robot's navigation system has to be capable of planning a safe path while continuously evaluating the risk of damage when passing through the mine field.
Robots often have to navigate in dynamic and cluttered environments with imperfect information on the terrain. Efficient replanning of the path is needed, since a previously planned path can be invalidated by obstacles like falling rocks, or, due to imperfect map information, when newly received sensory information shows previously low risk areas to be high risk areas.
Optimizing many objectives at once is challenging. The robot's path planning problem amounts to minimizing a dynamically changing multi-cost function. Minimizing such a cost function can be time consuming and computationally cumbersome, and the robot has to perform this operation frequently when replanning is needed. Instead of finding the minimum of such a cost function, decision theory has to be incorporated into the path planning process.
We propose a modified version of the widely used D* algorithm, where determining the focus of the search is performed using the Analytic Hierarchy Process (AHP) method.
This paper is organized as follows. In the next section the AHP method is discussed, followed by a short overview on incremental planners for robots. After that we introduce the AD-AHP method that incorporates decision making into the incremental planner. After the simulation results the conclusions end the study.

The Analytic Hierarchy Process
The Analytic Hierarchy Process, introduced in (Saaty, 1980), (Saaty, 1988), is an efficient method for multi-criteria decision making. Given a set of alternatives and a set of criteria, AHP selects the best alternative. The AHP method has been used in machine learning for artificial intelligence applications (Kong, Liu, 2005), and it has also proved to be useful in military engineering for selecting optimal military equipment (Gyarmati, Kende, Felházi, 2009), (Gyarmati, 2006), (Kavas, 2009). The study (Fallahi, Leung, Chandana, 2009) introduces a method where Fuzzy-AHP was used along with the Ant Colony Optimization path planner for selecting an appropriate UAV for a mission. Another application has shown AHP to be useful in disaster management (Mali, Rao, Mantha, 2013), where finding the best route in an emergency case was carried out by Dijkstra's algorithm on a graph where the arc costs were determined by AHP. The study (Vaidya, Kumar, 2006) gives a general overview on recent AHP applications.
As given in the decision matrix below, the decision system has M alternatives and N criteria. The entries $d_{ij}$ of the decision matrix are the performance values of the alternatives within each criterion. To determine the best alternative these entries are weighted by the weighting coefficients $w_j$ that express the importance of each criterion.

KOMLÓSI István: The AD-AHP planner: Incremental path planning for robots incorporating decision theory

The Analytic Hierarchy Process performs pairwise comparison within each criterion. A judgement matrix has to be established for each criterion (not to be confused with the formerly mentioned decision matrix), whose entries satisfy $a_{ij} = 1/a_{ji}$ and are taken from the relative scale of importance given in Saaty's book (1980). When $a_{ik} a_{kj} = a_{ij}$ holds for all entries, the judgement matrix is consistent. Judgement matrices are often inconsistent, since human decisions cannot always assure consistent rankings. The weighting coefficients $w_j$ can be derived by calculating the priority vector of the judgement matrix established for the criteria. AHP works as follows. One has to determine the priority vector, that is, the right principal eigenvector of the judgement matrix, for each criterion. It has been shown in (Saaty, 2003), along with other observations on the priority vector, that it must be the right principal eigenvector of the judgement matrix, satisfying $A\omega = \lambda_{\max}\omega$, where $\lambda_{\max}$ is the maximal eigenvalue and $\omega$ is the corresponding eigenvector. This is especially important in the case of inconsistent matrices.
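As a concrete illustration, the eigenvector computation can be sketched in a few lines of Python. The 3x3 judgement matrix below is a made-up example using Saaty's 1-9 scale, and `priority_vector` and `lambda_max` are our own helper names, not part of the paper:

```python
# Hypothetical judgement matrix for one criterion; reciprocity
# a[j][i] == 1/a[i][j] holds by construction.
A = [
    [1.0, 3.0, 5.0],
    [1.0 / 3, 1.0, 2.0],
    [1.0 / 5, 1.0 / 2, 1.0],
]

def priority_vector(A, iters=100):
    """Approximate the right principal eigenvector of A by power
    iteration, normalized so its entries sum to one."""
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(v)
        w = [x / s for x in v]
    return w

def lambda_max(A, w):
    """Estimate the maximal eigenvalue from A and its priority vector
    by averaging the component-wise ratios of A*w to w."""
    n = len(A)
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    return sum(Aw[i] / w[i] for i in range(n)) / n

w = priority_vector(A)
lm = lambda_max(A, w)   # lambda_max >= n, with equality iff A is consistent
```

For this matrix the first alternative receives the largest priority, and since $a_{12}a_{23} = 6 \neq 5 = a_{13}$ the matrix is slightly inconsistent, so the estimated eigenvalue exceeds 3.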
One way to obtain the priority vector is to approximate it by taking the geometric mean of each row and normalizing the means by their sum, so that the entries of the priority vector are approximated by

$$\omega_i \approx \frac{\left(\prod_{j=1}^{n} a_{ij}\right)^{1/n}}{\sum_{k=1}^{n} \left(\prod_{j=1}^{n} a_{kj}\right)^{1/n}}.$$

Next the consistency index $CI$ has to be estimated, which is derived from the formula

$$CI = \frac{\lambda_{\max} - n}{n - 1}.$$

The consistency ratio $CR = CI/RI$ is calculated by dividing the consistency index by the Random Consistency Index $RI$, which is given in tables (Saaty, 1980) as a function of $n$. The consistency ratio expresses the consistency property of the judgement matrix. It has been shown in (Saaty, 1980) that if the consistency ratio is greater than 0.1 it is preferable to reestablish the judgement matrix.

The entries of the decision matrix are the entries of the priority vectors for each criterion. The final priorities can be calculated by taking the scalar product of each row of the decision matrix and the weight vector of the criteria, that is,

$$A_i = \sum_{j=1}^{N} d_{ij} w_j,$$

where $A_i$ is the final priority of the $i$th alternative.
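The geometric-mean approximation, the consistency check and the final aggregation can be sketched as follows. The decision matrix `D`, the criterion weights `wc`, the eigenvalue estimate and the Random Consistency Index value below are illustrative assumptions, not values from the paper:

```python
import math

def gm_priorities(A):
    """Approximate the priority vector by row geometric means,
    normalized by their sum."""
    n = len(A)
    g = [math.prod(row) ** (1.0 / n) for row in A]
    s = sum(g)
    return [x / s for x in g]

def consistency_ratio(n, lam_max, ri):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1);
    ri is the Random Consistency Index from Saaty's table."""
    ci = (lam_max - n) / (n - 1)
    return ci / ri

def final_priorities(D, wc):
    """Scalar product of each decision-matrix row with the criterion
    weight vector: A_i = sum_j d_ij * w_j."""
    return [sum(d * w for d, w in zip(row, wc)) for row in D]

# 3 alternatives scored under 2 criteria (made-up priority values),
# with criterion weights summing to one.
D = [[0.6, 0.2], [0.3, 0.5], [0.1, 0.3]]
wc = [0.7, 0.3]
scores = final_priorities(D, wc)          # best alternative = argmax
cr = consistency_ratio(3, 3.06, 0.58)     # below 0.1: acceptable
```

With these numbers the first alternative wins, and a hypothetical $\lambda_{\max} = 3.06$ for a 3x3 matrix gives a consistency ratio of about 0.05, comfortably under the 0.1 threshold.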

Incremental planning
Robot path planning in dynamic environments has been in the focus of robotics research for a long time (Latombe, 1991). Many methods have proven to be efficient and useful. The study (LaValle, 2006) gives a general overview on planning algorithms. The popular potential field method (Hwang, Ahuja, 1992) has proven to be efficient for simple path planning tasks. Since it suffers from the problem of getting trapped in local minima, different versions of this method were developed to overcome this difficulty (Fox, Burgard, Thrun, 1997); the Probabilistic Road Map method has also been widely adopted (Kavraki, Svestka, Latombe, Overmars, 1996). The likewise popular Rapidly Exploring Random Tree (RRT) method introduced in (LaValle, 2006) also has many variants (Ferguson, Kalra, Stentz, 2006), (Ferguson, Stentz, 2006a), some of them capable of planning in high dimensional search spaces. Soft computing methods like genetic algorithms can be incorporated in path planning, and neural networks and fuzzy systems have also been widely used for this task (Li, Yang, Seto, 2009). The study (Yanduo, Kun, 2009) presents a path planning method using Liquid State Machines, a member of the family of recurrent neural networks (Maass, Natschlaeger, Markram, 2002), and shows promising results.
When detailed information on the terrain is available in the form of a map for the robot's navigation system, map based path planners prove to be useful. Among map based planners, grid based path planning methods are frequently addressed, since incremental planners like A* (Hart, Nilsson, Raphael, 1968) are capable of solving the planning problem. The main idea of grid based path planning is dividing the map into cells where each cell has an associated traversal cost. When a cell is occupied by an obstacle, the cell's traversal cost is set to a high value (or to infinity). A graph is built where the graph nodes correspond to the corners or the centers of the grid cells while the edge costs correspond to the traversal costs of the cells. The objective of minimizing the overall path cost, which is the sum of the traversal costs of the cells the robot passes through, can be achieved by finding the shortest path in the corresponding graph. Incremental planners keep a record of the path cost of every visited node. At the same time a heuristic cost is estimated from the given node to the goal or to the start state, depending on the search direction. Each visited node is inserted into a priority queue called an open list with a key value that is the sum of the path cost and the heuristic cost estimate. When the heuristic does not overestimate the real path cost to the goal, the resulting path will be optimal. The open list thus contains the potential nodes that the search can be continued from, and the search is focused towards the node with the minimal key value on the open list. The heuristic cost estimate focuses the direction of the search and strongly influences the optimality of the solution. Finding efficient heuristics in a multi-cost environment is challenging and often impossible in high dimensional search spaces.
Every time a node is expanded, i.e. its neighbours are visited, it is taken off the open list. Incremental planners also maintain a list of backpointers. When a previously visited node is revisited by expanding a node from the open list and its cost value decreases due to the expansion, the state is updated, that is, the cost value is set to the decreased value and the node's backpointer is set to the expanded node. This way the algorithm keeps a record of the best path obtained so far.
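The open-list mechanism described above can be sketched minimally as plain A* on a 4-connected grid; this is not the full D*/AD* machinery, and the function and variable names are ours:

```python
import heapq

def astar(grid_cost, start, goal):
    """Best-first grid search: grid_cost[r][c] is the traversal cost of
    entering a cell, the heuristic is the Manhattan distance (admissible
    when every cell costs at least one), and backpointers record the
    best path obtained so far."""
    rows, cols = len(grid_cost), len(grid_cost[0])
    h = lambda s: abs(s[0] - goal[0]) + abs(s[1] - goal[1])
    g = {start: 0.0}                      # best path cost found so far
    back = {}                             # backpointer list
    open_list = [(h(start), start)]       # key = path cost + heuristic
    while open_list:
        _, s = heapq.heappop(open_list)   # node with minimal key
        if s == goal:
            break
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = s[0] + dr, s[1] + dc
            if not (0 <= r < rows and 0 <= c < cols):
                continue
            cand = g[s] + grid_cost[r][c]
            if cand < g.get((r, c), float("inf")):
                g[(r, c)] = cand          # improved cost: update the state
                back[(r, c)] = s          # and redirect the backpointer
                heapq.heappush(open_list, (cand + h((r, c)), (r, c)))
    # follow backpointers from the goal to recover the path
    path, s = [goal], goal
    while s != start:
        s = back[s]
        path.append(s)
    return path[::-1], g[goal]

# cells with cost 9 stand in for near-obstacles the search routes around
grid = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
path, cost = astar(grid, (0, 0), (2, 2))
```

On this toy grid the planner descends the cheap left column and crosses the bottom row, avoiding the expensive middle column.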
The D* algorithm (Stentz, 1994) is the dynamic version of A*. It has been proven to be at least two orders of magnitude more efficient than planning from scratch with A* when changes in the arc costs occur. The main idea is to regard only those affected states that could potentially improve or invalidate the previously planned path. A one step lookahead of the path cost is used for this purpose to identify inconsistent states where arc costs have changed. When the grid resolution of the map was insufficient but the process had to operate with fixed memory capacity, the Multi-Resolution Field D* (Ferguson, Stentz, 2006b) algorithm was developed, which is currently used as the primary path planner along with TEMPEST onboard the Mars rovers Spirit and Opportunity (Carsten, Rankin, Ferguson, Stentz, 2007).

AARMS (12) 2 (2013) KOMLÓSI István: The AD-AHP planner: Incremental path planning for robots incorporating decision theory

In a large state space, as mentioned above, efficient heuristics are needed to focus the search. Another way to deal with the large number of states is to define dominance relations. The DD* algorithm (Korsah, Stentz, Dias, 2007) efficiently prunes the search space by using dominance relations, i.e. when a node is dominated by another in the sense of the dominance relation, the search is aborted from the dominated node. In a rapidly changing environment, however, non-dominated states can become dominated and vice versa.
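The one-step lookahead that identifies inconsistent states can be sketched as follows; this is a simplified illustration in the spirit of D*, with our own names, not the algorithm's exact bookkeeping:

```python
def locally_inconsistent(s, g, succ, cost):
    """A node is consistent when its stored path cost g[s] equals the
    best value obtainable through any successor.  When an arc cost
    changes, the two disagree and the node must be reinserted into the
    open list so the affected part of the path gets repaired."""
    lookahead = min(cost(s, sp) + g[sp] for sp in succ(s))
    return g[s] != lookahead

g = {"s": 3.0, "a": 2.0, "b": 1.0}        # stored costs toward the goal
succ = lambda s: ["a", "b"]
cost = lambda s, sp: {"a": 1.0, "b": 5.0}[sp]
consistent_before = not locally_inconsistent("s", g, succ, cost)

# the arc s->a is blocked (its cost raised): s becomes inconsistent
cost_after = lambda s, sp: {"a": 9.0, "b": 5.0}[sp]
consistent_after = not locally_inconsistent("s", g, succ, cost_after)
```

Only nodes flagged this way need to be reprocessed, which is why D* beats replanning from scratch when few arc costs change.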
The most recently introduced member of the D* family is the Anytime Dynamic A* (AD*) planner (Likhachev, Ferguson, Gordon, Stentz, Thrun, 2005), which is capable of planning in rapidly changing environments by continuously improving previous solutions through refining its heuristic cost estimate. Since it is an anytime algorithm, it runs as long as the available computation time allows, and within the given time frame it produces a path with a suboptimality bound.
We will take the AD* planner as a basis of the AD-AHP algorithm that is introduced in the next section.

The AD-AHP path planner
The AHP improved AD* algorithm works as follows. Instead of determining a single cumulative cost function that sums up all costs (i.e. path length, time, energy, risk, etc.) with given weights, and instead of defining dominance relations, a priority queue is maintained for each cost function, where nodes are inserted with their corresponding key values when visited.
The key values, similarly to the basic operation, are the sum of the given cost (i.e. energy cost, time cost) and the heuristic cost estimate. The cost of risk can be expressed as a probability value, so the key value of a node on the "risk open list" must be calculated using an appropriate method from probability theory. It is assumed that the ranking of the objectives is done by another process that constantly monitors the robot's vicinity and decides the best ranking of the objectives in any situation. Some neural network approaches like the Learning Vector Quantization (LVQ) method (Kohonen, 1995) could be used for this purpose. First the start state is inserted into each open list and the neighbouring nodes of the start state are visited. For each visited node a cost value, a one step lookahead and a heuristic cost estimate are calculated according to all cost functions. The visited nodes are then inserted into the corresponding priority queues. The next step is to determine which node to select for expansion. Given the actual ranking of the criteria, i.e. the ranking of the different costs, and given a finite set of alternatives, i.e. the graph nodes with minimum key values in each priority queue, the best node to continue the search from can be determined by using AHP. The AHP procedure has to be performed every time before taking a node off the open lists. It is wise not to take only the nodes with the best key values but to take nodes from a finite horizon, say the four or five best nodes of each queue, in order to gain a greater set of alternatives. Of course, this results in extra computational expense.
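The selection step can be sketched under stated assumptions: `select_node` is our own name, and the per-criterion priorities are passed in as a callable standing in for the priority vectors AHP would produce from pairwise comparisons of the candidates. The weight vector must be ordered like the keys of `open_lists`:

```python
import heapq

def select_node(open_lists, horizon, criterion_weights, ahp_priorities):
    """open_lists maps each objective to a heap of (key, node) pairs.
    The best `horizon` nodes of every heap form the alternative set, a
    decision matrix scores each alternative under each objective, and
    the alternative with the highest weighted priority is expanded."""
    candidates = []
    for heap in open_lists.values():
        candidates.extend(node for _, node in heapq.nsmallest(horizon, heap))
    candidates = list(dict.fromkeys(candidates))   # dedupe, keep order
    # decision matrix: one row per alternative, one column per objective
    D = [[ahp_priorities(n, obj) for obj in open_lists] for n in candidates]
    scores = [sum(d * w for d, w in zip(row, criterion_weights)) for row in D]
    return candidates[scores.index(max(scores))]

# two objectives, two candidate nodes, made-up keys and priorities
energy = [(1.0, "a"), (2.0, "b")]
time_ = [(1.0, "b"), (3.0, "a")]
heapq.heapify(energy); heapq.heapify(time_)
prio = {"energy": {"a": 0.7, "b": 0.3}, "time": {"a": 0.4, "b": 0.6}}
best = select_node({"energy": energy, "time": time_}, horizon=2,
                   criterion_weights=[0.5, 0.5],
                   ahp_priorities=lambda n, o: prio[o][n])
```

With equal criterion weights node "a" is expanded; shifting the weights toward the time objective makes "b" win, mirroring how reranked objectives redirect the search.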
A quintessential question of the method is when to update a node's cost value if it is revisited, since an improvement in energy cost could result in an increase in the time cost and vice versa. The main idea is to update states according to the weight vector of the criteria. Let us denote the actual path length of the current node by $d_0$, its energy cost by $E_0$, its time cost by $t_0$, and the cost or probability of risk by $r_0$. Assume that after updating the state one would get the values $d_1$, $E_1$, $t_1$ and $r_1$, collected in a vector. This vector is then multiplied by the weight vector of the criteria, which is assumed to be normalized to unity. Given the weighting coefficients $w_d, w_E, w_t, w_r$, when

$$w_d d_1 + w_E E_1 + w_t t_1 + w_r r_1 < w_d d_0 + w_E E_0 + w_t t_0 + w_r r_0$$

is satisfied, the state can be updated, i.e. the cost values are set and the node's backpointer is set to the currently expanded node. It is assumed that all values are nonnegative, and zero entries in the above inequality should be discarded and left out of the comparison.
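The weighted update test can be sketched as follows; the cost values and weights are illustrative, and `should_update` is our own formulation of the rule, not the paper's code:

```python
def should_update(old_costs, new_costs, weights):
    """Replace a revisited node's stored costs only when the weighted
    sum of the candidate cost vector is smaller than that of the stored
    one.  Pairs with a zero entry are left out of the comparison, and
    the weight vector is assumed normalized to unity."""
    old_s = new_s = 0.0
    for w, o, c in zip(weights, old_costs, new_costs):
        if o > 0 and c > 0:        # discard zero entries
            old_s += w * o
            new_s += w * c
    return new_s < old_s

# hypothetical costs ordered as [path length, energy, time, risk]
old = [10.0, 5.0, 2.0, 0.10]
new = [12.0, 4.0, 2.0, 0.05]       # longer but cheaper and safer
update_balanced = should_update(old, new, [0.25, 0.25, 0.25, 0.25])
update_energy = should_update(old, new, [0.10, 0.60, 0.10, 0.20])
```

The same candidate is rejected under balanced weights but accepted when energy dominates the ranking, which is exactly the trade-off the weight vector is meant to arbitrate.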
The algorithm then operates as AD* does: after reaching the goal state the heuristics are improved and the search restarts. It has to be mentioned that if a dramatic change in the environment or a dramatic reordering of the objectives occurs, it is wise to replan from scratch instead of trying to improve already existing solutions.
The main advantage of this approach is that human logic can be incorporated into the path planning process and thus an arbitrary number and arbitrary kinds of objectives can be handled at once. The robot can often find itself in situations where ranking the objectives by importance requires human expertise. If the robot's navigation system is capable of learning with human aid, it is possible for the robot to build up and continuously update a database of the rankings and apply the previously mentioned LVQ or some other method to determine the proper ranking when it finds itself in unfamiliar situations.
The formal description of the AD-AHP planner can be found in the Appendix. The main steps are those of AD* (Likhachev, Ferguson, Gordon, Stentz, Thrun, 2005); the relevant modifications are where the AHP was incorporated into the algorithm.

Simulation Results
Simulations were carried out with four objectives; the cell traversal costs according to the different objectives were set randomly. A start state, a goal state and three obstacles were given, and the robot was regarded as a point robot. Fig. 3. - Fig. 6. show the result of the path planning when each objective was regarded independently. On each picture the cells with a darker shade have a higher cost value. Fig. 7. and Fig. 8. show the results when the planner had to regard all objectives at once, first with one weight vector and then with another.

Conclusions
We have introduced a novel incremental planner that is capable of planning collision free trajectories for robots in a dynamic multi-cost environment while incorporating decision theory. Instead of using a single cost function and state dominance, the Analytic Hierarchy Process was used along with AD* to determine the focus of the search. The advantage of the algorithm is that very different objectives can be handled at the same time with the incorporation of human logic. All parts of the algorithm suit real-time implementation.