E. Hyytiä: Markov decision process (MDP)

Two sample applications of MDP and FPI

Markov decision processes (MDPs) and policy iteration are powerful tools to solve dynamic decision problems. Here we give two application examples:

1. Dynamic RWA problem in Wavelength Routed Optical Networks

In Routing and Wavelength Assignment (RWA) problem, in the absence of wavelength converters, the task is to find such a route and a wavelength for each end-to-end lightpath so that no link has two lightpaths sharing the same wavelength.
In dynamic RWA problems, lightpath requests arrive according to (Poisson) process
Currently established lightpaths correspond to the state of the system
Decision on how customer/request is handled triggers a state change in the system
Constitutes a MDP, where the objective is to minimize the blocking probability
Enormous state space defies the computation of value functions, and thus also the policy iteration
First policy iteration can be carried out by estimating the relative values by on-the-fly simulations [1-3]

Related publications:

[1]	E. Hyytiä and J. Virtamo, Dynamic Routing and Wavelength Assignment Using First Policy Iteration, in the Fifth IEEE Symposium on Computers and Communications, ISCC'2000, pp. 146-151, 2000, Antibes, Juan les Pins, France.
[2]	E. Hyytiä and J. Virtamo, Dynamic Routing and Wavelength Assignment Using First Policy Iteration, Inhomogeneous Traffic Case, in the International Conference on Performance and QoS of Next Generation Networking, P&QNet2000, pp. 301-316, 2000, Nagoya, Japan.
[3]	E. Hyytiä and J. Virtamo, Adaptive Importance Sampling in Routing and Wavelength Assignment, European Transactions on Telecommunications (ETT), vol. 13, no. 4, pp. 331-339, 2002, Special Issue on Rare Event Simulation.

Figure: A sample RWA problem with 8 lightpaths.

References:

Richard Bellman, Dynamic programming, Princeton University Press, 1957.
Sheldon M. Ross, Applied Probability Models with Optimization Applications, Holden-Day Inc., 1970.
Ronald A. Howard, Dynamic Probabilistic Systems, Volume II: Semi-Markov and Decision Processes, Wiley Interscience, 1971.

2. Dispatching (Routing) Problem to Parallel Servers

Arriving customers are processed by a set of parallel servers
Customer is assigned a server upon arrival (dispatching decision)
Constitutes a MDP problem, where the objective is to minimize:
- the mean delay (latency, sojourn time)
- the slowdown (the ratio of delay to job size)
- some other weighted sum of the waiting and/or sojourn time

Some optimality results exist for the basic setting, where all customers are flexible and subject to a dispatching decision. For identical servers with number in the queues is available, exponentially distributed service times and Poisson arrival process, Winston has shown that JSQ minimizes the mean delay. The optimality of Round-robin has been shown by Ephremides et al. in case when the current state of the queues is not available. FPI based approaches offer a general methodology to devise state-dependent policies, see, e.g., Krishan.
More details in the dispatching page.

Figure: A sample dispatching system.

References:

W. Winston, Optimality of the shortest line discipline, Journal of Applied Probability, 1977.
A. Ephremides, P. Varaiya and J. Walrand, A simple dynamic routing problem, IEEE Trans. on Automatic Control, 1980.
K. R. Krishnan, Joining the right queue: a state-dependent decision rule, IEEE Trans. on Automatic Control, 1990.

Esa Hyytiä, D.Sc. (Tech.)

Two sample applications of MDP and FPI

1. Dynamic RWA problem in Wavelength Routed Optical Networks

References:

2. Dispatching (Routing) Problem to Parallel Servers

References: