更改
Move 2 urls. Wayback Medic 2.5
In the mathematical theory of probability, the '''drift-plus-penalty method''' is used for optimization of [[queueing network]]s and other [[stochastic system]]s.
The technique is for stabilizing a queueing network while also minimizing the time average of a network penalty function. It can be used to optimize performance objectives such as time average power, throughput, and throughput utility.
<ref name = neely-energy-it>
M. J. Neely, "[http://www-bcf.usc.edu/~mjneely/pdf_papers/neely-energy-it.pdf Energy Optimal Control for Time Varying Wireless Networks]," IEEE Transactions on Information Theory, vol. 52, no. 7, pp. 2915–2934, July 2006.
</ref>
<ref name=neely-fairness-infocom05>
M. J. Neely, E. Modiano, and C. Li, "[https://www.researchgate.net/profile/Chih_Ping_Li/publication/221242398_Fairness_and_Optimal_Stochastic_Control_for_Heterogeneous_Networks/links/0a85e532265750bfc3000000.pdf Fairness and Optimal Stochastic Control for Heterogeneous Networks]," Proc. IEEE INFOCOM, March 2005.
</ref>
In the special case when there is no penalty to be minimized, and when the goal is to design a stable routing policy in a multi-hop network, the method reduces to [[backpressure routing]].
<ref name=tass-radio-nets>L. Tassiulas and A. Ephremides,
"Stability Properties of Constrained Queueing Systems and
Scheduling Policies for Maximum Throughput in Multihop
Radio Networks, ''IEEE Transactions on Automatic Control'', vol. 37, no. 12, pp. 1936–1948, Dec. 1992.
</ref>
<ref name=now>
L. Georgiadis, M. J. Neely, and L. Tassiulas, "[https://www.nowpublishers.com/article/DownloadSummary/NET-001 Resource Allocation and Cross-Layer Control in Wireless Networks],"
''Foundations and Trends in Networking'', vol. 1, no. 1, pp. 1–149, 2006.
</ref>
The drift-plus-penalty method can also be used to minimize the time average of a [[stochastic process]] subject to time average constraints on a collection of other stochastic processes.
<ref name = sno-text>
M. J. Neely.
''[https://www.morganclaypool.com/doi/abs/10.2200/s00271ed1v01y201006cnt007 Stochastic Network Optimization with Application to Communication and Queueing Systems],''
Morgan & Claypool, 2010.
</ref>
This is done by defining an appropriate set of [[#Virtual queues|virtual queues]]. It can also be used to produce time averaged solutions to [[convex optimization]] problems.
<ref name =neely-dcdis>
M. J. Neely, "[Distributed and Secure Computation of Convex Programs over a Network of Connected Processors Distributed and Secure Computation of Convex Programs over a Network of Connected Processors],"
DCDIS Conf, Guelph, Ontario, July 2005
</ref>
<ref name=sucha-qoi-arxiv>
S. Supittayapornpong and M. J. Neely, "[https://arxiv.org/abs/1211.6162 Quality of Information Maximization for Wireless Networks via a Fully Separable Quadratic Policy]," arXiv:1211.6162v2, Nov. 2012.
</ref>
==Methodology==
The drift-plus-penalty method applies to queueing systems that operate in discrete time with time slots ''t'' in {0, 1, 2, ...}. First, a non-negative function ''L''(''t'') is defined as a scalar measure of the state of all queues at time ''t''. The function ''L''(''t'') is typically defined as the sum of the squares of all queue sizes at time ''t'', and is called a [[Lyapunov optimization|Lyapunov function]]. The ''Lyapunov drift'' is defined:
: <math>
\Delta L(t) = L(t+1) - L(t)
</math>
Every slot t, the current queue state is observed and control actions are taken to greedily minimize a bound on the following ''drift-plus-penalty expression'':
: <math>\Delta L(t) + Vp(t),
</math>
where ''p''(''t'') is the penalty function and V is a non-negative weight. The V parameter can be chosen to ensure the time average of ''p''(''t'') is arbitrarily close to optimal, with a corresponding tradeoff in average queue size. Like [[backpressure routing]], this method typically does not require knowledge of the probability distributions for job arrivals and network mobility.<ref name = sno-text/>
==Origins and applications==
When <math>V=0,</math> the method reduces to greedily minimizing the Lyapunov drift. This results in the [[backpressure routing]] algorithm originally developed by Tassiulas and Ephremides (also called the ''max-weight algorithm'').<ref name=tass-radio-nets/><ref name = tass-server-allocation>L. Tassiulas and A. Ephremides, "Dynamic Server Allocation to Parallel Queues with Randomly Varying Connectivity," IEEE Transactions on Information Theory, vol. 39, no. 2, pp. 466–478, March 1993.</ref> The <math>Vp(t)</math> term was added to the drift expression by Neely<ref name=neely-thesis>M. J. Neely. Dynamic Power Allocation and Routing for Satellite and Wireless Networks with Time Varying Channels. Ph.D. Dissertation, Massachusetts Institute of Technology, LIDS. November 2003.</ref> and Neely, Modiano, Li<ref name="neely-fairness-infocom05"/> for stabilizing a network while also maximizing a throughput utility function. For this, the penalty <math>p(t)</math> was defined as <math>-1</math> times a reward earned on slot <math>t.</math> This drift-plus-penalty technique was later used to minimize average power<ref name=neely-energy-it/> and optimize other penalty and reward metrics.<ref name = now/><ref name =sno-text/>
The theory was developed primarily for optimizing communication networks, including wireless networks, ad-hoc mobile networks, and other computer networks. However, the mathematical techniques can be applied to optimization and control for other stochastic systems, including renewable energy allocation in [[Smart grid|smart power grids]]<ref name=rahul-energy-storage>R. Urgaonkar, B. Urgaonkar, M. J. Neely, A. Sivasubramaniam, "Optimal Power Cost Management Using Stored Energy in Data Centers," Proc. SIGMETRICS 2011.</ref><ref name=moeller-smartgrid2010>M. Baghaie, S. Moeller, B. Krishnamachari, "[http://www.baghaie.net/powerCon10.pdf Energy Routing on the Future Grid: A Stochastic Network Optimization Approach]," Proc. International Conf. on Power System Technology (POWERCON), Oct. 2010.</ref><ref name=neely-smartgrid>M. J. Neely, A. S. Tehrani, and A. G. Dimakis, "Efficient Algorithms for Renewable Energy Allocation to Delay Tolerant Consumers," 1st IEEE International Conf. on Smart Grid Communications, 2010.</ref> and [[inventory control]] for product assembly systems.<ref name=neely-inventory-control>M. J. Neely and L. Huang, "Dynamic Product Assembly and Inventory Control for Maximum Profit," Proc. IEEE Conf. on Decision and Control, Atlanta, GA, Dec. 2010.</ref>
==How it works==
This section shows how to use the drift-plus-penalty method to minimize the time average of a function p(t) subject to time average constraints on a collection of other functions. The analysis below is based on the material in.<ref name = sno-text/>
===The stochastic optimization problem===
Consider a discrete time system that evolves over normalized time slots t in {0, 1, 2, ...}. Define p(t) as a function whose time average should be minimized, called a ''penalty function''. Suppose that minimization of the time average of p(t) must be done subject to time-average constraints on a collection of K other functions:
* <math>
p(t) = \text{penalty function whose time average must be minimized}
</math>
* <math>
y_1(t), y_2(t), \ldots, y_K(t) =\text{other functions whose time averages must be non-positive}
</math>
Every slot t, the network controller observes a new random event. It then makes a control action based on knowledge of this event. The values of p(t) and y_i(t) are determined as functions of the random event and the control action on slot t:
* <math>
\omega(t) = \text{random event on slot } t \text{ (assumed i.i.d. over slots)}
</math>
* <math>
\alpha(t) = \text{control action on slot } t \text{ (chosen after observing } \omega(t) \text{)}
</math>
* <math>
p(t) = P(\alpha(t), \omega(t)) \text{ (a deterministic function of } \alpha(t), \omega(t) \text{)}
</math>
* <math>
y_i(t) = Y_i(\alpha(t), \omega(t)) \text{ } \forall i \in \{1, \ldots, K\} \text{ (deterministic functions of } \alpha(t), \omega(t) \text{)}
</math>
The small case notation p(t), y_i(t) and upper case notation P(), Y_i() is used to distinguish the penalty values from the function that determines these values based on the random event and control action for slot t. The random event <math>\omega(t)</math> is assumed to take values in some abstract set of events <math>\Omega</math>. The control action <math>\alpha(t)</math> is assumed to be chosen within some abstract set <math>A</math> that contains the control options. The sets <math>\Omega</math> and <math>A</math> are arbitrary and can be either finite or infinite. For example, <math>A</math> could be a finite list of abstract elements, an uncountably infinite (and possibly non-convex) collection of real-valued vectors, and so on. The functions P(), Y_i() are also arbitrary and do not require continuity or convexity assumptions.
As an example in the context of communication networks, the random event <math>\omega(t)</math> can be a vector that contains the slot t arrival information for each node and the slot t channel state information for each link. The control action <math>\alpha(t)</math> can be a vector that contains the routing and transmission decisions for each node. The functions P() and Y_i() can represent power expenditures or throughputs associated with the control action and channel condition for slot t.
For simplicity of exposition, assume the P() and Y_i() functions are bounded. Further assume the random event process <math>\omega(t)</math> is [[Independent and identically distributed random variables|independent and identically distributed (i.i.d.)]] over slots t with some possibly unknown probability distribution. The goal is to design a policy for making control actions over time to solve the following problem:
: <math>
\text{Minimize: } \lim_{t\rightarrow\infty} \frac{1}{t}\sum_{\tau=0}^{t-1}E[p(\tau)]
</math>
: <math>
\text{Subject to: } \lim_{t\rightarrow\infty} \frac{1}{t}\sum_{\tau=0}^{t-1}E[y_i(\tau)] \leq 0 \text{ } \forall i \in \{1, \ldots, K\}
</math>
It is assumed throughout that this problem is ''feasible''. That is, it is assumed that there exists an algorithm that can satisfy all of the ''K'' desired constraints.
The above problem poses each constraint in the ''standard form'' of a time average expectation of an abstract process y_i(t) being non-positive. There is no loss of generality with this approach. For example, suppose one desires the time average expectation of some process a(t) to be less than or equal to a given constant c. Then a new penalty function ''y''(''t'') = ''a''(''t'') − ''c'' can be defined, and the desired constraint is equivalent to the time average expectation of ''y''(''t'') being non-positive. Likewise, suppose there are two processes ''a''(''t'') and ''b''(''t'') and one desires the time average expectation of ''a''(''t'') to be less than or equal to that of ''b''(''t''). This constraint is written in standard form by defining a new penalty function ''y''(''t'') = ''a''(''t'') − ''b''(''t''). The above problem seeks to ''minimize'' the time average of an abstract penalty function ''p'(''t'')'. This can be used to ''maximize'' the time average of some desirable ''reward function'' ''r''(''t'') by defining ''p''(''t'') = −''r''('t'').
===Virtual queues===
For each constraint ''i'' in {1, ..., ''K''}, define a ''virtual queue'' with dynamics over slots ''t'' in {0, 1, 2, ...} as follows:
: <math>
(\text{Eq. } 1) \text{ } Q_i(t+1) = \max[Q_i(t) + y_i(t), 0]
</math>
Initialize ''Q''<sub>''i''</sub>(0) = 0 for all ''i'' in {1, ..., ''K''}. This update equation is identical to that of a ''virtual'' discrete time queue with backlog Q_i(t) and with y_i(t) being the difference between new arrivals and new service opportunities on slot ''t''. Intuitively, stabilizing these virtual queues ensures the time averages of the constraint functions are less than or equal to zero, so the desired constraints are satisfied. To see this precisely, note that (Eq. 1) implies:
: <math>
Q_i(t+1) \geq Q_i(t) + y_i(t)
</math>
Therefore:
: <math>
y_i(t) \leq Q_i(t+1) - Q_i(t)
</math>
Summing the above over the first t slots and using the law of telescoping sums implies:
: <math>
\sum_{\tau=0}^{t-1} y_i(\tau) \leq Q_i(t) - Q_i(0) = Q_i(t)
</math>
Dividing by ''t'' and taking expectations implies:
: <math>
\frac{1}{t}\sum_{\tau=0}^{t-1} E[y_i(\tau)] \leq \frac{E[Q_i(t)]}{t}
</math>
Therefore, the desired constraints of the problem are satisfied whenever the following holds for all i in {1, ..., ''K''}:
: <math>
\lim_{t\rightarrow\infty} \frac{E[Q_i(t)]} t = 0
</math>
A queue Q_i(t) that satisfies the above limit equation is said to be ''mean rate stable''.<ref name=sno-text/>
===The drift-plus-penalty expression===
To stabilize the queues, define the Lyapunov function L(t) as a measure of the total queue backlog on slot ''t'':
: <math>
L(t) = \frac{1}{2}\sum_{i=1}^KQ_i(t)^2
</math>
Squaring the queueing equation (Eq. 1) results in the following bound for each queue i in {1, ..., K}:
: <math>
Q_i(t+1)^2 \leq (Q_i(t) + y_i(t))^2 = Q_i(t)^2 + y_i(t)^2 + 2Q_i(t)y_i(t)
</math>
Therefore,
: <math>
\frac{1}{2}\sum_{i=1}^KQ_i(t+1)^2 \leq \frac{1}{2}\sum_{i=1}^KQ_i(t)^2 + \frac{1}{2}\sum_{i=1}^Ky_i(t)^2 + \sum_{i=1}^KQ_i(t)y_i(t)
</math>
It follows that
: <math>
\Delta L(t) = L(t+1) - L(t) \leq \frac{1}{2}\sum_{i=1}^Ky_i(t)^2 + \sum_{i=1}^K Q_i(t)y_i(t)
</math>
Now define B as a positive constant that upper bounds the first term on the right-hand-side of the above inequality. Such a constant exists because the y_i(t) values are bounded. Then:
: <math>
\Delta L(t) \leq B + \sum_{i=1}^K Q_i(t)y_i(t)
</math>
Adding Vp(t) to both sides results in the following bound on the drift-plus-penalty expression:
: <math>
(\text{Eq. } 2) \text{ } \Delta L(t) + Vp(t) \leq B + Vp(t) + \sum_{i=1}^K Q_i(t)y_i(t)
</math>
The drift-plus-penalty algorithm (defined below) makes control actions every slot t that greedily minimize the right-hand-side of the above inequality. Intuitively, taking an action that minimizes the drift alone would be beneficial in terms of queue stability but would not minimize time average penalty. Taking an action that minimizes the penalty alone would not necessarily stabilize the queues. Thus, taking an action to minimize the weighted sum incorporates both objectives of queue stability and penalty minimization. The weight V can be tuned to place more or less emphasis on penalty minimization, which results in a performance tradeoff.<ref name=sno-text/>
===Drift-plus-penalty algorithm===
Let <math>A</math> be the abstract set of all possible control actions. Every slot t, observe the random event and the current queue values:
: <math> \text{Observe: } \omega(t), Q_1(t), \ldots, Q_K(t)
</math>
Given these observations for slot t, greedily choose a control action <math>\alpha(t) \in A</math> to minimize the following expression (breaking ties arbitrarily):
: <math> VP(\alpha(t), \omega(t)) + \sum_{i=1}^KQ_i(t)Y_i(\alpha(t), \omega(t))
</math>
Then update the queues for each i in {1, ..., K} according to (Eq. 1). Repeat this procedure for slot t+1.<ref name=sno-text/>
Note that the random event and queue backlogs observed on slot ''t'' act as given constants when selecting the control action for the slot t minimization. Thus, each slot involves a deterministic search for the minimizing control action over the set ''A''. A key feature of this algorithm is that it does not require knowledge of the probability distribution of the random event process.
===Approximate scheduling===
The above algorithm involves finding a minimum of a function over an abstract set ''A''. In general cases, the minimum might not exist, or might be difficult to find. Thus, it is useful to assume the algorithm is implemented in an approximate manner as follows: Define ''C'' as a non-negative constant, and assume that for all slots ''t'', the control action <math>\alpha(t)</math> is chosen in the set ''A'' to satisfy:
: <math>
\begin{align}
& VP(\alpha(t), \omega(t)) + \sum_{i=1}^K Q_i(t)Y_i(\alpha(t), \omega(t)) \\
\leq {} & C + \inf_{\alpha\in A} [VP(\alpha, \omega(t)) + \sum_{i=1}^KQ_i(t)Y_i(\alpha,\omega(t))]
\end{align}
</math>
Such a control action is called a ''C-additive approximation''.<ref name=sno-text/> The case ''C'' = 0 corresponds to exact minimization of the desired expression on every slot ''t''.
==Performance analysis==
This section shows the algorithm results in a time average penalty that is within O(1/V) of optimality, with a corresponding O(V) tradeoff in average queue size.<ref name =sno-text/>
===Average penalty analysis===
Define an ''<math>\omega</math>-only policy'' to be a stationary and randomized policy for choosing the control action <math>\alpha(t)</math> based on the observed <math>\omega(t)</math> only. That is, an <math>\omega</math>-only policy specifies, for each possible random event <math>\omega \in \Omega</math>, a conditional probability distribution for selecting a control action <math>\alpha(t) \in A</math> given that <math>\omega(t)=\omega</math>. Such a policy makes decisions independent of current queue backlog. Assume there exists an <math>\omega</math>-only policy <math>\alpha^*(t)</math> that satisfies the following:
:<math> (\text{Eq. } 3) \qquad E[P(\alpha^*(t), \omega(t))] = p^* = \text{optimal time average penalty for the problem} </math>
:<math> (\text{Eq. } 4) \qquad E[Y_i(\alpha^*(t), \omega(t))] \leqslant 0 \qquad \forall i \in \{1, \ldots, K\} </math>
The expectations above are with respect to the random variable <math>\omega(t)</math> for slot <math>t,</math> and the random control action <math>\alpha(t)</math> chosen on slot <math>t</math> after observing <math>\omega(t)</math>. Such a policy <math>\alpha^*(t)</math> can be shown to exist whenever the desired control problem is feasible and the event space for <math>\omega(t)</math> and action space for <math>\alpha(t)</math> are finite, or when mild closure properties are satisfied.<ref name = sno-text/>
Let <math>\alpha(t)</math> represent the action taken by a C-additive approximation of the drift-plus-penalty algorithm of the previous section, for some non-negative constant C. To simplify terminology, we call this action the ''drift-plus-penalty action'', rather than the ''C-additive approximate drift-plus-penalty action''. Let <math>\alpha^*(t)</math> represent the <math>\omega</math>-only decision:
:<math> \alpha(t) = \text{drift-plus-penalty action for slot } t </math>
:<math> \alpha^*(t) = \omega\text{-only action that satisfies (Eq.3)-(Eq.4)} </math>
Assume the drift-plus-penalty action <math>\alpha(t)</math> is used on each and every slot. By (Eq. 2), the drift-plus-penalty expression under this <math>\alpha(t)</math> action satisfies the following for each slot <math>t:</math>
:<math> \begin{align}
\Delta L(t) + Vp(t) &\leqslant B + Vp(t) + \sum_{i=1}^KQ_i(t)y_i(t) \\
&= B + VP(\alpha(t), \omega(t)) + \sum_{i=1}^K Q_i(t)Y_i(\alpha(t), \omega(t)) \\
&\leqslant B + C + VP(\alpha^*(t), \omega(t)) + \sum_{i=1}^KQ_i(t)Y_i(\alpha^*(t), \omega(t))
\end{align}</math>
where the last inequality follows because action <math>\alpha(t)</math> comes within an additive constant <math>C</math> of minimizing the preceding expression over all other actions in the set <math>A,</math> including <math>\alpha^*(t).</math> Taking expectations of the above inequality gives:
:<math>\begin{align}
E[\Delta(t) + Vp(t)] &\leqslant B + C + VE[P(\alpha^*(t), \omega(t))] + \sum_{i=1}^K E \left [Q_i(t)Y_i(\alpha^*(t), \omega(t)) \right ] \\
& = B + C + VE[P(\alpha^*(t), \omega(t))] + \sum_{i=1}^KE[Q_i(t)]E[Y_i(\alpha^*(t), \omega(t))] && \alpha^*(t), \omega(t) \text{ are independent of } Q_i(t) \\
&\leqslant B + C + Vp^* && \text{Using Eq. 3 and Eq. 4}
\end{align}</math>
Notice that the <math>\alpha^*(t)</math> action was never actually implemented. Its existence was used only for comparison purposes to reach the final inequality. Summing the above inequality over the first <math>t > 0</math> slots gives:
:<math>\begin{align}
(B+C+Vp^*)t &\geqslant \sum_{\tau=0}^{t-1} E[\Delta(\tau) + Vp(\tau)] \\
&= E[L(t)] - E[L(0)] + V\sum_{\tau=0}^{t-1}E[p(\tau)] && \Delta(\tau) = L(\tau+1)-L(\tau) \\
&= E[L(t)] + V\sum_{\tau=0}^{t-1}E[p(\tau)] && \text{assume } L(0) = 0 \\
&\geqslant V\sum_{\tau=0}^{t-1}E[p(\tau)] && L(t) \geqslant 0
\end{align} </math>
Dividing the above by <math>Vt</math> yields the following result, which holds for all slots <math>t > 0:</math>
:<math> \frac{1}{t}\sum_{\tau=0}^{t-1} E[p(\tau)] \leqslant p^* + \frac{B+C}{V}.</math>
Thus, the time average expected penalty can be made arbitrarily close to the optimal value <math>p^*</math> by choosing <math>V</math> suitably large. It can be shown that all virtual queues are mean rate stable, and so all desired constraints are satisfied.<ref name=sno-text/> The parameter <math>V</math> affects the size of the queues, which determines the speed at which the time average constraint functions converge to a non-positive number. A more detailed analysis on the size of the queues is given in the next subsection.
===Average queue size analysis===
Assume now there exists an <math>\omega</math>-only policy <math>\alpha^*(t)</math>, possibly different from the one that satisfies (Eq. 3)-(Eq.4), that satisfies the following for some <math>\epsilon>0</math>:
:<math> (\text{Eq. } 5) \qquad E[Y_i(\alpha^*(t), \omega(t))] \leq -\epsilon \qquad \forall i \in \{1, \ldots , K\} </math>
An argument similar to the one in the previous section shows:
:<math>\begin{align}
\Delta(t) + Vp(t) &\leqslant B + C + VP(\alpha^*(t), \omega(t)) + \sum_{i=1}^KQ_i(t)Y_i(\alpha^*(t), \omega(t)) \\
\Delta(t) + Vp_{\min} &\leqslant B + C + Vp_{\max} + \sum_{i=1}^KQ_i(t)Y_i(\alpha^*(t), \omega(t)) && \text{assume } p_{\min} \leqslant P \leqslant p_{\max} \\
E[\Delta(t)] + Vp_{\min} &\leqslant B + C + Vp_{\max} + \sum_{i=1}^K E \left [Q_i(t)] E[Y_i(\alpha^*(t), \omega(t)) \right ] && \text{taking expectation} \\
E[\Delta(t)] + Vp_{\min} &\leqslant B + C + Vp_{\max} + \sum_{i=1}^KE[Q_i(t)](-\epsilon) && \text{Using (Eq. 5)} \\
E[\Delta(t)] + \epsilon \sum_{i=1}^KE[Q_i(t)] &\leqslant B + C + V(p_{\max} - p_{\min})
\end{align}</math>
A telescoping series argument similar to the one in the previous section can thus be used to show the following for all t>0:<ref name=sno-text/>
:<math> \frac{1}{t}\sum_{\tau=0}^{t-1} \sum_{i=1}^KE[Q_i(\tau)] \leqslant \frac{B + C + V(p_{\max} - p_{\min})}{\epsilon} </math>
This shows that average queue size is indeed O(V).
===Probability 1 convergence===
The above analysis considers time average expectations. Related probability 1 performance bounds for infinite horizon time average queue size and penalty can be derived using the drift-plus-penalty method together with [[Martingale (probability theory)|martingale theory]].<ref name=lyap-opt-jam>M. J. Neely, "Queue Stability and Probability 1 Convergence via Lyapunov Optimization," Journal of Applied Mathematics, vol. 2012, {{doi|10.1155/2012/831909}}.</ref>
===Application to queues with finite capacity===
As shown, the drift-plus-penalty allows to keep the average queue size under a certain threshold, which depends on the choice of the parameter V, but in general, does not offer any guarantee on the maximum queue occupancy.
However, if the action set respects certain constraints, it is possible to add an additional condition on the choice of V to enforce the maximum length of a queue and thus to apply the algorithm also to queues with finite capacity.<ref name=lyap-opt-fin>L. Bracciale, P. Loreti "Lyapunov drift-plus-penalty optimization for queues with finite capacity" IEEE Communications Letters, {{doi|10.1109/LCOMM.2020.3013125}}.</ref>
===Treatment of queueing systems===
The above analysis considers constrained optimization of time averages in a stochastic system that did not have any explicit queues. Each time average inequality constraint was mapped to a virtual queue according to (Eq. 1). In the case of optimizing a queueing network, the virtual queue equations in (Eq. 1) are replaced by the actual queueing equations.
===Convex functions of time averages===
A related problem is the minimization of a convex function of time averages subject to constraints, such as:
:<math>\text{Minimize} \quad f \left (\overline{y}_1, \ldots, \overline{y}_K \right) \quad \text{subject to} \quad g_i \left (\overline{y}_1, \ldots, \overline{y}_K \right ) \leqslant 0 \qquad \forall i \in \{1, \ldots, N\} </math>
where <math>f</math> and <math>g_i</math> are [[convex function]]s, and where the time averages are defined:
:<math>\overline{y}_i = \lim_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} E[y_i(\tau)]</math>
Such problems of optimizing convex functions of time averages can be transformed into problems of optimizing time averages of functions via ''auxiliary variables'' (see Chapter 5 of the Neely textbook).<ref name=neely-fairness-infocom05/><ref name = sno-text/> The latter problems can then be solved by the drift-plus-penalty method as described in previous subsections. An alternative ''primal-dual'' method makes decisions similar to drift-plus-penalty decisions, but uses a penalty defined by partial derivatives of the objective function <math>f.</math><ref name = sno-text/><ref name=stolyar-greedy>A. Stolyar,"[https://www.researchgate.net/profile/Alexander_Stolyar/publication/220084431_Maximizing_Queueing_Network_Utility_Subject_to_Stability_Greedy_Primal-Dual_Algorithm/links/0deec5179442f5b4fb000000.pdf Maximizing Queueing Network Utility subject to Stability: Greedy Primal-Dual Algorithm]," ''Queueing Systems'', vol. 50, no. 4, pp. 401–457, 2005.</ref><ref name=stolyar-gpd>A. Stolyar, "[https://www.researchgate.net/profile/Alexander_Stolyar/publication/220084487_Greedy_primal-dual_algorithm_for_dynamic_resource_allocation_in_complex_networks/links/0deec5179442e05128000000/Greedy-primal-dual-algorithm-for-dynamic-resource-allocation-in-complex-networks.pdf Greedy Primal-Dual Algorithm for Dynamic Resource Allocation in Complex Networks]," Queueing Systems, vol. 54, no. 3, pp. 203–220, 2006.</ref> The primal-dual approach can also be used to find local optima in cases when <math>f</math> is non-convex.<ref name = sno-text/>
==Delay tradeoffs and related work==
The mathematical analysis in the previous section shows that the drift-plus-penalty method produces a time average penalty that is within O(1/''V'') of optimality, with a corresponding ''O''(''V'') tradeoff in average queue size. This method, together with the ''O''(1/''V''), ''O''(''V'') tradeoff, was developed in Neely<ref name =neely-thesis/> and Neely, Modiano, Li
<ref name=neely-fairness-infocom05/> in the context of maximizing network utility subject to stability.
A related algorithm for maximizing network utility was developed by Eryilmaz and Srikant.
<ref name=atilla-fairness>
A. Eryilmaz and R. Srikant, "Fair Resource Allocation in Wireless Networks using Queue-Length-Based Scheduling
and Congestion Control," Proc. IEEE INFOCOM, March 2005.
</ref>
The Eryilmaz and Srikant work resulted in an algorithm very similar to the drift-plus-penalty algorithm, but used a different analytical technique. That technique was based on [[Lagrange multipliers]]. A direct use of the Lagrange multiplier technique results in a worse tradeoff of ''O''(1/''V''), O(''V''<sup>2</sup>). However, the Lagrange multiplier analysis was later strengthened by Huang and Neely to recover the original ''O''(1/''V''), ''O''(''V'') tradeoffs, while showing that queue sizes are tightly clustered around the Lagrange multiplier of a corresponding deterministic optimization problem.
<ref name = longbo-lagrange>
L. Huang and M. J. Neely, "[https://arxiv.org/abs/0904.3795 Delay Reduction via Lagrange Multipliers in Stochastic Network Optimization]," IEEE Trans. on Automatic Control, vol. 56, no. 4, pp. 842–857, April 2011.
</ref>
This clustering result can be used to modify the drift-plus-penalty algorithm to enable improved ''O''(1/''V''), ''O''(log<sup>2</sup>(''V'')) tradeoffs. The modifications can use either ''place-holder backlog''<ref name=longbo-lagrange/> or [[LIFO (computing)|Last-in-First-Out (LIFO)]] scheduling.
<ref name=moeller-lifo>
S. Moeller, A. Sridharan, B. Krishnamachari, and O. Gnawali, "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.352.2972&rep=rep1&type=pdf Routing without Routes: The Backpressure Collection Protocol]," Proc. IPSN 2010.
</ref>
<ref name = longbo-lifo>
L. Huang, S. Moeller, M. J. Neely, and B. Krishnamachari, "[https://arxiv.org/abs/1008.4895 LIFO-Backpressure Achieves Near Optimal Utility-Delay Tradeoff]," IEEE/ACM Transactions on Networking, to appear.
</ref>
When implemented for non-stochastic functions, the drift-plus-penalty method is similar to the [[Subgradient method|dual subgradient method]] of [[Convex optimization|convex optimization theory]], with the exception that its output is a time average of [[#Drift-plus-penalty_algorithm for convex programming|primal variables]], rather than the primal variables themselves.<ref name=now/><ref name=neely-dcdis/> A related ''primal-dual technique'' for maximizing utility in a stochastic queueing network was developed by Stolyar using a fluid model analysis.
<ref name=stolyar-greedy/>
<ref name=stolyar-gpd/>
The Stolyar analysis does not provide analytical results for a performance tradeoff between utility and queue size. A later analysis of the primal-dual method for stochastic networks proves a similar O(1/V), O(V) utility and queue size tradeoff, and also shows local optimality results for minimizing non-convex functions of time averages, under an additional convergence assumption.<ref name = sno-text/> However, this analysis does not specify how much time is required for the time averages to converge to something close to their infinite horizon limits. Related primal-dual algorithms for utility maximization without queues were developed by Agrawal and Subramanian
<ref name= agrawal-allerton02>
R. Agrawal and V. Subramanian, "[http://www.hamilton.ie/vsubramanian/Allerton2002_final.pdf Optimality of certain channel aware scheduling policies]," Proc. 40th Annual Allerton Conf. on Communication, Control, and Computing, Monticello, IL, Oct. 2002.
</ref>
and Kushner and Whiting.
<ref name= kushner-allerton02>
H. Kushner and P. Whiting, "[https://apps.dtic.mil/dtic/tr/fulltext/u2/a461871.pdf Asymptotic Properties of Proportional-Fair Sharing Algorithms]," Proc. 40th Annual Allerton Conf. on Communication, Control, and Computing, Monticello, IL, Oct. 2002.
</ref>
==Extensions to non-i.i.d. event processes==
The drift-plus-penalty algorithm is known to ensure similar performance guarantees for more general ergodic processes <math>\omega(t)</math>, so that the i.i.d. assumption is not crucial to the analysis. The algorithm can be shown to be robust to non-ergodic changes in the probabilities for <math>\omega(t)</math>. In certain scenarios, it can be shown to provide desirable analytical guarantees, called ''universal scheduling guarantees'', for arbitrary <math>\omega(t)</math> processes.<ref name=sno-text/>
==Extensions to variable frame length systems==
The drift-plus-penalty method can be extended to treat systems that operate over variable size frames.<ref name=restless-bandit-NUM>
C. Li and M. J. Neely, "Network utility maximization over partially observable Markovian channels," Performance Evaluation, https://dx.doi.org/10.1016/j.peva.2012.10.003.
</ref>
<ref name=neely-renewals>
M. J. Neely, "[https://arxiv.org/abs/1011.5942 Dynamic Optimization and Learning for Renewal Systems]," IEEE Transactions on Automatic Control, vol. 58, no. 1, pp. 32–46, Jan. 2013.
</ref>
In that case, the frames are labeled with indices ''r'' in {0, 1, 2, ...} and the frame durations are denoted {''T''[0], ''T''[1], ''T''[2], ...}, where ''T''[''r''] is a non-negative real number for each frame ''r''. Let <math>\Delta[r]</math> and <math>p[r]</math> be the drift between frame ''r'' and ''r'' + 1, and the total penalty incurred during frame ''r'', respectively. The extended algorithm takes a control action over each frame r to minimize a bound on the following ratio of conditional expectations:
: <math>
\frac{E[\Delta[r] + Vp[r] \mid Q[r]]}{E[T[r] \mid Q[r]]}
</math>
where ''Q''[''r''] is the vector of queue backlogs at the beginning of frame ''r''. In the special case when all frames are the same size and are normalized to 1 slot length, so that ''T''[''r''] = 1 for all ''r'', the above minimization reduces to the standard drift-plus-penalty technique. This frame-based method can be used for constrained optimization of [[Markov decision process|Markov decision problems (MDPs)]] and for other problems involving systems that experience [[Renewal theory|renewals]].
<ref name = restless-bandit-NUM/>
<ref name=neely-renewals/>
==Application to convex programming==
Let ''x'' = (''x''<sub>1</sub>, ..., ''x''<sub>''N''</sub>) be an ''N''-dimensional vector of real numbers, and define the hyper-rectangle ''A'' by:
: <math>
A = \{(x_1, x_2, \ldots, x_N) \mid x_{\min,i} \leq x_i \leq x_{\max,i} \text{ } \forall i \in \{1, \ldots, N\}\}
</math>
where ''x''<sub>min, ''i''</sub>, ''x''<sub>max, ''i''</sub> are given real numbers that satisfy <math>x_{\min,i} < x_{\max, i}</math> for all ''i''. Let ''P''(''x'') and <math> Y_i(x)</math> for i in {1, ..., ''K''} be continuous and [[convex function]]s of the ''x'' vector over all ''x'' in ''A''. Consider the following [[Convex optimization|convex programming]] problem:
: <math>
(\text{Eq. } 6) \text{ } \text{Minimize: } P(x)
</math>
: <math>
(\text{Eq. } 7) \text{ } \text{Subject to: } Y_i(x) \leq 0 \text{ } \forall i \in \{1, \ldots, K\} \text{ }, \text{ } x = (x_1, \ldots, x_N) \in A
</math>
This can be solved by the drift-plus-penalty method as follows: Consider the special case of a deterministic system with no random event process <math>\omega(t)</math>. Define the control action <math>\alpha(t)</math> as:
: <math>
\alpha(t) = x(t) = (x_1(t), x_2(t), \ldots, x_N(t))
</math>
and define the action space as the ''N''-dimensional hyper-rectangle ''A''. Define penalty and constraint functions as:
* <math>
p(t) = P(x_1(t), \ldots, x_N(t))
</math>
* <math>
y_i(t) = Y_i(x_1(t), \ldots, x_N(t)) \text{ } \forall i \in \{1, \ldots, K\}
</math>
Define the following time averages:
* <math>
\overline{x}(t) = \frac{1}{t}\sum_{\tau=0}^{t-1} (x_1(\tau), \ldots, x_N(\tau))
</math>
* <math>
\overline{P}(t) = \frac{1}{t}\sum_{\tau=0}^{t-1}P(x_1(\tau), \ldots, x_N(\tau))
</math>
* <math>
\overline{Y}_i(t) = \frac{1}{t}\sum_{\tau=0}^{t-1}Y_i(x_1(\tau), \ldots, x_N(\tau))
</math>
Now consider the following time average optimization problem:
: <math>
(\text{Eq. } 8) \text{ } \text{Minimize: } \lim_{t\rightarrow\infty} \overline{P}(t)
</math>
: <math>
(\text{Eq. } 9) \text{ } \text{subject to: } \lim_{t\rightarrow\infty} \overline{Y}_i(t) \leq 0 \text{ } \forall i \in \{1, \ldots, K\}
</math>
By [[Jensen's inequality]] the following holds for all slots t>0:
: <math>
P(\overline{x}(t)) \leq \overline{P}(t) \text{ } , \text{ } Y_i(\overline{x}(t)) \leq \overline{Y}_i(t) \text{ } \forall i \in \{1, \ldots, K\}
</math>
From this, it can be shown that an optimal solution to the time-average problem (Eq. 8)–(Eq. 9) can be achieved by solutions of the type x(t) = x* for all slots t, where x* is a vector that solves the convex program (Eq. 6)–(Eq. 7). Further, any time-averaged vector <math>\lim_{t\rightarrow\infty}\overline{x}(t)</math> corresponding to a solution of the time-average problem (Eq. 8)–(Eq. 9) must solve the convex program (Eq. 6)–(Eq. 7). Therefore, the original convex program (Eq. 6)–(Eq. 7) can be solved (to within any desired accuracy) by taking the time average of the decisions made when the drift-plus-penalty algorithm is applied to the corresponding time-averaged problem (Eq. 8)–(Eq. 9). The drift-plus-penalty algorithm for problem (Eq. 8)–(Eq. 9) reduces to the following:
===Drift-plus-penalty algorithm for convex programming===
Every slot ''t'', choose vector <math>x(t)=(x_1(t), \ldots, x_N(t)) \in A</math> to minimize the expression:
: <math>
VP(x(t)) + \sum_{i=1}^K Q_i(t)Y_i(x(t))
</math>
Then update the queues according to:
: <math>
Q_i(t+1) = \max[Q_i(t) + Y_i(x(t)), 0] \text{ } \forall i \in \{1, \ldots, K\}
</math>
The time average vector <math>\overline{x}(t)</math> converges to an O(1/V) approximation to the convex program.<ref name=neely-dcdis/>
This algorithm is similar to the standard ''dual subgradient algorithm'' of optimization theory, using a fixed stepsize of 1/V.
<ref name =bertsekas-convex>
D. P. Bertsekas and A. Nedic and A. E. Ozdaglar. ''Convex Analysis and Optimization'', Boston: Athena Scientific, 2003.
</ref>
However, a key difference is that the dual subgradient algorithm is typically analyzed under restrictive strict convexity assumptions that are needed for the ''primal variables'' ''x''(''t'') to converge. There are many important cases where these variables do not converge to the optimal solution, and never even get near the optimal solution (this is the case for most [[Linear programming|linear programs]], as shown below). On the other hand, the drift-plus-penalty algorithm does not require strict convexity assumptions. It ensures that the ''time averages'' of the primals converge to a solution that is within ''O''(1/''V'') of optimality, with ''O''(''V'') bounds on queue sizes (it can be shown that this translates into an ''O''(''V''<sup>2</sup>) bound on convergence time).<ref name=neely-dcdis/>
===Drift-plus-penalty algorithm for linear programming===
Consider the special case of a [[Linear programming|linear program]]. Specifically, suppose:
* <math>
P(x(t)) = \sum_{n=1}^Nc_nx_n(t)
</math>
* <math>
Y_i(x(t)) = \sum_{n=1}^Na_{in} x_n(t) - b_i \text{ } \forall i \in \{1, \ldots, K\}
</math>
for given real-valued constants (''c''<sub>1</sub>, …, ''c''<sub>''N''</sub>), (''a''<sub>''in''</sub>), (''b''<sub>1</sub>, …, ''b''<sub>''K''</sub>). Then the above algorithm reduces to the following: Every slot ''t'' and for each variable ''n'' in {1, …, ''N''}, choose ''x''<sub>''n''</sub>(''t'') in [''x''<sub>min,''n''</sub>, ''x''<sub>max,''n''</sub>] to minimize the expression:
: <math>
\left[ Vc_n + \sum_{i=1}^K Q_i(t)a_{in} \right] x_n(t)
</math>
Then update queues ''Q''<sub>''i''</sub>(''t'') as before. This amounts to choosing each variable ''x''<sub>''i''</sub>(''t'') according to the simple ''bang-bang'' control policy:
: <math>
\text{Choose } x_i(t) = x_{\min,i} \text{ if } Vc_n + \sum_{i=1}^K Q_i(t)a_{in} \geq 0
</math>
: <math>
\text{Choose } x_i(t) = x_{\max,i} \text{ if } Vc_n + \sum_{i=1}^K Q_i(t) a_{in} < 0
</math>
Since the primal variables ''x''<sub>''i''</sub>(''t'') are always either ''x''<sub>min,''i''</sub> or ''x''<sub>max,''i''</sub>, they can never converge to the optimal solution if the optimal solution is not a vertex point of the hyper-rectangle ''A''. However, the ''time-averages'' of these bang-bang decisions indeed converge to an ''O''(1/''V'') approximation of the optimal solution. For example, suppose that ''x''<sub>min,1</sub> = 0, ''x''<sub>max,1</sub> = 1, and suppose that all optimal solutions to the linear program have ''x''<sub>1</sub> = 3/4. Then roughly 3/4 of the time the bang-bang decision for the first variable will be ''x''<sub>1</sub>(''t'') = 1, and the remaining time it will be ''x''<sub>1</sub>(''t'') = 0.<ref name=sucha-qoi-arxiv/>
==Related links==
* [[Backpressure routing]]
* [[Lyapunov optimization]]
* [[Convex optimization]]
* [[Linear programming]]
==References==
{{Reflist}}
==Primary sources==
*M. J. Neely. ''Stochastic Network Optimization with Application to Communication and Queueing Systems'', Morgan & Claypool, 2010.
[[Category:Networking algorithms]]
[[Category:Queueing theory]]
[[Category:Scheduling algorithms]]
The technique is for stabilizing a queueing network while also minimizing the time average of a network penalty function. It can be used to optimize performance objectives such as time average power, throughput, and throughput utility.
<ref name = neely-energy-it>
M. J. Neely, "[http://www-bcf.usc.edu/~mjneely/pdf_papers/neely-energy-it.pdf Energy Optimal Control for Time Varying Wireless Networks]," IEEE Transactions on Information Theory, vol. 52, no. 7, pp. 2915–2934, July 2006.
</ref>
<ref name=neely-fairness-infocom05>
M. J. Neely, E. Modiano, and C. Li, "[https://www.researchgate.net/profile/Chih_Ping_Li/publication/221242398_Fairness_and_Optimal_Stochastic_Control_for_Heterogeneous_Networks/links/0a85e532265750bfc3000000.pdf Fairness and Optimal Stochastic Control for Heterogeneous Networks]," Proc. IEEE INFOCOM, March 2005.
</ref>
In the special case when there is no penalty to be minimized, and when the goal is to design a stable routing policy in a multi-hop network, the method reduces to [[backpressure routing]].
<ref name=tass-radio-nets>L. Tassiulas and A. Ephremides,
"Stability Properties of Constrained Queueing Systems and
Scheduling Policies for Maximum Throughput in Multihop
Radio Networks, ''IEEE Transactions on Automatic Control'', vol. 37, no. 12, pp. 1936–1948, Dec. 1992.
</ref>
<ref name=now>
L. Georgiadis, M. J. Neely, and L. Tassiulas, "[https://www.nowpublishers.com/article/DownloadSummary/NET-001 Resource Allocation and Cross-Layer Control in Wireless Networks],"
''Foundations and Trends in Networking'', vol. 1, no. 1, pp. 1–149, 2006.
</ref>
The drift-plus-penalty method can also be used to minimize the time average of a [[stochastic process]] subject to time average constraints on a collection of other stochastic processes.
<ref name = sno-text>
M. J. Neely.
''[https://www.morganclaypool.com/doi/abs/10.2200/s00271ed1v01y201006cnt007 Stochastic Network Optimization with Application to Communication and Queueing Systems],''
Morgan & Claypool, 2010.
</ref>
This is done by defining an appropriate set of [[#Virtual queues|virtual queues]]. It can also be used to produce time averaged solutions to [[convex optimization]] problems.
<ref name =neely-dcdis>
M. J. Neely, "[Distributed and Secure Computation of Convex Programs over a Network of Connected Processors Distributed and Secure Computation of Convex Programs over a Network of Connected Processors],"
DCDIS Conf, Guelph, Ontario, July 2005
</ref>
<ref name=sucha-qoi-arxiv>
S. Supittayapornpong and M. J. Neely, "[https://arxiv.org/abs/1211.6162 Quality of Information Maximization for Wireless Networks via a Fully Separable Quadratic Policy]," arXiv:1211.6162v2, Nov. 2012.
</ref>
==Methodology==
The drift-plus-penalty method applies to queueing systems that operate in discrete time with time slots ''t'' in {0, 1, 2, ...}. First, a non-negative function ''L''(''t'') is defined as a scalar measure of the state of all queues at time ''t''. The function ''L''(''t'') is typically defined as the sum of the squares of all queue sizes at time ''t'', and is called a [[Lyapunov optimization|Lyapunov function]]. The ''Lyapunov drift'' is defined:
: <math>
\Delta L(t) = L(t+1) - L(t)
</math>
Every slot t, the current queue state is observed and control actions are taken to greedily minimize a bound on the following ''drift-plus-penalty expression'':
: <math>\Delta L(t) + Vp(t),
</math>
where ''p''(''t'') is the penalty function and V is a non-negative weight. The V parameter can be chosen to ensure the time average of ''p''(''t'') is arbitrarily close to optimal, with a corresponding tradeoff in average queue size. Like [[backpressure routing]], this method typically does not require knowledge of the probability distributions for job arrivals and network mobility.<ref name = sno-text/>
==Origins and applications==
When <math>V=0,</math> the method reduces to greedily minimizing the Lyapunov drift. This results in the [[backpressure routing]] algorithm originally developed by Tassiulas and Ephremides (also called the ''max-weight algorithm'').<ref name=tass-radio-nets/><ref name = tass-server-allocation>L. Tassiulas and A. Ephremides, "Dynamic Server Allocation to Parallel Queues with Randomly Varying Connectivity," IEEE Transactions on Information Theory, vol. 39, no. 2, pp. 466–478, March 1993.</ref> The <math>Vp(t)</math> term was added to the drift expression by Neely<ref name=neely-thesis>M. J. Neely. Dynamic Power Allocation and Routing for Satellite and Wireless Networks with Time Varying Channels. Ph.D. Dissertation, Massachusetts Institute of Technology, LIDS. November 2003.</ref> and Neely, Modiano, Li<ref name="neely-fairness-infocom05"/> for stabilizing a network while also maximizing a throughput utility function. For this, the penalty <math>p(t)</math> was defined as <math>-1</math> times a reward earned on slot <math>t.</math> This drift-plus-penalty technique was later used to minimize average power<ref name=neely-energy-it/> and optimize other penalty and reward metrics.<ref name = now/><ref name =sno-text/>
The theory was developed primarily for optimizing communication networks, including wireless networks, ad-hoc mobile networks, and other computer networks. However, the mathematical techniques can be applied to optimization and control for other stochastic systems, including renewable energy allocation in [[Smart grid|smart power grids]]<ref name=rahul-energy-storage>R. Urgaonkar, B. Urgaonkar, M. J. Neely, A. Sivasubramaniam, "Optimal Power Cost Management Using Stored Energy in Data Centers," Proc. SIGMETRICS 2011.</ref><ref name=moeller-smartgrid2010>M. Baghaie, S. Moeller, B. Krishnamachari, "[http://www.baghaie.net/powerCon10.pdf Energy Routing on the Future Grid: A Stochastic Network Optimization Approach]," Proc. International Conf. on Power System Technology (POWERCON), Oct. 2010.</ref><ref name=neely-smartgrid>M. J. Neely, A. S. Tehrani, and A. G. Dimakis, "Efficient Algorithms for Renewable Energy Allocation to Delay Tolerant Consumers," 1st IEEE International Conf. on Smart Grid Communications, 2010.</ref> and [[inventory control]] for product assembly systems.<ref name=neely-inventory-control>M. J. Neely and L. Huang, "Dynamic Product Assembly and Inventory Control for Maximum Profit," Proc. IEEE Conf. on Decision and Control, Atlanta, GA, Dec. 2010.</ref>
==How it works==
This section shows how to use the drift-plus-penalty method to minimize the time average of a function p(t) subject to time average constraints on a collection of other functions. The analysis below is based on the material in.<ref name = sno-text/>
===The stochastic optimization problem===
Consider a discrete time system that evolves over normalized time slots t in {0, 1, 2, ...}. Define p(t) as a function whose time average should be minimized, called a ''penalty function''. Suppose that minimization of the time average of p(t) must be done subject to time-average constraints on a collection of K other functions:
* <math>
p(t) = \text{penalty function whose time average must be minimized}
</math>
* <math>
y_1(t), y_2(t), \ldots, y_K(t) =\text{other functions whose time averages must be non-positive}
</math>
Every slot t, the network controller observes a new random event. It then makes a control action based on knowledge of this event. The values of p(t) and y_i(t) are determined as functions of the random event and the control action on slot t:
* <math>
\omega(t) = \text{random event on slot } t \text{ (assumed i.i.d. over slots)}
</math>
* <math>
\alpha(t) = \text{control action on slot } t \text{ (chosen after observing } \omega(t) \text{)}
</math>
* <math>
p(t) = P(\alpha(t), \omega(t)) \text{ (a deterministic function of } \alpha(t), \omega(t) \text{)}
</math>
* <math>
y_i(t) = Y_i(\alpha(t), \omega(t)) \text{ } \forall i \in \{1, \ldots, K\} \text{ (deterministic functions of } \alpha(t), \omega(t) \text{)}
</math>
The small case notation p(t), y_i(t) and upper case notation P(), Y_i() is used to distinguish the penalty values from the function that determines these values based on the random event and control action for slot t. The random event <math>\omega(t)</math> is assumed to take values in some abstract set of events <math>\Omega</math>. The control action <math>\alpha(t)</math> is assumed to be chosen within some abstract set <math>A</math> that contains the control options. The sets <math>\Omega</math> and <math>A</math> are arbitrary and can be either finite or infinite. For example, <math>A</math> could be a finite list of abstract elements, an uncountably infinite (and possibly non-convex) collection of real-valued vectors, and so on. The functions P(), Y_i() are also arbitrary and do not require continuity or convexity assumptions.
As an example in the context of communication networks, the random event <math>\omega(t)</math> can be a vector that contains the slot t arrival information for each node and the slot t channel state information for each link. The control action <math>\alpha(t)</math> can be a vector that contains the routing and transmission decisions for each node. The functions P() and Y_i() can represent power expenditures or throughputs associated with the control action and channel condition for slot t.
For simplicity of exposition, assume the P() and Y_i() functions are bounded. Further assume the random event process <math>\omega(t)</math> is [[Independent and identically distributed random variables|independent and identically distributed (i.i.d.)]] over slots t with some possibly unknown probability distribution. The goal is to design a policy for making control actions over time to solve the following problem:
: <math>
\text{Minimize: } \lim_{t\rightarrow\infty} \frac{1}{t}\sum_{\tau=0}^{t-1}E[p(\tau)]
</math>
: <math>
\text{Subject to: } \lim_{t\rightarrow\infty} \frac{1}{t}\sum_{\tau=0}^{t-1}E[y_i(\tau)] \leq 0 \text{ } \forall i \in \{1, \ldots, K\}
</math>
It is assumed throughout that this problem is ''feasible''. That is, it is assumed that there exists an algorithm that can satisfy all of the ''K'' desired constraints.
The above problem poses each constraint in the ''standard form'' of a time average expectation of an abstract process y_i(t) being non-positive. There is no loss of generality with this approach. For example, suppose one desires the time average expectation of some process a(t) to be less than or equal to a given constant c. Then a new penalty function ''y''(''t'') = ''a''(''t'') − ''c'' can be defined, and the desired constraint is equivalent to the time average expectation of ''y''(''t'') being non-positive. Likewise, suppose there are two processes ''a''(''t'') and ''b''(''t'') and one desires the time average expectation of ''a''(''t'') to be less than or equal to that of ''b''(''t''). This constraint is written in standard form by defining a new penalty function ''y''(''t'') = ''a''(''t'') − ''b''(''t''). The above problem seeks to ''minimize'' the time average of an abstract penalty function ''p'(''t'')'. This can be used to ''maximize'' the time average of some desirable ''reward function'' ''r''(''t'') by defining ''p''(''t'') = −''r''('t'').
===Virtual queues===
For each constraint ''i'' in {1, ..., ''K''}, define a ''virtual queue'' with dynamics over slots ''t'' in {0, 1, 2, ...} as follows:
: <math>
(\text{Eq. } 1) \text{ } Q_i(t+1) = \max[Q_i(t) + y_i(t), 0]
</math>
Initialize ''Q''<sub>''i''</sub>(0) = 0 for all ''i'' in {1, ..., ''K''}. This update equation is identical to that of a ''virtual'' discrete time queue with backlog Q_i(t) and with y_i(t) being the difference between new arrivals and new service opportunities on slot ''t''. Intuitively, stabilizing these virtual queues ensures the time averages of the constraint functions are less than or equal to zero, so the desired constraints are satisfied. To see this precisely, note that (Eq. 1) implies:
: <math>
Q_i(t+1) \geq Q_i(t) + y_i(t)
</math>
Therefore:
: <math>
y_i(t) \leq Q_i(t+1) - Q_i(t)
</math>
Summing the above over the first t slots and using the law of telescoping sums implies:
: <math>
\sum_{\tau=0}^{t-1} y_i(\tau) \leq Q_i(t) - Q_i(0) = Q_i(t)
</math>
Dividing by ''t'' and taking expectations implies:
: <math>
\frac{1}{t}\sum_{\tau=0}^{t-1} E[y_i(\tau)] \leq \frac{E[Q_i(t)]}{t}
</math>
Therefore, the desired constraints of the problem are satisfied whenever the following holds for all i in {1, ..., ''K''}:
: <math>
\lim_{t\rightarrow\infty} \frac{E[Q_i(t)]} t = 0
</math>
A queue Q_i(t) that satisfies the above limit equation is said to be ''mean rate stable''.<ref name=sno-text/>
===The drift-plus-penalty expression===
To stabilize the queues, define the Lyapunov function L(t) as a measure of the total queue backlog on slot ''t'':
: <math>
L(t) = \frac{1}{2}\sum_{i=1}^KQ_i(t)^2
</math>
Squaring the queueing equation (Eq. 1) results in the following bound for each queue i in {1, ..., K}:
: <math>
Q_i(t+1)^2 \leq (Q_i(t) + y_i(t))^2 = Q_i(t)^2 + y_i(t)^2 + 2Q_i(t)y_i(t)
</math>
Therefore,
: <math>
\frac{1}{2}\sum_{i=1}^KQ_i(t+1)^2 \leq \frac{1}{2}\sum_{i=1}^KQ_i(t)^2 + \frac{1}{2}\sum_{i=1}^Ky_i(t)^2 + \sum_{i=1}^KQ_i(t)y_i(t)
</math>
It follows that
: <math>
\Delta L(t) = L(t+1) - L(t) \leq \frac{1}{2}\sum_{i=1}^Ky_i(t)^2 + \sum_{i=1}^K Q_i(t)y_i(t)
</math>
Now define B as a positive constant that upper bounds the first term on the right-hand-side of the above inequality. Such a constant exists because the y_i(t) values are bounded. Then:
: <math>
\Delta L(t) \leq B + \sum_{i=1}^K Q_i(t)y_i(t)
</math>
Adding Vp(t) to both sides results in the following bound on the drift-plus-penalty expression:
: <math>
(\text{Eq. } 2) \text{ } \Delta L(t) + Vp(t) \leq B + Vp(t) + \sum_{i=1}^K Q_i(t)y_i(t)
</math>
The drift-plus-penalty algorithm (defined below) makes control actions every slot t that greedily minimize the right-hand-side of the above inequality. Intuitively, taking an action that minimizes the drift alone would be beneficial in terms of queue stability but would not minimize time average penalty. Taking an action that minimizes the penalty alone would not necessarily stabilize the queues. Thus, taking an action to minimize the weighted sum incorporates both objectives of queue stability and penalty minimization. The weight V can be tuned to place more or less emphasis on penalty minimization, which results in a performance tradeoff.<ref name=sno-text/>
===Drift-plus-penalty algorithm===
Let <math>A</math> be the abstract set of all possible control actions. Every slot t, observe the random event and the current queue values:
: <math> \text{Observe: } \omega(t), Q_1(t), \ldots, Q_K(t)
</math>
Given these observations for slot t, greedily choose a control action <math>\alpha(t) \in A</math> to minimize the following expression (breaking ties arbitrarily):
: <math> VP(\alpha(t), \omega(t)) + \sum_{i=1}^KQ_i(t)Y_i(\alpha(t), \omega(t))
</math>
Then update the queues for each i in {1, ..., K} according to (Eq. 1). Repeat this procedure for slot t+1.<ref name=sno-text/>
Note that the random event and queue backlogs observed on slot ''t'' act as given constants when selecting the control action for the slot t minimization. Thus, each slot involves a deterministic search for the minimizing control action over the set ''A''. A key feature of this algorithm is that it does not require knowledge of the probability distribution of the random event process.
===Approximate scheduling===
The above algorithm involves finding a minimum of a function over an abstract set ''A''. In general cases, the minimum might not exist, or might be difficult to find. Thus, it is useful to assume the algorithm is implemented in an approximate manner as follows: Define ''C'' as a non-negative constant, and assume that for all slots ''t'', the control action <math>\alpha(t)</math> is chosen in the set ''A'' to satisfy:
: <math>
\begin{align}
& VP(\alpha(t), \omega(t)) + \sum_{i=1}^K Q_i(t)Y_i(\alpha(t), \omega(t)) \\
\leq {} & C + \inf_{\alpha\in A} [VP(\alpha, \omega(t)) + \sum_{i=1}^KQ_i(t)Y_i(\alpha,\omega(t))]
\end{align}
</math>
Such a control action is called a ''C-additive approximation''.<ref name=sno-text/> The case ''C'' = 0 corresponds to exact minimization of the desired expression on every slot ''t''.
==Performance analysis==
This section shows the algorithm results in a time average penalty that is within O(1/V) of optimality, with a corresponding O(V) tradeoff in average queue size.<ref name =sno-text/>
===Average penalty analysis===
Define an ''<math>\omega</math>-only policy'' to be a stationary and randomized policy for choosing the control action <math>\alpha(t)</math> based on the observed <math>\omega(t)</math> only. That is, an <math>\omega</math>-only policy specifies, for each possible random event <math>\omega \in \Omega</math>, a conditional probability distribution for selecting a control action <math>\alpha(t) \in A</math> given that <math>\omega(t)=\omega</math>. Such a policy makes decisions independent of current queue backlog. Assume there exists an <math>\omega</math>-only policy <math>\alpha^*(t)</math> that satisfies the following:
:<math> (\text{Eq. } 3) \qquad E[P(\alpha^*(t), \omega(t))] = p^* = \text{optimal time average penalty for the problem} </math>
:<math> (\text{Eq. } 4) \qquad E[Y_i(\alpha^*(t), \omega(t))] \leqslant 0 \qquad \forall i \in \{1, \ldots, K\} </math>
The expectations above are with respect to the random variable <math>\omega(t)</math> for slot <math>t,</math> and the random control action <math>\alpha(t)</math> chosen on slot <math>t</math> after observing <math>\omega(t)</math>. Such a policy <math>\alpha^*(t)</math> can be shown to exist whenever the desired control problem is feasible and the event space for <math>\omega(t)</math> and action space for <math>\alpha(t)</math> are finite, or when mild closure properties are satisfied.<ref name = sno-text/>
Let <math>\alpha(t)</math> represent the action taken by a C-additive approximation of the drift-plus-penalty algorithm of the previous section, for some non-negative constant C. To simplify terminology, we call this action the ''drift-plus-penalty action'', rather than the ''C-additive approximate drift-plus-penalty action''. Let <math>\alpha^*(t)</math> represent the <math>\omega</math>-only decision:
:<math> \alpha(t) = \text{drift-plus-penalty action for slot } t </math>
:<math> \alpha^*(t) = \omega\text{-only action that satisfies (Eq.3)-(Eq.4)} </math>
Assume the drift-plus-penalty action <math>\alpha(t)</math> is used on each and every slot. By (Eq. 2), the drift-plus-penalty expression under this <math>\alpha(t)</math> action satisfies the following for each slot <math>t:</math>
:<math> \begin{align}
\Delta L(t) + Vp(t) &\leqslant B + Vp(t) + \sum_{i=1}^KQ_i(t)y_i(t) \\
&= B + VP(\alpha(t), \omega(t)) + \sum_{i=1}^K Q_i(t)Y_i(\alpha(t), \omega(t)) \\
&\leqslant B + C + VP(\alpha^*(t), \omega(t)) + \sum_{i=1}^KQ_i(t)Y_i(\alpha^*(t), \omega(t))
\end{align}</math>
where the last inequality follows because action <math>\alpha(t)</math> comes within an additive constant <math>C</math> of minimizing the preceding expression over all other actions in the set <math>A,</math> including <math>\alpha^*(t).</math> Taking expectations of the above inequality gives:
:<math>\begin{align}
E[\Delta(t) + Vp(t)] &\leqslant B + C + VE[P(\alpha^*(t), \omega(t))] + \sum_{i=1}^K E \left [Q_i(t)Y_i(\alpha^*(t), \omega(t)) \right ] \\
& = B + C + VE[P(\alpha^*(t), \omega(t))] + \sum_{i=1}^KE[Q_i(t)]E[Y_i(\alpha^*(t), \omega(t))] && \alpha^*(t), \omega(t) \text{ are independent of } Q_i(t) \\
&\leqslant B + C + Vp^* && \text{Using Eq. 3 and Eq. 4}
\end{align}</math>
Notice that the <math>\alpha^*(t)</math> action was never actually implemented. Its existence was used only for comparison purposes to reach the final inequality. Summing the above inequality over the first <math>t > 0</math> slots gives:
:<math>\begin{align}
(B+C+Vp^*)t &\geqslant \sum_{\tau=0}^{t-1} E[\Delta(\tau) + Vp(\tau)] \\
&= E[L(t)] - E[L(0)] + V\sum_{\tau=0}^{t-1}E[p(\tau)] && \Delta(\tau) = L(\tau+1)-L(\tau) \\
&= E[L(t)] + V\sum_{\tau=0}^{t-1}E[p(\tau)] && \text{assume } L(0) = 0 \\
&\geqslant V\sum_{\tau=0}^{t-1}E[p(\tau)] && L(t) \geqslant 0
\end{align} </math>
Dividing the above by <math>Vt</math> yields the following result, which holds for all slots <math>t > 0:</math>
:<math> \frac{1}{t}\sum_{\tau=0}^{t-1} E[p(\tau)] \leqslant p^* + \frac{B+C}{V}.</math>
Thus, the time average expected penalty can be made arbitrarily close to the optimal value <math>p^*</math> by choosing <math>V</math> suitably large. It can be shown that all virtual queues are mean rate stable, and so all desired constraints are satisfied.<ref name=sno-text/> The parameter <math>V</math> affects the size of the queues, which determines the speed at which the time average constraint functions converge to a non-positive number. A more detailed analysis on the size of the queues is given in the next subsection.
===Average queue size analysis===
Assume now there exists an <math>\omega</math>-only policy <math>\alpha^*(t)</math>, possibly different from the one that satisfies (Eq. 3)-(Eq.4), that satisfies the following for some <math>\epsilon>0</math>:
:<math> (\text{Eq. } 5) \qquad E[Y_i(\alpha^*(t), \omega(t))] \leq -\epsilon \qquad \forall i \in \{1, \ldots , K\} </math>
An argument similar to the one in the previous section shows:
:<math>\begin{align}
\Delta(t) + Vp(t) &\leqslant B + C + VP(\alpha^*(t), \omega(t)) + \sum_{i=1}^KQ_i(t)Y_i(\alpha^*(t), \omega(t)) \\
\Delta(t) + Vp_{\min} &\leqslant B + C + Vp_{\max} + \sum_{i=1}^KQ_i(t)Y_i(\alpha^*(t), \omega(t)) && \text{assume } p_{\min} \leqslant P \leqslant p_{\max} \\
E[\Delta(t)] + Vp_{\min} &\leqslant B + C + Vp_{\max} + \sum_{i=1}^K E \left [Q_i(t)] E[Y_i(\alpha^*(t), \omega(t)) \right ] && \text{taking expectation} \\
E[\Delta(t)] + Vp_{\min} &\leqslant B + C + Vp_{\max} + \sum_{i=1}^KE[Q_i(t)](-\epsilon) && \text{Using (Eq. 5)} \\
E[\Delta(t)] + \epsilon \sum_{i=1}^KE[Q_i(t)] &\leqslant B + C + V(p_{\max} - p_{\min})
\end{align}</math>
A telescoping series argument similar to the one in the previous section can thus be used to show the following for all t>0:<ref name=sno-text/>
:<math> \frac{1}{t}\sum_{\tau=0}^{t-1} \sum_{i=1}^KE[Q_i(\tau)] \leqslant \frac{B + C + V(p_{\max} - p_{\min})}{\epsilon} </math>
This shows that average queue size is indeed O(V).
===Probability 1 convergence===
The above analysis considers time average expectations. Related probability 1 performance bounds for infinite horizon time average queue size and penalty can be derived using the drift-plus-penalty method together with [[Martingale (probability theory)|martingale theory]].<ref name=lyap-opt-jam>M. J. Neely, "Queue Stability and Probability 1 Convergence via Lyapunov Optimization," Journal of Applied Mathematics, vol. 2012, {{doi|10.1155/2012/831909}}.</ref>
===Application to queues with finite capacity===
As shown, the drift-plus-penalty allows to keep the average queue size under a certain threshold, which depends on the choice of the parameter V, but in general, does not offer any guarantee on the maximum queue occupancy.
However, if the action set respects certain constraints, it is possible to add an additional condition on the choice of V to enforce the maximum length of a queue and thus to apply the algorithm also to queues with finite capacity.<ref name=lyap-opt-fin>L. Bracciale, P. Loreti "Lyapunov drift-plus-penalty optimization for queues with finite capacity" IEEE Communications Letters, {{doi|10.1109/LCOMM.2020.3013125}}.</ref>
===Treatment of queueing systems===
The above analysis considers constrained optimization of time averages in a stochastic system that did not have any explicit queues. Each time average inequality constraint was mapped to a virtual queue according to (Eq. 1). In the case of optimizing a queueing network, the virtual queue equations in (Eq. 1) are replaced by the actual queueing equations.
===Convex functions of time averages===
A related problem is the minimization of a convex function of time averages subject to constraints, such as:
:<math>\text{Minimize} \quad f \left (\overline{y}_1, \ldots, \overline{y}_K \right) \quad \text{subject to} \quad g_i \left (\overline{y}_1, \ldots, \overline{y}_K \right ) \leqslant 0 \qquad \forall i \in \{1, \ldots, N\} </math>
where <math>f</math> and <math>g_i</math> are [[convex function]]s, and where the time averages are defined:
:<math>\overline{y}_i = \lim_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} E[y_i(\tau)]</math>
Such problems of optimizing convex functions of time averages can be transformed into problems of optimizing time averages of functions via ''auxiliary variables'' (see Chapter 5 of the Neely textbook).<ref name=neely-fairness-infocom05/><ref name = sno-text/> The latter problems can then be solved by the drift-plus-penalty method as described in previous subsections. An alternative ''primal-dual'' method makes decisions similar to drift-plus-penalty decisions, but uses a penalty defined by partial derivatives of the objective function <math>f.</math><ref name = sno-text/><ref name=stolyar-greedy>A. Stolyar,"[https://www.researchgate.net/profile/Alexander_Stolyar/publication/220084431_Maximizing_Queueing_Network_Utility_Subject_to_Stability_Greedy_Primal-Dual_Algorithm/links/0deec5179442f5b4fb000000.pdf Maximizing Queueing Network Utility subject to Stability: Greedy Primal-Dual Algorithm]," ''Queueing Systems'', vol. 50, no. 4, pp. 401–457, 2005.</ref><ref name=stolyar-gpd>A. Stolyar, "[https://www.researchgate.net/profile/Alexander_Stolyar/publication/220084487_Greedy_primal-dual_algorithm_for_dynamic_resource_allocation_in_complex_networks/links/0deec5179442e05128000000/Greedy-primal-dual-algorithm-for-dynamic-resource-allocation-in-complex-networks.pdf Greedy Primal-Dual Algorithm for Dynamic Resource Allocation in Complex Networks]," Queueing Systems, vol. 54, no. 3, pp. 203–220, 2006.</ref> The primal-dual approach can also be used to find local optima in cases when <math>f</math> is non-convex.<ref name = sno-text/>
==Delay tradeoffs and related work==
The mathematical analysis in the previous section shows that the drift-plus-penalty method produces a time average penalty that is within O(1/''V'') of optimality, with a corresponding ''O''(''V'') tradeoff in average queue size. This method, together with the ''O''(1/''V''), ''O''(''V'') tradeoff, was developed in Neely<ref name =neely-thesis/> and Neely, Modiano, Li
<ref name=neely-fairness-infocom05/> in the context of maximizing network utility subject to stability.
A related algorithm for maximizing network utility was developed by Eryilmaz and Srikant.
<ref name=atilla-fairness>
A. Eryilmaz and R. Srikant, "Fair Resource Allocation in Wireless Networks using Queue-Length-Based Scheduling
and Congestion Control," Proc. IEEE INFOCOM, March 2005.
</ref>
The Eryilmaz and Srikant work resulted in an algorithm very similar to the drift-plus-penalty algorithm, but used a different analytical technique. That technique was based on [[Lagrange multipliers]]. A direct use of the Lagrange multiplier technique results in a worse tradeoff of ''O''(1/''V''), O(''V''<sup>2</sup>). However, the Lagrange multiplier analysis was later strengthened by Huang and Neely to recover the original ''O''(1/''V''), ''O''(''V'') tradeoffs, while showing that queue sizes are tightly clustered around the Lagrange multiplier of a corresponding deterministic optimization problem.
<ref name = longbo-lagrange>
L. Huang and M. J. Neely, "[https://arxiv.org/abs/0904.3795 Delay Reduction via Lagrange Multipliers in Stochastic Network Optimization]," IEEE Trans. on Automatic Control, vol. 56, no. 4, pp. 842–857, April 2011.
</ref>
This clustering result can be used to modify the drift-plus-penalty algorithm to enable improved ''O''(1/''V''), ''O''(log<sup>2</sup>(''V'')) tradeoffs. The modifications can use either ''place-holder backlog''<ref name=longbo-lagrange/> or [[LIFO (computing)|Last-in-First-Out (LIFO)]] scheduling.
<ref name=moeller-lifo>
S. Moeller, A. Sridharan, B. Krishnamachari, and O. Gnawali, "[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.352.2972&rep=rep1&type=pdf Routing without Routes: The Backpressure Collection Protocol]," Proc. IPSN 2010.
</ref>
<ref name = longbo-lifo>
L. Huang, S. Moeller, M. J. Neely, and B. Krishnamachari, "[https://arxiv.org/abs/1008.4895 LIFO-Backpressure Achieves Near Optimal Utility-Delay Tradeoff]," IEEE/ACM Transactions on Networking, to appear.
</ref>
When implemented for non-stochastic functions, the drift-plus-penalty method is similar to the [[Subgradient method|dual subgradient method]] of [[Convex optimization|convex optimization theory]], with the exception that its output is a time average of [[#Drift-plus-penalty_algorithm for convex programming|primal variables]], rather than the primal variables themselves.<ref name=now/><ref name=neely-dcdis/> A related ''primal-dual technique'' for maximizing utility in a stochastic queueing network was developed by Stolyar using a fluid model analysis.
<ref name=stolyar-greedy/>
<ref name=stolyar-gpd/>
The Stolyar analysis does not provide analytical results for a performance tradeoff between utility and queue size. A later analysis of the primal-dual method for stochastic networks proves a similar O(1/V), O(V) utility and queue size tradeoff, and also shows local optimality results for minimizing non-convex functions of time averages, under an additional convergence assumption.<ref name = sno-text/> However, this analysis does not specify how much time is required for the time averages to converge to something close to their infinite horizon limits. Related primal-dual algorithms for utility maximization without queues were developed by Agrawal and Subramanian
<ref name= agrawal-allerton02>
R. Agrawal and V. Subramanian, "[http://www.hamilton.ie/vsubramanian/Allerton2002_final.pdf Optimality of certain channel aware scheduling policies]," Proc. 40th Annual Allerton Conf. on Communication, Control, and Computing, Monticello, IL, Oct. 2002.
</ref>
and Kushner and Whiting.
<ref name= kushner-allerton02>
H. Kushner and P. Whiting, "[https://apps.dtic.mil/dtic/tr/fulltext/u2/a461871.pdf Asymptotic Properties of Proportional-Fair Sharing Algorithms]," Proc. 40th Annual Allerton Conf. on Communication, Control, and Computing, Monticello, IL, Oct. 2002.
</ref>
==Extensions to non-i.i.d. event processes==
The drift-plus-penalty algorithm is known to ensure similar performance guarantees for more general ergodic processes <math>\omega(t)</math>, so that the i.i.d. assumption is not crucial to the analysis. The algorithm can be shown to be robust to non-ergodic changes in the probabilities for <math>\omega(t)</math>. In certain scenarios, it can be shown to provide desirable analytical guarantees, called ''universal scheduling guarantees'', for arbitrary <math>\omega(t)</math> processes.<ref name=sno-text/>
==Extensions to variable frame length systems==
The drift-plus-penalty method can be extended to treat systems that operate over variable size frames.<ref name=restless-bandit-NUM>
C. Li and M. J. Neely, "Network utility maximization over partially observable Markovian channels," Performance Evaluation, https://dx.doi.org/10.1016/j.peva.2012.10.003.
</ref>
<ref name=neely-renewals>
M. J. Neely, "[https://arxiv.org/abs/1011.5942 Dynamic Optimization and Learning for Renewal Systems]," IEEE Transactions on Automatic Control, vol. 58, no. 1, pp. 32–46, Jan. 2013.
</ref>
In that case, the frames are labeled with indices ''r'' in {0, 1, 2, ...} and the frame durations are denoted {''T''[0], ''T''[1], ''T''[2], ...}, where ''T''[''r''] is a non-negative real number for each frame ''r''. Let <math>\Delta[r]</math> and <math>p[r]</math> be the drift between frame ''r'' and ''r'' + 1, and the total penalty incurred during frame ''r'', respectively. The extended algorithm takes a control action over each frame r to minimize a bound on the following ratio of conditional expectations:
: <math>
\frac{E[\Delta[r] + Vp[r] \mid Q[r]]}{E[T[r] \mid Q[r]]}
</math>
where ''Q''[''r''] is the vector of queue backlogs at the beginning of frame ''r''. In the special case when all frames are the same size and are normalized to 1 slot length, so that ''T''[''r''] = 1 for all ''r'', the above minimization reduces to the standard drift-plus-penalty technique. This frame-based method can be used for constrained optimization of [[Markov decision process|Markov decision problems (MDPs)]] and for other problems involving systems that experience [[Renewal theory|renewals]].
<ref name = restless-bandit-NUM/>
<ref name=neely-renewals/>
==Application to convex programming==
Let ''x'' = (''x''<sub>1</sub>, ..., ''x''<sub>''N''</sub>) be an ''N''-dimensional vector of real numbers, and define the hyper-rectangle ''A'' by:
: <math>
A = \{(x_1, x_2, \ldots, x_N) \mid x_{\min,i} \leq x_i \leq x_{\max,i} \text{ } \forall i \in \{1, \ldots, N\}\}
</math>
where ''x''<sub>min, ''i''</sub>, ''x''<sub>max, ''i''</sub> are given real numbers that satisfy <math>x_{\min,i} < x_{\max, i}</math> for all ''i''. Let ''P''(''x'') and <math> Y_i(x)</math> for i in {1, ..., ''K''} be continuous and [[convex function]]s of the ''x'' vector over all ''x'' in ''A''. Consider the following [[Convex optimization|convex programming]] problem:
: <math>
(\text{Eq. } 6) \text{ } \text{Minimize: } P(x)
</math>
: <math>
(\text{Eq. } 7) \text{ } \text{Subject to: } Y_i(x) \leq 0 \text{ } \forall i \in \{1, \ldots, K\} \text{ }, \text{ } x = (x_1, \ldots, x_N) \in A
</math>
This can be solved by the drift-plus-penalty method as follows: Consider the special case of a deterministic system with no random event process <math>\omega(t)</math>. Define the control action <math>\alpha(t)</math> as:
: <math>
\alpha(t) = x(t) = (x_1(t), x_2(t), \ldots, x_N(t))
</math>
and define the action space as the ''N''-dimensional hyper-rectangle ''A''. Define penalty and constraint functions as:
* <math>
p(t) = P(x_1(t), \ldots, x_N(t))
</math>
* <math>
y_i(t) = Y_i(x_1(t), \ldots, x_N(t)) \text{ } \forall i \in \{1, \ldots, K\}
</math>
Define the following time averages:
* <math>
\overline{x}(t) = \frac{1}{t}\sum_{\tau=0}^{t-1} (x_1(\tau), \ldots, x_N(\tau))
</math>
* <math>
\overline{P}(t) = \frac{1}{t}\sum_{\tau=0}^{t-1}P(x_1(\tau), \ldots, x_N(\tau))
</math>
* <math>
\overline{Y}_i(t) = \frac{1}{t}\sum_{\tau=0}^{t-1}Y_i(x_1(\tau), \ldots, x_N(\tau))
</math>
Now consider the following time average optimization problem:
: <math>
(\text{Eq. } 8) \text{ } \text{Minimize: } \lim_{t\rightarrow\infty} \overline{P}(t)
</math>
: <math>
(\text{Eq. } 9) \text{ } \text{subject to: } \lim_{t\rightarrow\infty} \overline{Y}_i(t) \leq 0 \text{ } \forall i \in \{1, \ldots, K\}
</math>
By [[Jensen's inequality]] the following holds for all slots t>0:
: <math>
P(\overline{x}(t)) \leq \overline{P}(t) \text{ } , \text{ } Y_i(\overline{x}(t)) \leq \overline{Y}_i(t) \text{ } \forall i \in \{1, \ldots, K\}
</math>
From this, it can be shown that an optimal solution to the time-average problem (Eq. 8)–(Eq. 9) can be achieved by solutions of the type x(t) = x* for all slots t, where x* is a vector that solves the convex program (Eq. 6)–(Eq. 7). Further, any time-averaged vector <math>\lim_{t\rightarrow\infty}\overline{x}(t)</math> corresponding to a solution of the time-average problem (Eq. 8)–(Eq. 9) must solve the convex program (Eq. 6)–(Eq. 7). Therefore, the original convex program (Eq. 6)–(Eq. 7) can be solved (to within any desired accuracy) by taking the time average of the decisions made when the drift-plus-penalty algorithm is applied to the corresponding time-averaged problem (Eq. 8)–(Eq. 9). The drift-plus-penalty algorithm for problem (Eq. 8)–(Eq. 9) reduces to the following:
===Drift-plus-penalty algorithm for convex programming===
Every slot ''t'', choose vector <math>x(t)=(x_1(t), \ldots, x_N(t)) \in A</math> to minimize the expression:
: <math>
VP(x(t)) + \sum_{i=1}^K Q_i(t)Y_i(x(t))
</math>
Then update the queues according to:
: <math>
Q_i(t+1) = \max[Q_i(t) + Y_i(x(t)), 0] \text{ } \forall i \in \{1, \ldots, K\}
</math>
The time average vector <math>\overline{x}(t)</math> converges to an O(1/V) approximation to the convex program.<ref name=neely-dcdis/>
This algorithm is similar to the standard ''dual subgradient algorithm'' of optimization theory, using a fixed stepsize of 1/V.
<ref name =bertsekas-convex>
D. P. Bertsekas and A. Nedic and A. E. Ozdaglar. ''Convex Analysis and Optimization'', Boston: Athena Scientific, 2003.
</ref>
However, a key difference is that the dual subgradient algorithm is typically analyzed under restrictive strict convexity assumptions that are needed for the ''primal variables'' ''x''(''t'') to converge. There are many important cases where these variables do not converge to the optimal solution, and never even get near the optimal solution (this is the case for most [[Linear programming|linear programs]], as shown below). On the other hand, the drift-plus-penalty algorithm does not require strict convexity assumptions. It ensures that the ''time averages'' of the primals converge to a solution that is within ''O''(1/''V'') of optimality, with ''O''(''V'') bounds on queue sizes (it can be shown that this translates into an ''O''(''V''<sup>2</sup>) bound on convergence time).<ref name=neely-dcdis/>
===Drift-plus-penalty algorithm for linear programming===
Consider the special case of a [[Linear programming|linear program]]. Specifically, suppose:
* <math>
P(x(t)) = \sum_{n=1}^Nc_nx_n(t)
</math>
* <math>
Y_i(x(t)) = \sum_{n=1}^Na_{in} x_n(t) - b_i \text{ } \forall i \in \{1, \ldots, K\}
</math>
for given real-valued constants (''c''<sub>1</sub>, …, ''c''<sub>''N''</sub>), (''a''<sub>''in''</sub>), (''b''<sub>1</sub>, …, ''b''<sub>''K''</sub>). Then the above algorithm reduces to the following: Every slot ''t'' and for each variable ''n'' in {1, …, ''N''}, choose ''x''<sub>''n''</sub>(''t'') in [''x''<sub>min,''n''</sub>, ''x''<sub>max,''n''</sub>] to minimize the expression:
: <math>
\left[ Vc_n + \sum_{i=1}^K Q_i(t)a_{in} \right] x_n(t)
</math>
Then update queues ''Q''<sub>''i''</sub>(''t'') as before. This amounts to choosing each variable ''x''<sub>''i''</sub>(''t'') according to the simple ''bang-bang'' control policy:
: <math>
\text{Choose } x_i(t) = x_{\min,i} \text{ if } Vc_n + \sum_{i=1}^K Q_i(t)a_{in} \geq 0
</math>
: <math>
\text{Choose } x_i(t) = x_{\max,i} \text{ if } Vc_n + \sum_{i=1}^K Q_i(t) a_{in} < 0
</math>
Since the primal variables ''x''<sub>''i''</sub>(''t'') are always either ''x''<sub>min,''i''</sub> or ''x''<sub>max,''i''</sub>, they can never converge to the optimal solution if the optimal solution is not a vertex point of the hyper-rectangle ''A''. However, the ''time-averages'' of these bang-bang decisions indeed converge to an ''O''(1/''V'') approximation of the optimal solution. For example, suppose that ''x''<sub>min,1</sub> = 0, ''x''<sub>max,1</sub> = 1, and suppose that all optimal solutions to the linear program have ''x''<sub>1</sub> = 3/4. Then roughly 3/4 of the time the bang-bang decision for the first variable will be ''x''<sub>1</sub>(''t'') = 1, and the remaining time it will be ''x''<sub>1</sub>(''t'') = 0.<ref name=sucha-qoi-arxiv/>
==Related links==
* [[Backpressure routing]]
* [[Lyapunov optimization]]
* [[Convex optimization]]
* [[Linear programming]]
==References==
{{Reflist}}
==Primary sources==
*M. J. Neely. ''Stochastic Network Optimization with Application to Communication and Queueing Systems'', Morgan & Claypool, 2010.
[[Category:Networking algorithms]]
[[Category:Queueing theory]]
[[Category:Scheduling algorithms]]