Drift plus penalty


In the mathematical theory of probability, the drift-plus-penalty method is used for optimization of queueing networks and other stochastic systems.


The technique is for stabilizing a queueing network while also minimizing the time average of a network penalty function. It can be used to optimize performance objectives such as time average power, throughput, and throughput utility. [1] [2] In the special case when there is no penalty to be minimized, and when the goal is to design a stable routing policy in a multi-hop network, the method reduces to backpressure routing. [3] [4] The drift-plus-penalty method can also be used to minimize the time average of a stochastic process subject to time average constraints on a collection of other stochastic processes. [5] This is done by defining an appropriate set of virtual queues. It can also be used to produce time averaged solutions to convex optimization problems. [6] [7]

[1] M. J. Neely, "Energy Optimal Control for Time Varying Wireless Networks," IEEE Transactions on Information Theory, vol. 52, no. 7, pp. 2915–2934, July 2006.
[2] M. J. Neely, E. Modiano, and C. Li, "Fairness and Optimal Stochastic Control for Heterogeneous Networks," Proc. IEEE INFOCOM, March 2005.
[3] L. Tassiulas and A. Ephremides, "Stability Properties of Constrained Queueing Systems and Scheduling Policies for Maximum Throughput in Multihop Radio Networks," IEEE Transactions on Automatic Control, vol. 37, no. 12, pp. 1936–1948, Dec. 1992.
[4] L. Georgiadis, M. J. Neely, and L. Tassiulas, "Resource Allocation and Cross-Layer Control in Wireless Networks," Foundations and Trends in Networking, vol. 1, no. 1, pp. 1–149, 2006.
[5] M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems, Morgan & Claypool, 2010.
[6] M. J. Neely, "Distributed and Secure Computation of Convex Programs over a Network of Connected Processors," DCDIS Conf., Guelph, Ontario, July 2005.
[7] S. Supittayapornpong and M. J. Neely, "Quality of Information Maximization for Wireless Networks via a Fully Separable Quadratic Policy," arXiv:1211.6162v2, Nov. 2012.

Methodology


The drift-plus-penalty method applies to queueing systems that operate in discrete time with time slots t in {0, 1, 2, ...}. First, a non-negative function L(t) is defined as a scalar measure of the state of all queues at time t. The function L(t) is typically defined as the sum of the squares of all queue sizes at time t, and is called a Lyapunov function. The Lyapunov drift is defined:



[math]\displaystyle{ \Delta L(t) = L(t+1) - L(t) }[/math]


Every slot t, the current queue state is observed and control actions are taken to greedily minimize a bound on the following drift-plus-penalty expression:


[math]\displaystyle{ \Delta L(t) + Vp(t), }[/math]

where p(t) is the penalty function and V is a non-negative weight. The V parameter can be chosen to ensure the time average of p(t) is arbitrarily close to optimal, with a corresponding tradeoff in average queue size. Like backpressure routing, this method typically does not require knowledge of the probability distributions for job arrivals and network mobility.[5]


Origins and applications


When [math]\displaystyle{ V=0, }[/math] the method reduces to greedily minimizing the Lyapunov drift. This results in the backpressure routing algorithm originally developed by Tassiulas and Ephremides (also called the max-weight algorithm).[3][8] The [math]\displaystyle{ Vp(t) }[/math] term was added to the drift expression by Neely[9] and Neely, Modiano, Li[2] for stabilizing a network while also maximizing a throughput utility function. For this, the penalty [math]\displaystyle{ p(t) }[/math] was defined as [math]\displaystyle{ -1 }[/math] times a reward earned on slot [math]\displaystyle{ t. }[/math] This drift-plus-penalty technique was later used to minimize average power[1] and optimize other penalty and reward metrics.[4][5]

[8] L. Tassiulas and A. Ephremides, "Dynamic Server Allocation to Parallel Queues with Randomly Varying Connectivity," IEEE Transactions on Information Theory, vol. 39, no. 2, pp. 466–478, March 1993.
[9] M. J. Neely, Dynamic Power Allocation and Routing for Satellite and Wireless Networks with Time Varying Channels, Ph.D. Dissertation, Massachusetts Institute of Technology, LIDS, November 2003.

The theory was developed primarily for optimizing communication networks, including wireless networks, ad-hoc mobile networks, and other computer networks. However, the mathematical techniques can be applied to optimization and control for other stochastic systems, including renewable energy allocation in smart power grids[10][11][12] and inventory control for product assembly systems.[13]

[10] R. Urgaonkar, B. Urgaonkar, M. J. Neely, and A. Sivasubramaniam, "Optimal Power Cost Management Using Stored Energy in Data Centers," Proc. SIGMETRICS 2011.
[11] M. Baghaie, S. Moeller, and B. Krishnamachari, "Energy Routing on the Future Grid: A Stochastic Network Optimization Approach," Proc. International Conf. on Power System Technology (POWERCON), Oct. 2010.
[12] M. J. Neely, A. S. Tehrani, and A. G. Dimakis, "Efficient Algorithms for Renewable Energy Allocation to Delay Tolerant Consumers," 1st IEEE International Conf. on Smart Grid Communications, 2010.
[13] M. J. Neely and L. Huang, "Dynamic Product Assembly and Inventory Control for Maximum Profit," Proc. IEEE Conf. on Decision and Control, Atlanta, GA, Dec. 2010.

How it works


This section shows how to use the drift-plus-penalty method to minimize the time average of a function p(t) subject to time average constraints on a collection of other functions. The analysis below is based on the material in [5].


The stochastic optimization problem


Consider a discrete time system that evolves over normalized time slots t in {0, 1, 2, ...}. Define p(t) as a function whose time average should be minimized, called a penalty function. Suppose that minimization of the time average of p(t) must be done subject to time-average constraints on a collection of K other functions:


  • [math]\displaystyle{ p(t) = \text{penalty function whose time average must be minimized} }[/math]

  • [math]\displaystyle{ y_1(t), y_2(t), \ldots, y_K(t) =\text{other functions whose time averages must be non-positive} }[/math]


Every slot t, the network controller observes a new random event. It then makes a control action based on knowledge of this event. The values of p(t) and y_i(t) are determined as functions of the random event and the control action on slot t:


  • [math]\displaystyle{ \omega(t) = \text{random event on slot } t \text{ (assumed i.i.d. over slots)} }[/math]

  • [math]\displaystyle{ \alpha(t) = \text{control action on slot } t \text{ (chosen after observing } \omega(t) \text{)} }[/math]


  • [math]\displaystyle{ p(t) = P(\alpha(t), \omega(t)) \text{ (a deterministic function of } \alpha(t), \omega(t) \text{)} }[/math]
  • [math]\displaystyle{ y_i(t) = Y_i(\alpha(t), \omega(t)) \text{ } \forall i \in \{1, \ldots, K\} \text{ (deterministic functions of } \alpha(t), \omega(t) \text{)} }[/math]

The lower-case notation p(t), y_i(t) and upper-case notation P(), Y_i() are used to distinguish the penalty values from the functions that determine these values based on the random event and control action for slot t. The random event [math]\displaystyle{ \omega(t) }[/math] is assumed to take values in some abstract set of events [math]\displaystyle{ \Omega }[/math]. The control action [math]\displaystyle{ \alpha(t) }[/math] is assumed to be chosen within some abstract set [math]\displaystyle{ A }[/math] that contains the control options. The sets [math]\displaystyle{ \Omega }[/math] and [math]\displaystyle{ A }[/math] are arbitrary and can be either finite or infinite. For example, [math]\displaystyle{ A }[/math] could be a finite list of abstract elements, an uncountably infinite (and possibly non-convex) collection of real-valued vectors, and so on. The functions P(), Y_i() are also arbitrary and do not require continuity or convexity assumptions.


As an example in the context of communication networks, the random event [math]\displaystyle{ \omega(t) }[/math] can be a vector that contains the slot t arrival information for each node and the slot t channel state information for each link. The control action [math]\displaystyle{ \alpha(t) }[/math] can be a vector that contains the routing and transmission decisions for each node. The functions P() and Y_i() can represent power expenditures or throughputs associated with the control action and channel condition for slot t.

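To make this concrete, the following sketch encodes a toy instance of the framework in Python. All specifics here (the two-state channel as the random event, the three power levels as the action set, and the particular P and Y functions) are invented for illustration and are not part of the method itself; the sketch assumes a single constraint (K = 1) requiring a time average service rate of at least 2.

```python
import random

# Hypothetical toy instance of the abstract framework (numbers are illustrative).

OMEGA = ["good", "bad"]        # possible random events (channel states)
A = [0.0, 1.0, 2.0]            # action set A: transmit power levels (finite)

def sample_omega():
    """Sample the i.i.d. random event omega(t): good channel with probability 0.4."""
    return "good" if random.random() < 0.4 else "bad"

def P(alpha, omega):
    """Penalty function P(alpha, omega): power spent on the slot."""
    return alpha

def Y(alpha, omega):
    """Constraint function Y_1(alpha, omega) = 2 - (service rate).
    Its time average being non-positive enforces an average rate of at least 2."""
    rate = alpha * (3.0 if omega == "good" else 1.0)
    return 2.0 - rate
```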

For simplicity of exposition, assume the P() and Y_i() functions are bounded. Further assume the random event process [math]\displaystyle{ \omega(t) }[/math] is independent and identically distributed (i.i.d.) over slots t with some possibly unknown probability distribution. The goal is to design a policy for making control actions over time to solve the following problem:


[math]\displaystyle{ \text{Minimize: } \lim_{t\rightarrow\infty} \frac{1}{t}\sum_{\tau=0}^{t-1}E[p(\tau)] }[/math]

[math]\displaystyle{ \text{Subject to: } \lim_{t\rightarrow\infty} \frac{1}{t}\sum_{\tau=0}^{t-1}E[y_i(\tau)] \leq 0 \text{ } \forall i \in \{1, \ldots, K\} }[/math]


It is assumed throughout that this problem is feasible. That is, it is assumed that there exists an algorithm that can satisfy all of the K desired constraints.


The above problem poses each constraint in the standard form of a time average expectation of an abstract process y_i(t) being non-positive. There is no loss of generality with this approach. For example, suppose one desires the time average expectation of some process a(t) to be less than or equal to a given constant c. Then a new penalty function y(t) = a(t) − c can be defined, and the desired constraint is equivalent to the time average expectation of y(t) being non-positive. Likewise, suppose there are two processes a(t) and b(t) and one desires the time average expectation of a(t) to be less than or equal to that of b(t). This constraint is written in standard form by defining a new penalty function y(t) = a(t) − b(t). The above problem seeks to minimize the time average of an abstract penalty function p(t). This can be used to maximize the time average of some desirable reward function r(t) by defining p(t) = −r(t).


Virtual queues


For each constraint i in {1, ..., K}, define a virtual queue with dynamics over slots t in {0, 1, 2, ...} as follows:


[math]\displaystyle{ (\text{Eq. } 1) \text{ } Q_i(t+1) = \max[Q_i(t) + y_i(t), 0] }[/math]


Initialize Q_i(0) = 0 for all i in {1, ..., K}. This update equation is identical to that of a virtual discrete time queue with backlog Q_i(t) and with y_i(t) being the difference between new arrivals and new service opportunities on slot t. Intuitively, stabilizing these virtual queues ensures the time averages of the constraint functions are less than or equal to zero, so the desired constraints are satisfied. To see this precisely, note that (Eq. 1) implies:


[math]\displaystyle{ Q_i(t+1) \geq Q_i(t) + y_i(t) }[/math]


Therefore:


[math]\displaystyle{ y_i(t) \leq Q_i(t+1) - Q_i(t) }[/math]


Summing the above over the first t slots and using the law of telescoping sums implies:


[math]\displaystyle{ \sum_{\tau=0}^{t-1} y_i(\tau) \leq Q_i(t) - Q_i(0) = Q_i(t) }[/math]


Dividing by t and taking expectations implies:


[math]\displaystyle{ \frac{1}{t}\sum_{\tau=0}^{t-1} E[y_i(\tau)] \leq \frac{E[Q_i(t)]}{t} }[/math]


Therefore, the desired constraints of the problem are satisfied whenever the following holds for all i in {1, ..., K}:


[math]\displaystyle{ \lim_{t\rightarrow\infty} \frac{E[Q_i(t)]} t = 0 }[/math]


A queue Q_i(t) that satisfies the above limit equation is said to be mean rate stable.[5]

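In code, the virtual queue update (Eq. 1) is a single line per constraint. A minimal sketch, continuing the hypothetical single-constraint example above:

```python
def update_virtual_queue(Q_i, y_i):
    """Virtual queue update (Eq. 1): the backlog absorbs y_i(t) but never goes negative."""
    return max(Q_i + y_i, 0.0)
```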

The drift-plus-penalty expression


To stabilize the queues, define the Lyapunov function L(t) as a measure of the total queue backlog on slot t:


[math]\displaystyle{ L(t) = \frac{1}{2}\sum_{i=1}^KQ_i(t)^2 }[/math]


Squaring the queueing equation (Eq. 1), and using the fact that [math]\displaystyle{ \max[x, 0]^2 \leq x^2 }[/math] for all real x, results in the following bound for each queue i in {1, ..., K}:


[math]\displaystyle{ Q_i(t+1)^2 \leq (Q_i(t) + y_i(t))^2 = Q_i(t)^2 + y_i(t)^2 + 2Q_i(t)y_i(t) }[/math]


Therefore,


[math]\displaystyle{ \frac{1}{2}\sum_{i=1}^KQ_i(t+1)^2 \leq \frac{1}{2}\sum_{i=1}^KQ_i(t)^2 + \frac{1}{2}\sum_{i=1}^Ky_i(t)^2 + \sum_{i=1}^KQ_i(t)y_i(t) }[/math]


It follows that


[math]\displaystyle{ \Delta L(t) = L(t+1) - L(t) \leq \frac{1}{2}\sum_{i=1}^Ky_i(t)^2 + \sum_{i=1}^K Q_i(t)y_i(t) }[/math]


Now define B as a positive constant that upper bounds the first term on the right-hand-side of the above inequality. Such a constant exists because the y_i(t) values are bounded: for example, if there is a constant [math]\displaystyle{ y_{\max} }[/math] with [math]\displaystyle{ |y_i(t)| \leq y_{\max} }[/math] for all i and t, then [math]\displaystyle{ B = K y_{\max}^2/2 }[/math] suffices. Then:

[math]\displaystyle{ \Delta L(t) \leq B + \sum_{i=1}^K Q_i(t)y_i(t) }[/math]


Adding Vp(t) to both sides results in the following bound on the drift-plus-penalty expression:


[math]\displaystyle{ (\text{Eq. } 2) \text{ } \Delta L(t) + Vp(t) \leq B + Vp(t) + \sum_{i=1}^K Q_i(t)y_i(t) }[/math]


The drift-plus-penalty algorithm (defined below) makes control actions every slot t that greedily minimize the right-hand-side of the above inequality. Intuitively, taking an action that minimizes the drift alone would be beneficial in terms of queue stability but would not minimize time average penalty. Taking an action that minimizes the penalty alone would not necessarily stabilize the queues. Thus, taking an action to minimize the weighted sum incorporates both objectives of queue stability and penalty minimization. The weight V can be tuned to place more or less emphasis on penalty minimization, which results in a performance tradeoff.[5]


Drift-plus-penalty algorithm

Let [math]\displaystyle{ A }[/math] be the abstract set of all possible control actions. Every slot t, observe the random event and the current queue values:


[math]\displaystyle{ \text{Observe: } \omega(t), Q_1(t), \ldots, Q_K(t) }[/math]

Given these observations for slot t, greedily choose a control action [math]\displaystyle{ \alpha(t) \in A }[/math] to minimize the following expression (breaking ties arbitrarily):


[math]\displaystyle{ VP(\alpha(t), \omega(t)) + \sum_{i=1}^KQ_i(t)Y_i(\alpha(t), \omega(t)) }[/math]

Then update the queues for each i in {1, ..., K} according to (Eq. 1). Repeat this procedure for slot t+1.[5]


Note that the random event and queue backlogs observed on slot t act as given constants when selecting the control action for the slot t minimization. Thus, each slot involves a deterministic search for the minimizing control action over the set A. A key feature of this algorithm is that it does not require knowledge of the probability distribution of the random event process.

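Putting the pieces together, one slot of the algorithm is a brute-force search over the action set followed by the virtual queue update. The sketch below continues the hypothetical single-constraint example from above; since the action set there is finite, the minimization is exact (C = 0 in the terminology of the next subsection):

```python
def drift_plus_penalty_slot(Q, omega, V):
    """One slot of the drift-plus-penalty algorithm for K = 1: choose alpha in A
    minimizing V*P(alpha, omega) + Q*Y(alpha, omega), then update the virtual queue."""
    alpha = min(A, key=lambda a: V * P(a, omega) + Q * Y(a, omega))
    penalty = P(alpha, omega)
    Q_next = update_virtual_queue(Q, Y(alpha, omega))
    return alpha, penalty, Q_next
```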

Approximate scheduling


The above algorithm involves finding a minimum of a function over an abstract set A. In general cases, the minimum might not exist, or might be difficult to find. Thus, it is useful to assume the algorithm is implemented in an approximate manner as follows: Define C as a non-negative constant, and assume that for all slots t, the control action [math]\displaystyle{ \alpha(t) }[/math] is chosen in the set A to satisfy:


[math]\displaystyle{ \begin{align} & VP(\alpha(t), \omega(t)) + \sum_{i=1}^K Q_i(t)Y_i(\alpha(t), \omega(t)) \\ \leq {} & C + \inf_{\alpha\in A} [VP(\alpha, \omega(t)) + \sum_{i=1}^KQ_i(t)Y_i(\alpha,\omega(t))] \end{align} }[/math]


Such a control action is called a C-additive approximation.[5] The case C = 0 corresponds to exact minimization of the desired expression on every slot t.

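As an illustration of how a C-additive approximation might arise, suppose the action set were a continuous power interval [a_min, a_max] instead of the finite set used in the sketches above, and suppose P and Y are Lipschitz in the action with known constants L_P and L_Y (these assumptions, and all names below, are added for illustration and are not part of the source material). A grid search whose resolution adapts to the current backlog then comes within C of the infimum:

```python
def c_additive_action(Q, omega, V, C, L_P, L_Y, a_min, a_max):
    """Grid search over a continuous action interval (hypothetical variant, K = 1).
    The slot expression V*P + Q*Y is Lipschitz with constant ell = V*L_P + Q*L_Y,
    so a grid of spacing at most C/ell misses the infimum by at most C."""
    ell = max(V * L_P + Q * L_Y, 1e-9)              # avoid the degenerate ell = 0 case
    n = max(2, int((a_max - a_min) * ell / C) + 1)  # number of grid intervals
    grid = [a_min + (a_max - a_min) * k / n for k in range(n + 1)]
    return min(grid, key=lambda a: V * P(a, omega) + Q * Y(a, omega))
```

Note that ell grows with the queue backlog, so the grid must be refined as the queues grow in order to keep the same additive constant C.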

Performance analysis

This section shows that the algorithm yields a time average penalty that is within O(1/V) of optimality, with a corresponding O(V) tradeoff in average queue size.[5]


Average penalty analysis

Define an [math]\displaystyle{ \omega }[/math]-only policy to be a stationary and randomized policy for choosing the control action [math]\displaystyle{ \alpha(t) }[/math] based on the observed [math]\displaystyle{ \omega(t) }[/math] only. That is, an [math]\displaystyle{ \omega }[/math]-only policy specifies, for each possible random event [math]\displaystyle{ \omega \in \Omega }[/math], a conditional probability distribution for selecting a control action [math]\displaystyle{ \alpha(t) \in A }[/math] given that [math]\displaystyle{ \omega(t)=\omega }[/math]. Such a policy makes decisions independent of current queue backlog. Assume there exists an [math]\displaystyle{ \omega }[/math]-only policy [math]\displaystyle{ \alpha^*(t) }[/math] that satisfies the following:


[math]\displaystyle{ (\text{Eq. } 3) \qquad E[P(\alpha^*(t), \omega(t))] = p^* = \text{optimal time average penalty for the problem} }[/math]
[math]\displaystyle{ (\text{Eq. } 4) \qquad E[Y_i(\alpha^*(t), \omega(t))] \leqslant 0 \qquad \forall i \in \{1, \ldots, K\} }[/math]

The expectations above are with respect to the random variable [math]\displaystyle{ \omega(t) }[/math] for slot [math]\displaystyle{ t, }[/math] and the random control action [math]\displaystyle{ \alpha(t) }[/math] chosen on slot [math]\displaystyle{ t }[/math] after observing [math]\displaystyle{ \omega(t) }[/math]. Such a policy [math]\displaystyle{ \alpha^*(t) }[/math] can be shown to exist whenever the desired control problem is feasible and the event space for [math]\displaystyle{ \omega(t) }[/math] and action space for [math]\displaystyle{ \alpha(t) }[/math] are finite, or when mild closure properties are satisfied.[5]


Let [math]\displaystyle{ \alpha(t) }[/math] represent the action taken by a C-additive approximation of the drift-plus-penalty algorithm of the previous section, for some non-negative constant C. To simplify terminology, we call this action the drift-plus-penalty action, rather than the C-additive approximate drift-plus-penalty action. Let [math]\displaystyle{ \alpha^*(t) }[/math] represent the [math]\displaystyle{ \omega }[/math]-only decision:


[math]\displaystyle{ \alpha(t) = \text{drift-plus-penalty action for slot } t }[/math]
[math]\displaystyle{ \alpha^*(t) = \omega\text{-only action that satisfies (Eq.3)-(Eq.4)} }[/math]

Assume the drift-plus-penalty action [math]\displaystyle{ \alpha(t) }[/math] is used on each and every slot. By (Eq. 2), the drift-plus-penalty expression under this [math]\displaystyle{ \alpha(t) }[/math] action satisfies the following for each slot [math]\displaystyle{ t: }[/math]


[math]\displaystyle{ \begin{align} \Delta L(t) + Vp(t) &\leqslant B + Vp(t) + \sum_{i=1}^KQ_i(t)y_i(t) \\ &= B + VP(\alpha(t), \omega(t)) + \sum_{i=1}^K Q_i(t)Y_i(\alpha(t), \omega(t)) \\ &\leqslant B + C + VP(\alpha^*(t), \omega(t)) + \sum_{i=1}^KQ_i(t)Y_i(\alpha^*(t), \omega(t)) \end{align} }[/math]

where the last inequality follows because action [math]\displaystyle{ \alpha(t) }[/math] comes within an additive constant [math]\displaystyle{ C }[/math] of minimizing the preceding expression over all other actions in the set [math]\displaystyle{ A, }[/math] including [math]\displaystyle{ \alpha^*(t). }[/math] Taking expectations of the above inequality gives:


[math]\displaystyle{ \begin{align} E[\Delta(t) + Vp(t)] &\leqslant B + C + VE[P(\alpha^*(t), \omega(t))] + \sum_{i=1}^K E \left [Q_i(t)Y_i(\alpha^*(t), \omega(t)) \right ] \\ & = B + C + VE[P(\alpha^*(t), \omega(t))] + \sum_{i=1}^KE[Q_i(t)]E[Y_i(\alpha^*(t), \omega(t))] && \alpha^*(t), \omega(t) \text{ are independent of } Q_i(t) \\ &\leqslant B + C + Vp^* && \text{Using Eq. 3 and Eq. 4} \end{align} }[/math]

Notice that the [math]\displaystyle{ \alpha^*(t) }[/math] action was never actually implemented. Its existence was used only for comparison purposes to reach the final inequality. Summing the above inequality over the first [math]\displaystyle{ t \gt 0 }[/math] slots gives:


[math]\displaystyle{ \begin{align} (B+C+Vp^*)t &\geqslant \sum_{\tau=0}^{t-1} E[\Delta(\tau) + Vp(\tau)] \\ &= E[L(t)] - E[L(0)] + V\sum_{\tau=0}^{t-1}E[p(\tau)] && \Delta(\tau) = L(\tau+1)-L(\tau) \\ &= E[L(t)] + V\sum_{\tau=0}^{t-1}E[p(\tau)] && \text{assume } L(0) = 0 \\ &\geqslant V\sum_{\tau=0}^{t-1}E[p(\tau)] && L(t) \geqslant 0 \end{align} }[/math]

Dividing the above by [math]\displaystyle{ Vt }[/math] yields the following result, which holds for all slots [math]\displaystyle{ t \gt 0: }[/math]


[math]\displaystyle{ \frac{1}{t}\sum_{\tau=0}^{t-1} E[p(\tau)] \leqslant p^* + \frac{B+C}{V}. }[/math]

Thus, the time average expected penalty can be made arbitrarily close to the optimal value [math]\displaystyle{ p^* }[/math] by choosing [math]\displaystyle{ V }[/math] suitably large. It can be shown that all virtual queues are mean rate stable, and so all desired constraints are satisfied.[5] The parameter [math]\displaystyle{ V }[/math] affects the size of the queues, which determines the speed at which the time average constraint functions converge to a non-positive number. A more detailed analysis on the size of the queues is given in the next subsection.

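The tradeoff can also be observed empirically. The sketch below runs the hypothetical example from the earlier sections for several values of V and prints the time averages; one would expect the average penalty to approach its optimal value as V grows, while the average virtual queue size grows roughly linearly in V:

```python
def simulate(V, T=100_000):
    """Run the toy drift-plus-penalty controller for T slots and return
    the empirical time average penalty and virtual queue size."""
    Q = total_penalty = total_backlog = 0.0
    for _ in range(T):
        omega = sample_omega()
        _, penalty, Q = drift_plus_penalty_slot(Q, omega, V)
        total_penalty += penalty
        total_backlog += Q
    return total_penalty / T, total_backlog / T

for V in [1.0, 10.0, 100.0]:
    avg_p, avg_q = simulate(V)
    print(f"V = {V:6.1f}   avg penalty = {avg_p:.3f}   avg queue = {avg_q:.1f}")
```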

Average queue size analysis

Assume now there exists an [math]\displaystyle{ \omega }[/math]-only policy [math]\displaystyle{ \alpha^*(t) }[/math], possibly different from the one that satisfies (Eq. 3)-(Eq.4), that satisfies the following for some [math]\displaystyle{ \epsilon\gt 0 }[/math]:


[math]\displaystyle{ (\text{Eq. } 5) \qquad E[Y_i(\alpha^*(t), \omega(t))] \leq -\epsilon \qquad \forall i \in \{1, \ldots , K\} }[/math]

An argument similar to the one in the previous section shows:


[math]\displaystyle{ \begin{align} \Delta(t) + Vp(t) &\leqslant B + C + VP(\alpha^*(t), \omega(t)) + \sum_{i=1}^KQ_i(t)Y_i(\alpha^*(t), \omega(t)) \\ \Delta(t) + Vp_{\min} &\leqslant B + C + Vp_{\max} + \sum_{i=1}^KQ_i(t)Y_i(\alpha^*(t), \omega(t)) && \text{assume } p_{\min} \leqslant P \leqslant p_{\max} \\ E[\Delta(t)] + Vp_{\min} &\leqslant B + C + Vp_{\max} + \sum_{i=1}^K E \left [Q_i(t)] E[Y_i(\alpha^*(t), \omega(t)) \right ] && \text{taking expectation} \\ E[\Delta(t)] + Vp_{\min} &\leqslant B + C + Vp_{\max} + \sum_{i=1}^KE[Q_i(t)](-\epsilon) && \text{Using (Eq. 5)} \\ E[\Delta(t)] + \epsilon \sum_{i=1}^KE[Q_i(t)] &\leqslant B + C + V(p_{\max} - p_{\min}) \end{align} }[/math]

A telescoping series argument similar to the one in the previous section can thus be used to show the following for all t>0:[5]


[math]\displaystyle{ \frac{1}{t}\sum_{\tau=0}^{t-1} \sum_{i=1}^KE[Q_i(\tau)] \leqslant \frac{B + C + V(p_{\max} - p_{\min})}{\epsilon} }[/math]

This shows that average queue size is indeed O(V).


Probability 1 convergence

The above analysis considers time average expectations. Related probability 1 performance bounds for infinite horizon time average queue size and penalty can be derived using the drift-plus-penalty method together with martingale theory.[14]

[14] M. J. Neely, "Queue Stability and Probability 1 Convergence via Lyapunov Optimization," Journal of Applied Mathematics, vol. 2012.

Application to queues with finite capacity

As shown, the drift-plus-penalty method keeps the average queue size under a certain threshold, which depends on the choice of the parameter V, but in general it does not offer any guarantee on the maximum queue occupancy. However, if the action set respects certain constraints, it is possible to add an additional condition on the choice of V that enforces a maximum queue length, so that the algorithm can also be applied to queues with finite capacity.[15]

[15] L. Bracciale and P. Loreti, "Lyapunov Drift-Plus-Penalty Optimization for Queues with Finite Capacity," IEEE Communications Letters.

Treatment of queueing systems

The above analysis considers constrained optimization of time averages in a stochastic system that did not have any explicit queues. Each time average inequality constraint was mapped to a virtual queue according to (Eq. 1). In the case of optimizing a queueing network, the virtual queue equations in (Eq. 1) are replaced by the actual queueing equations.

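For example (a standard queueing model, stated here only as an illustration), a network queue with an exogenous arrival process a_i(t) and a service decision b_i(t) would replace (Eq. 1) with dynamics of the form:

[math]\displaystyle{ Q_i(t+1) = \max[Q_i(t) - b_i(t), 0] + a_i(t) }[/math]

and the same quadratic drift argument carries over, with a_i(t) − b_i(t) playing the role of y_i(t).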

Convex functions of time averages

A related problem is the minimization of a convex function of time averages subject to constraints, such as:


[math]\displaystyle{ \text{Minimize} \quad f \left (\overline{y}_1, \ldots, \overline{y}_K \right) \quad \text{subject to} \quad g_i \left (\overline{y}_1, \ldots, \overline{y}_K \right ) \leqslant 0 \qquad \forall i \in \{1, \ldots, N\} }[/math]

where [math]\displaystyle{ f }[/math] and [math]\displaystyle{ g_i }[/math] are convex functions, and where the time averages are defined:


[math]\displaystyle{ \overline{y}_i = \lim_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} E[y_i(\tau)] }[/math]

Such problems of optimizing convex functions of time averages can be transformed into problems of optimizing time averages of functions via auxiliary variables (see Chapter 5 of the Neely textbook).[2][5] The latter problems can then be solved by the drift-plus-penalty method as described in previous subsections. An alternative primal-dual method makes decisions similar to drift-plus-penalty decisions, but uses a penalty defined by partial derivatives of the objective function [math]\displaystyle{ f. }[/math][5][16][17] The primal-dual approach can also be used to find local optima in cases when [math]\displaystyle{ f }[/math] is non-convex.[5]
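A brief sketch of the transformation (a simplified outline of the auxiliary variable approach of [5], Chapter 5; the sets over which the auxiliary variables range are omitted here): introduce auxiliary variables [math]\displaystyle{ \gamma_1(t), \ldots, \gamma_K(t) }[/math] chosen every slot, and consider:

[math]\displaystyle{ \text{Minimize} \quad \overline{f(\gamma_1, \ldots, \gamma_K)} \quad \text{subject to} \quad \overline{g_i(\gamma_1, \ldots, \gamma_K)} \leqslant 0 \text{ } \forall i, \qquad \overline{\gamma}_j = \overline{y}_j \text{ } \forall j }[/math]

where overlines denote time average expectations as above. Since f and the g_i are convex, Jensen's inequality gives [math]\displaystyle{ f(\overline{\gamma}_1, \ldots, \overline{\gamma}_K) \leqslant \overline{f(\gamma_1, \ldots, \gamma_K)} }[/math] (and similarly for each g_i), so a solution of the transformed problem yields a solution of the original one. Each equality constraint can be enforced with a pair of virtual queues of the type (Eq. 1), one for each inequality direction, which puts the problem into the time-averages-of-functions form treated in the previous subsections.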

[16] A. Stolyar, "Maximizing Queueing Network Utility subject to Stability: Greedy Primal-Dual Algorithm," Queueing Systems, vol. 50, no. 4, pp. 401–457, 2005.
[17] A. Stolyar, "Greedy Primal-Dual Algorithm for Dynamic Resource Allocation in Complex Networks," Queueing Systems, vol. 54, no. 3, pp. 203–220, 2006.

Delay tradeoffs and related work


The mathematical analysis in the previous section shows that the drift-plus-penalty method produces a time average penalty that is within O(1/V) of optimality, with a corresponding O(V) tradeoff in average queue size. This method, together with the O(1/V), O(V) tradeoff, was developed in Neely[9] and Neely, Modiano, Li [2] in the context of maximizing network utility subject to stability.


A related algorithm for maximizing network utility was developed by Eryilmaz and Srikant. [18] The Eryilmaz and Srikant work resulted in an algorithm very similar to the drift-plus-penalty algorithm, but used a different analytical technique. That technique was based on Lagrange multipliers. A direct use of the Lagrange multiplier technique results in a worse tradeoff of O(1/V), O(V^2). However, the Lagrange multiplier analysis was later strengthened by Huang and Neely to recover the original O(1/V), O(V) tradeoffs, while showing that queue sizes are tightly clustered around the Lagrange multiplier of a corresponding deterministic optimization problem. [19] This clustering result can be used to modify the drift-plus-penalty algorithm to enable improved O(1/V), O(log^2(V)) tradeoffs. The modifications can use either place-holder backlog [19] or Last-in-First-Out (LIFO) scheduling. [20] [21]


When implemented for non-stochastic functions, the drift-plus-penalty method is similar to the dual subgradient method of convex optimization theory, with the exception that its output is a time average of the primal variables, rather than the primal variables themselves.[4][6] A related primal-dual technique for maximizing utility in a stochastic queueing network was developed by Stolyar using a fluid model analysis.[16][17] The Stolyar analysis does not provide analytical results for a performance tradeoff between utility and queue size. A later analysis of the primal-dual method for stochastic networks proves a similar O(1/V), O(V) utility and queue size tradeoff, and also shows local optimality results for minimizing non-convex functions of time averages, under an additional convergence assumption.[5] However, this analysis does not specify how much time is required for the time averages to converge to something close to their infinite horizon limits. Related primal-dual algorithms for utility maximization without queues were developed by Agrawal and Subramanian[22] and by Kushner and Whiting.[23]


Extensions to non-i.i.d. event processes


The drift-plus-penalty algorithm is known to ensure similar performance guarantees for more general ergodic processes [math]\displaystyle{ \omega(t) }[/math], so that the i.i.d. assumption is not crucial to the analysis. The algorithm can be shown to be robust to non-ergodic changes in the probabilities for [math]\displaystyle{ \omega(t) }[/math]. In certain scenarios, it can be shown to provide desirable analytical guarantees, called universal scheduling guarantees, for arbitrary [math]\displaystyle{ \omega(t) }[/math] processes.[5]


Extensions to variable frame length systems


The drift-plus-penalty method can be extended to treat systems that operate over variable size frames.[24][25] In that case, the frames are labeled with indices r in {0, 1, 2, ...} and the frame durations are denoted {T[0], T[1], T[2], ...}, where T[r] is a non-negative real number for each frame r. Let [math]\displaystyle{ \Delta[r] }[/math] and [math]\displaystyle{ p[r] }[/math] be the drift between frames r and r + 1, and the total penalty incurred during frame r, respectively. The extended algorithm takes a control action over each frame r to minimize a bound on the following ratio of conditional expectations:


[math]\displaystyle{ \frac{E[\Delta[r] + Vp[r] \mid Q[r]]}{E[T[r] \mid Q[r]]} }[/math]


where Q[r] is the vector of queue backlogs at the beginning of frame r. In the special case when all frames have the same size, normalized to one slot length so that T[r] = 1 for all r, the above minimization reduces to the standard drift-plus-penalty technique. This frame-based method can be used for constrained optimization of Markov decision problems (MDPs) and for other problems involving systems that experience renewals.[24][25]
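
As a rough illustration of the frame-based rule, the sketch below assumes a finite set of candidate actions whose conditional expectations over the coming frame are known in closed form; the dictionary format and the numbers in the usage example are hypothetical, chosen only to make the ratio rule concrete.

```python
# A minimal sketch of the frame-based ratio rule, under the assumption that
# each candidate action has known expected per-frame arrivals, service,
# penalty, and frame length.  All names and numbers here are illustrative.

def frame_decision(Q, actions, V):
    """Pick the action minimizing E[Delta[r] + V p[r] | Q[r]] / E[T[r] | Q[r]].

    Q       -- current queue backlogs (Q_1[r], ..., Q_K[r])
    actions -- candidate actions; each is a dict with expected per-frame
               'arrive' and 'serve' lists (one entry per queue), expected
               penalty 'p', and expected frame length 'T' (> 0)
    V       -- non-negative tradeoff parameter
    """
    def ratio(a):
        # Standard bound: E[Delta | Q] <= B + sum_i Q_i E[arrivals_i - service_i],
        # so (up to the constant B) the rule compares the quantity below.
        drift = sum(q * (arr - srv)
                    for q, arr, srv in zip(Q, a['arrive'], a['serve']))
        return (drift + V * a['p']) / a['T']
    return min(actions, key=ratio)

# Hypothetical usage with one queue and two candidate actions:
actions = [
    {'arrive': [1.0], 'serve': [2.0], 'p': 3.0, 'T': 2.0},
    {'arrive': [1.0], 'serve': [1.5], 'p': 1.0, 'T': 1.0},
]
print(frame_decision(Q=[10.0], actions=actions, V=5.0))
```

When every action has T = 1, the ratio rule coincides with the usual per-slot drift-plus-penalty minimization.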


Application to convex programming


Let x = (x1, ..., xN) be an N-dimensional vector of real numbers, and define the hyper-rectangle A by:


[math]\displaystyle{ A = \{(x_1, x_2, \ldots, x_N) \mid x_{\min,i} \leq x_i \leq x_{\max,i} \text{ } \forall i \in \{1, \ldots, N\}\} }[/math]


where [math]\displaystyle{ x_{\min,i} }[/math] and [math]\displaystyle{ x_{\max,i} }[/math] are given real numbers that satisfy [math]\displaystyle{ x_{\min,i} \lt x_{\max, i} }[/math] for all i. Let P(x) and [math]\displaystyle{ Y_i(x) }[/math] for i in {1, ..., K} be continuous and convex functions of the x vector over all x in A. Consider the following convex programming problem:


[math]\displaystyle{ (\text{Eq. } 6) \text{ } \text{Minimize: } P(x) }[/math]


[math]\displaystyle{ (\text{Eq. } 7) \text{ } \text{Subject to: } Y_i(x) \leq 0 \text{ } \forall i \in \{1, \ldots, K\} \text{ }, \text{ } x = (x_1, \ldots, x_N) \in A }[/math]


This can be solved by the drift-plus-penalty method as follows: Consider the special case of a deterministic system with no random event process [math]\displaystyle{ \omega(t) }[/math]. Define the control action [math]\displaystyle{ \alpha(t) }[/math] as:


[math]\displaystyle{ \alpha(t) = x(t) = (x_1(t), x_2(t), \ldots, x_N(t)) }[/math]


and define the action space as the N-dimensional hyper-rectangle A. Define penalty and constraint functions as:


  • [math]\displaystyle{ p(t) = P(x_1(t), \ldots, x_N(t)) }[/math]


  • [math]\displaystyle{ y_i(t) = Y_i(x_1(t), \ldots, x_N(t)) \text{ } \forall i \in \{1, \ldots, K\} }[/math]


Define the following time averages:


  • [math]\displaystyle{ \overline{x}(t) = \frac{1}{t}\sum_{\tau=0}^{t-1} (x_1(\tau), \ldots, x_N(\tau)) }[/math]


  • [math]\displaystyle{ \overline{P}(t) = \frac{1}{t}\sum_{\tau=0}^{t-1}P(x_1(\tau), \ldots, x_N(\tau)) }[/math]


  • [math]\displaystyle{ \overline{Y}_i(t) = \frac{1}{t}\sum_{\tau=0}^{t-1}Y_i(x_1(\tau), \ldots, x_N(\tau)) }[/math]


Now consider the following time average optimization problem:


[math]\displaystyle{ (\text{Eq. } 8) \text{ } \text{Minimize: } \lim_{t\rightarrow\infty} \overline{P}(t) }[/math]


[math]\displaystyle{ (\text{Eq. } 9) \text{ } \text{subject to: } \lim_{t\rightarrow\infty} \overline{Y}_i(t) \leq 0 \text{ } \forall i \in \{1, \ldots, K\} }[/math]


By Jensen's inequality, the following holds for all slots t > 0:


[math]\displaystyle{ P(\overline{x}(t)) \leq \overline{P}(t) \text{ } , \text{ } Y_i(\overline{x}(t)) \leq \overline{Y}_i(t) \text{ } \forall i \in \{1, \ldots, K\} }[/math]
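
These bounds are simply convexity applied to the empirical average; for example, for the penalty function,

[math]\displaystyle{ P(\overline{x}(t)) = P\left(\frac{1}{t}\sum_{\tau=0}^{t-1}x(\tau)\right) \leq \frac{1}{t}\sum_{\tau=0}^{t-1}P(x(\tau)) = \overline{P}(t) }[/math]

and the same argument applies to each constraint function [math]\displaystyle{ Y_i }[/math].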


From this, it can be shown that an optimal solution to the time-average problem (Eq. 8)–(Eq. 9) can be achieved by solutions of the type x(t) = x* for all slots t, where x* is a vector that solves the convex program (Eq. 6)–(Eq. 7). Further, any time-averaged vector [math]\displaystyle{ \lim_{t\rightarrow\infty}\overline{x}(t) }[/math] corresponding to a solution of the time-average problem (Eq. 8)–(Eq. 9) must solve the convex program (Eq. 6)–(Eq. 7). Therefore, the original convex program (Eq. 6)–(Eq. 7) can be solved (to within any desired accuracy) by taking the time average of the decisions made when the drift-plus-penalty algorithm is applied to the corresponding time-averaged problem (Eq. 8)–(Eq. 9). The drift-plus-penalty algorithm for problem (Eq. 8)–(Eq. 9) reduces to the following:


Drift-plus-penalty algorithm for convex programming

Every slot t, choose vector [math]\displaystyle{ x(t)=(x_1(t), \ldots, x_N(t)) \in A }[/math] to minimize the expression:


[math]\displaystyle{ VP(x(t)) + \sum_{i=1}^K Q_i(t)Y_i(x(t)) }[/math]


Then update the queues according to:


[math]\displaystyle{ Q_i(t+1) = \max[Q_i(t) + Y_i(x(t)), 0] \text{ } \forall i \in \{1, \ldots, K\} }[/math]


The time average vector [math]\displaystyle{ \overline{x}(t) }[/math] converges to an O(1/V) approximation of an optimal solution to the convex program.[6]
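
For concreteness, the following minimal Python sketch runs the above algorithm on a hypothetical instance: minimize [math]\displaystyle{ P(x) = x_1^2 + x_2^2 }[/math] subject to [math]\displaystyle{ 1 - x_1 - x_2 \leq 0 }[/math] over the box [math]\displaystyle{ [0,1]^2 }[/math], whose optimum is (1/2, 1/2). The closed-form per-slot minimizer below is a special property of this quadratic example; in general the per-slot problem is solved with any convex-programming routine.

```python
# Drift-plus-penalty sketch for the toy convex program
#   minimize  x1^2 + x2^2   subject to  1 - x1 - x2 <= 0,  x in [0,1]^2,
# whose optimum is x* = (1/2, 1/2).  This instance is an illustrative
# assumption, not an example taken from the cited references.

def drift_plus_penalty(V=100.0, T=200_000):
    Q = 0.0                    # virtual queue for the single constraint
    sum_x = [0.0, 0.0]         # running sums for the time averages
    for _ in range(T):
        # Per slot, minimize V*P(x) + Q*Y1(x)
        #   = V*(x1^2 + x2^2) + Q*(1 - x1 - x2),
        # which separates: each term V*xn^2 - Q*xn is minimized at
        # xn = Q/(2V), clipped to the interval [0, 1].
        xn = min(max(Q / (2.0 * V), 0.0), 1.0)
        x = (xn, xn)
        # Queue update: Q(t+1) = max(Q(t) + Y1(x(t)), 0)
        Q = max(Q + 1.0 - x[0] - x[1], 0.0)
        sum_x[0] += x[0]
        sum_x[1] += x[1]
    return [s / T for s in sum_x]

if __name__ == "__main__":
    print(drift_plus_penalty())  # approaches (0.5, 0.5) as V and T grow
```

In this run the virtual queue settles near V while the time averages approach (0.5, 0.5), consistent with the O(1/V) accuracy and O(V) queue-size tradeoff described above.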


This algorithm is similar to the standard dual subgradient algorithm of optimization theory, using a fixed stepsize of 1/V.[26] However, a key difference is that the dual subgradient algorithm is typically analyzed under restrictive strict convexity assumptions that are needed for the primal variables x(t) to converge. There are many important cases where these variables do not converge to the optimal solution, and never even get near the optimal solution (this is the case for most linear programs, as shown below). On the other hand, the drift-plus-penalty algorithm does not require strict convexity assumptions. It ensures that the time averages of the primals converge to a solution that is within O(1/V) of optimality, with O(V) bounds on queue sizes (it can be shown that this translates into an O(V²) bound on convergence time).[6]


Drift-plus-penalty algorithm for linear programming


Consider the special case of a linear program. Specifically, suppose:


  • [math]\displaystyle{ P(x(t)) = \sum_{n=1}^Nc_nx_n(t) }[/math]


  • [math]\displaystyle{ Y_i(x(t)) = \sum_{n=1}^Na_{in} x_n(t) - b_i \text{ } \forall i \in \{1, \ldots, K\} }[/math]


for given real-valued constants [math]\displaystyle{ (c_1, \ldots, c_N) }[/math], [math]\displaystyle{ (a_{in}) }[/math], [math]\displaystyle{ (b_1, \ldots, b_K) }[/math]. Then the above algorithm reduces to the following: Every slot t and for each variable n in {1, ..., N}, choose [math]\displaystyle{ x_n(t) }[/math] in [math]\displaystyle{ [x_{\min,n}, x_{\max,n}] }[/math] to minimize the expression:


[math]\displaystyle{ \left[ Vc_n + \sum_{i=1}^K Q_i(t)a_{in} \right] x_n(t) }[/math]


Then update the queues [math]\displaystyle{ Q_i(t) }[/math] as before. This amounts to choosing each variable [math]\displaystyle{ x_n(t) }[/math] according to the simple bang-bang control policy:


[math]\displaystyle{ \text{Choose } x_n(t) = x_{\min,n} \text{ if } Vc_n + \sum_{i=1}^K Q_i(t)a_{in} \geq 0 }[/math]

[math]\displaystyle{ \text{Choose } x_n(t) = x_{\max,n} \text{ if } Vc_n + \sum_{i=1}^K Q_i(t) a_{in} \lt 0 }[/math]


Since each primal variable [math]\displaystyle{ x_n(t) }[/math] is always either [math]\displaystyle{ x_{\min,n} }[/math] or [math]\displaystyle{ x_{\max,n} }[/math], the primal variables can never converge to the optimal solution if the optimal solution is not a vertex point of the hyper-rectangle A. However, the time averages of these bang-bang decisions indeed converge to an O(1/V) approximation of the optimal solution. For example, suppose that [math]\displaystyle{ x_{\min,1} = 0 }[/math], [math]\displaystyle{ x_{\max,1} = 1 }[/math], and suppose that all optimal solutions to the linear program have [math]\displaystyle{ x_1 = 3/4 }[/math]. Then roughly 3/4 of the time the bang-bang decision for the first variable will be [math]\displaystyle{ x_1(t) = 1 }[/math], and the remaining time it will be [math]\displaystyle{ x_1(t) = 0 }[/math].[7]
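
The following sketch simulates this behavior on a hypothetical one-variable program: minimize [math]\displaystyle{ x_1 }[/math] subject to [math]\displaystyle{ 3/4 - x_1 \leq 0 }[/math] with [math]\displaystyle{ x_1 \in [0, 1] }[/math] (so c1 = 1, a11 = −1, b1 = −3/4 in the notation above). The instance is chosen only so that the optimum x1 = 3/4 is not a vertex of the box.

```python
# Bang-bang drift-plus-penalty sketch for the toy linear program
#   minimize x1  subject to  3/4 - x1 <= 0,  x1 in [0, 1],
# i.e. c1 = 1, a11 = -1, b1 = -3/4.  The optimum x1* = 3/4 is interior
# to [0, 1], so x1(t) itself never converges, but its time average does.
# The instance is an illustrative assumption, not from the references.

def bang_bang_lp(V=50.0, T=100_000):
    Q = 0.0
    total = 0.0
    for _ in range(T):
        # Coefficient of x1 in V*c1*x1 + Q*a11*x1 is (V - Q):
        # choose x_min = 0 if it is >= 0, else x_max = 1.
        x1 = 0.0 if V - Q >= 0 else 1.0
        # Queue update with Y1(x) = 3/4 - x1
        Q = max(Q + 0.75 - x1, 0.0)
        total += x1
    return total / T

if __name__ == "__main__":
    print(bang_bang_lp())  # approaches 0.75: x1(t) = 1 about 3/4 of the time
```

Here the queue climbs by 3/4 per slot while Q ≤ V, then oscillates near V with x1(t) = 1 on roughly three slots out of four, so the time average of x1(t) tends to 3/4 as described above.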


Related links

  • Backpressure routing
  • Lyapunov optimization
  • Convex optimization
  • Linear programming


References

  1. M. J. Neely, "Energy Optimal Control for Time Varying Wireless Networks," IEEE Transactions on Information Theory, vol. 52, no. 7, pp. 2915–2934, July 2006.
  2. M. J. Neely, E. Modiano, and C. Li, "Fairness and Optimal Stochastic Control for Heterogeneous Networks," Proc. IEEE INFOCOM, March 2005.
  3. L. Tassiulas and A. Ephremides, "Stability Properties of Constrained Queueing Systems and Scheduling Policies for Maximum Throughput in Multihop Radio Networks," IEEE Transactions on Automatic Control, vol. 37, no. 12, pp. 1936–1948, Dec. 1992.
  4. L. Georgiadis, M. J. Neely, and L. Tassiulas, "Resource Allocation and Cross-Layer Control in Wireless Networks," Foundations and Trends in Networking, vol. 1, no. 1, pp. 1–149, 2006.
  5. M. J. Neely. Stochastic Network Optimization with Application to Communication and Queueing Systems, Morgan & Claypool, 2010.
  6. M. J. Neely, "Distributed and Secure Computation of Convex Programs over a Network of Connected Processors," DCDIS Conf., Guelph, Ontario, July 2005.
  7. S. Supittayapornpong and M. J. Neely, "Quality of Information Maximization for Wireless Networks via a Fully Separable Quadratic Policy," arXiv:1211.6162v2, Nov. 2012.
  8. L. Tassiulas and A. Ephremides, "Dynamic Server Allocation to Parallel Queues with Randomly Varying Connectivity," IEEE Transactions on Information Theory, vol. 39, no. 2, pp. 466–478, March 1993.
  9. M. J. Neely. Dynamic Power Allocation and Routing for Satellite and Wireless Networks with Time Varying Channels. Ph.D. Dissertation, Massachusetts Institute of Technology, LIDS, November 2003.
  10. R. Urgaonkar, B. Urgaonkar, M. J. Neely, and A. Sivasubramaniam, "Optimal Power Cost Management Using Stored Energy in Data Centers," Proc. SIGMETRICS 2011.
  11. M. Baghaie, S. Moeller, and B. Krishnamachari, "Energy Routing on the Future Grid: A Stochastic Network Optimization Approach," Proc. International Conf. on Power System Technology (POWERCON), Oct. 2010.
  12. M. J. Neely, A. S. Tehrani, and A. G. Dimakis, "Efficient Algorithms for Renewable Energy Allocation to Delay Tolerant Consumers," 1st IEEE International Conf. on Smart Grid Communications, 2010.
  13. M. J. Neely and L. Huang, "Dynamic Product Assembly and Inventory Control for Maximum Profit," Proc. IEEE Conf. on Decision and Control, Atlanta, GA, Dec. 2010.
  14. M. J. Neely, "Queue Stability and Probability 1 Convergence via Lyapunov Optimization," Journal of Applied Mathematics, vol. 2012, doi:10.1155/2012/831909.
  15. L. Bracciale and P. Loreti, "Lyapunov Drift-Plus-Penalty Optimization for Queues with Finite Capacity," IEEE Communications Letters, doi:10.1109/LCOMM.2020.3013125.
  16. A. Stolyar, "Maximizing Queueing Network Utility subject to Stability: Greedy Primal-Dual Algorithm," Queueing Systems, vol. 50, no. 4, pp. 401–457, 2005.
  17. A. Stolyar, "Greedy Primal-Dual Algorithm for Dynamic Resource Allocation in Complex Networks," Queueing Systems, vol. 54, no. 3, pp. 203–220, 2006.
  18. A. Eryilmaz and R. Srikant, "Fair Resource Allocation in Wireless Networks using Queue-Length-Based Scheduling and Congestion Control," Proc. IEEE INFOCOM, March 2005.
  19. L. Huang and M. J. Neely, "Delay Reduction via Lagrange Multipliers in Stochastic Network Optimization," IEEE Trans. on Automatic Control, vol. 56, no. 4, pp. 842–857, April 2011.
  20. S. Moeller, A. Sridharan, B. Krishnamachari, and O. Gnawali, "Routing without Routes: The Backpressure Collection Protocol," Proc. IPSN 2010.
  21. L. Huang, S. Moeller, M. J. Neely, and B. Krishnamachari, "LIFO-Backpressure Achieves Near Optimal Utility-Delay Tradeoff," IEEE/ACM Transactions on Networking, to appear.
  22. R. Agrawal and V. Subramanian, "Optimality of Certain Channel Aware Scheduling Policies," Proc. 40th Annual Allerton Conf. on Communication, Control, and Computing, Monticello, IL, Oct. 2002.
  23. H. Kushner and P. Whiting, "Asymptotic Properties of Proportional-Fair Sharing Algorithms," Proc. 40th Annual Allerton Conf. on Communication, Control, and Computing, Monticello, IL, Oct. 2002.
  24. C. Li and M. J. Neely, "Network Utility Maximization over Partially Observable Markovian Channels," Performance Evaluation, https://dx.doi.org/10.1016/j.peva.2012.10.003.
  25. M. J. Neely, "Dynamic Optimization and Learning for Renewal Systems," IEEE Transactions on Automatic Control, vol. 58, no. 1, pp. 32–46, Jan. 2013.
  26. D. P. Bertsekas, A. Nedic, and A. E. Ozdaglar. Convex Analysis and Optimization, Boston: Athena Scientific, 2003.

Primary sources

  • M. J. Neely. Stochastic Network Optimization with Application to Communication and Queueing Systems, Morgan & Claypool, 2010.


Category:Networking algorithms Category:Queueing theory Category:Scheduling algorithms



This page was moved from wikipedia:en:Drift plus penalty.