Improved Deep Reinforcement Learning in Virtual Network Embedding


Summary

Virtual Network Embedding (VNE) has been widely applied in networking and the Internet of Things (IoT). We first consider:

  1. the flexibility required to handle dynamic changes, and
  2. the computational complexity of large-scale virtual network requests,

and then propose a deep reinforcement learning method to improve traditional embedding algorithms.

By building a Deep Deterministic Policy Gradient (DDPG) model for the VNE problem, we aim to predict future changes in virtual network requests and substrate network structures, and to exploit the side information available in large-scale environments.

Since DDPG does not perfectly match the VNE problem, we propose improving the model with a prioritized experience replay concept.

Our new approach to the VNE problem should have broad implications for network optimization and dynamic resource allocation.

Introduction

Virtual network embedding (VNE) in wireless networks can play a pivotal role in several areas, including sensor network virtualization, vehicular clouds, mobile edge computing, network-based and geographically distributed cloud environments, and cyber foraging [1]. By means of virtualization, it is possible to embed, at low cost, large-scale virtual sensor networks onto sensor-equipped physical devices (e.g. smartphones, autonomous vehicles) so as to perform specific sensing tasks and make autonomous, agile, and timely decisions in a distributed manner. Such virtual networks can support several applications, such as urban sensing, intelligent transportation, terrain exploration, disaster recovery, and surveillance [2]. In addition, VNE can be used to enable virtual content delivery in wireless networks near the network edge.

The VNE problem consists of mapping the virtual nodes to substrate nodes and the virtual links to substrate paths in such a way that all resource (CPU, storage, and bandwidth) requirements of the virtual network are met [3].

Here, a virtual network consists of a set of virtual nodes, each requiring CPU processing capability and storage capacity to process data in a predefined geographical area, and a set of virtual links connecting these virtual nodes, each requiring some bandwidth capacity.

The effectiveness of VNE techniques can essentially be captured through three metrics: computation time (the time it takes to solve a VNE instance), embedding cost (the amount of overhead incurred and resources needed to solve a VNE instance), and acceptance rate (the ratio of successfully solved VNE instances to the total number of instances) [4]; a minimal sketch of how these metrics might be computed appears after the list below. Therefore, in addition to meeting the resource requirements, the aim of VNE techniques is to reduce the computation time, minimize the embedding cost, and increase the acceptance rate. However, there are some challenges:

  • Real-world wireless networks are non-stationary, and it is hard to make known VNE techniques adjust automatically to a dynamic network environment.
  • The time complexity and energy cost of reapplying VNE to a large-scale, complex network structure are high.
  • VNE should, as much as possible, satisfy the quality of service (QoS) requirements of non-urgent virtual networks under time-varying network traffic conditions.
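As a rough illustration of the three effectiveness metrics above, the following Python sketch computes them over a batch of solved instances; the record format and field names are hypothetical and not taken from any specific VNE implementation.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class EmbeddingResult:
        """Outcome of one VNE instance (hypothetical record format)."""
        accepted: bool          # was the request successfully embedded?
        compute_seconds: float  # time spent solving this instance
        cost: float             # substrate resources consumed by the mapping

    def summarize(results: List[EmbeddingResult]) -> dict:
        """Acceptance rate, mean computation time, and mean embedding cost."""
        total = len(results)
        accepted = [r for r in results if r.accepted]
        return {
            "acceptance_rate": len(accepted) / total if total else 0.0,
            "mean_computation_time": sum(r.compute_seconds for r in results) / total if total else 0.0,
            "mean_embedding_cost": sum(r.cost for r in accepted) / len(accepted) if accepted else 0.0,
        }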

Reinforcement learning, a widely used technique in machine learning, has shown great potential in dealing with complex control tasks such as autonomous driving and video games. The goal of a reinforcement learning system (or agent) is to learn policies for sequential decision-making problems that optimize the cumulative future reward signal.

There is some existing work on applying Q-learning to the VNE problem, but several challenges remain. For example, when the reinforcement learning model is poorly set up, the computational complexity becomes very high whenever the network structure changes, and sophisticated environments expose additional side information that these methods do not exploit. Hence, we propose an improved DDPG model to address these challenges.

Related Work

Haeri and Trajkovic [6] combined reinforcement learning and virtual network embedding. In contrast to our work, they model the node mapping problem as a Markov decision process and use Monte Carlo tree search (MCTS) as the action policy. As a result, MCTS has to be applied every time a virtual request arrives, which requires a great amount of computing power and makes the approach less time-efficient. Other works also employ reinforcement learning.

However, they focus on dynamic resource management among virtual networks. Mijumbi et al. applied a Q-learning-based reinforcement learning agent to build a decentralized resource management system, which increases or decreases the resources allocated to a given virtual network. Mijumbi et al. [7] employed an artificial neural network to make resource reallocation decisions and trained the network with a Q-table as a reference. In [8], a reinforcement-learning-based neuro-fuzzy algorithm is proposed.

The aforementioned works apply machine learning and reinforcement learning to dynamic resource management among virtual networks that are already embedded in the substrate network. Our work differs in two ways. First, we utilize a policy-network-based reinforcement learning method and apply the policy gradient to train the policy network. Second, we aim to improve the efficiency of the virtual network embedding process itself rather than managing resources among virtual networks after embedding.

Detailed Research Plan

Virtual Network Embedding Model

The VNE problem can be formulated over a substrate network, represented as an undirected graph G^S = (N^S, L^S, A^S_N, A^S_L), where N^S denotes the set of all substrate nodes, L^S denotes the set of all substrate links, and A^S_N and A^S_L stand for the attributes of substrate nodes and links, respectively. In our work, we consider computing capability as the node attribute and bandwidth capacity as the link attribute. Fig. 1(c) shows an example of a substrate network, where a circle denotes a substrate node and a line connecting two circles denotes a substrate link. The number in a square box denotes the CPU (computing) capacity of that node, and the number next to a substrate link denotes the bandwidth of that link. Similarly, we use an undirected graph G^V = (N^V, L^V, A^V_N, A^V_L) to describe a virtual network request, where N^V denotes the set of all virtual nodes in the request, L^V denotes the set of all virtual links in the request, and A^V_N and A^V_L stand for the constraints of virtual nodes and links, respectively. To map a virtual node to a substrate node, the computing capacity of the substrate node must be higher than that required by the virtual node. To map a virtual link to a set of substrate links, the bandwidth of each substrate link must be higher than that required by the virtual link. Fig. 1(a) and (b) show two different virtual requests.

From Fig. 1 we can see that, when a virtual request arrives, the objective is to find a solution that allocates the different kinds of resources in the substrate network to the request while satisfying the request's requirements. If such a solution exists, the mapping process is executed and the request is accepted; otherwise, the request is rejected or delayed. The virtual network embedding process can be formulated as a mapping M from G^V to G^S: M: G^V(N^V, L^V) → G^S(N′, P′), where N′ ⊂ N^S and P′ ⊂ P^S, with P^S denoting the set of substrate paths.
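The graph model and the two feasibility conditions above can be sketched in Python as follows; the code uses networkx, and the node names, capacities, and demands are invented purely for illustration.

    import networkx as nx

    def build_substrate() -> nx.Graph:
        """Toy substrate network G^S: CPU capacity on nodes, bandwidth on links."""
        g = nx.Graph()
        g.add_node("E", cpu=50)
        g.add_node("F", cpu=40)
        g.add_node("G", cpu=60)
        g.add_edge("E", "F", bw=20)
        g.add_edge("E", "G", bw=30)
        g.add_edge("F", "G", bw=25)
        return g

    def can_map_node(substrate: nx.Graph, s_node: str, cpu_demand: float) -> bool:
        """A virtual node fits a substrate node only if enough CPU capacity remains."""
        return substrate.nodes[s_node]["cpu"] >= cpu_demand

    def can_map_link(substrate: nx.Graph, path: list, bw_demand: float) -> bool:
        """A virtual link fits a substrate path only if every hop has enough bandwidth."""
        return all(substrate.edges[u, v]["bw"] >= bw_demand
                   for u, v in zip(path, path[1:]))

    substrate = build_substrate()
    print(can_map_node(substrate, "E", cpu_demand=45))             # True
    print(can_map_link(substrate, ["E", "F", "G"], bw_demand=22))  # False: E-F only offers 20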

The main goal of virtual network embedding is to accept as many requests as possible and thereby achieve maximum revenue for an ISP; the embedding algorithm must therefore produce efficient mapping decisions within an acceptable period. As shown in Fig. 1, virtual nodes a and b in request 1 are mapped to substrate nodes E and G, respectively, and virtual nodes c, d and e in request 2 are mapped to substrate nodes A, C and D, respectively. Note that the embedding result of request 1 is not optimal; for example, the bandwidth cost in the substrate network can be significantly reduced by moving a to F.

Building DDPG for the VNE Problem

A reinforcement learning (RL) setup consists of an agent interacting with an environment over discrete decision epochs. At each decision epoch t, the agent observes state s_t, takes an action a_t, and receives a reward r_t. The objective is to find a policy π(s) mapping a state to an action (deterministic) or to a probability distribution over actions (stochastic) so as to maximize the discounted cumulative reward R = Σ_{t=0}^{T} γ^t r(s_t, a_t), where r(·) is the reward function and γ ∈ [0, 1] is the discount factor.
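A minimal numeric illustration of the discounted cumulative reward defined above (the reward values and γ below are arbitrary):

    def discounted_return(rewards, gamma):
        """R = sum_{t=0}^{T} gamma^t * r_t for one finite episode."""
        return sum((gamma ** t) * r for t, r in enumerate(rewards))

    # Three decision epochs with rewards 1.0, 0.5, 2.0 and gamma = 0.9:
    # R = 1.0 + 0.9 * 0.5 + 0.81 * 2.0 = 3.07
    print(discounted_return([1.0, 0.5, 2.0], gamma=0.9))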

RL has many extensions; in particular, DeepMind introduced deep RL [9], which extends the well-known Q-learning to enable end-to-end system control based on high-dimensional sensory inputs (such as raw images). The training phase adopts a deep neural network (DNN) called the Deep Q-Network (DQN) to learn, for each state-action pair (s_t, a_t) of the system under control, its value function Q(s_t, a_t), i.e. the expected discounted cumulative reward obtained if the system is in state s_t, takes action a_t at decision epoch t, and follows a certain policy π thereafter: Q^π(s_t, a_t) = E[ Σ_{k≥0} γ^k r(s_{t+k}, a_{t+k}) | s_t, a_t ].
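As a reminder of how the DQN value function is fitted, the sketch below computes the standard one-step temporal-difference target; this is the textbook Q-learning target, not a detail taken from the proposal.

    import numpy as np

    def dqn_target(reward: float, next_q_values: np.ndarray,
                   gamma: float = 0.99, terminal: bool = False) -> float:
        """One-step TD target: y_t = r_t + gamma * max_a' Q(s_{t+1}, a')."""
        if terminal:
            return reward           # no future reward after a terminal state
        return reward + gamma * float(np.max(next_q_values))

    print(dqn_target(1.0, np.array([0.2, 0.8, 0.5]), gamma=0.9))  # 1.0 + 0.9 * 0.8 = 1.72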

In deep RL, continuous control has often been tackled with the actor-critic approach, which usually employs the policy gradient method to search for the optimal policy [10]. A recent work [6] from DeepMind introduced an actor-critic method, called Deep Deterministic Policy Gradient (DDPG), for continuous control. The basic idea is to maintain a parameterized actor function π(s_t | θ^π) and a parameterized critic function Q(s_t, a_t | θ^Q). The critic function can be implemented using the DQN described above, which returns the Q value for a given state-action pair. The actor function can also be implemented using a DNN, which specifies the current policy by mapping a state to a specific action. According to [6], the actor network can be updated by applying the chain rule to the expected cumulative reward J with respect to the actor parameters θ^π:

∇_{θ^π} J = E[ ∇_a Q(s, a | θ^Q) |_{s=s_t, a=π(s_t)} · ∇_{θ^π} π(s | θ^π) |_{s=s_t} ]
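A compact PyTorch sketch of this actor-critic pair and of the actor update implied by the gradient above; the layer sizes are arbitrary, and target networks, exploration noise, and the critic's own update are omitted for brevity.

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        """Deterministic policy pi(s | theta_pi): maps a state vector to an action vector."""
        def __init__(self, state_dim: int, action_dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 64), nn.ReLU(),
                nn.Linear(64, action_dim), nn.Tanh(),
            )

        def forward(self, s):
            return self.net(s)

    class Critic(nn.Module):
        """Value function Q(s, a | theta_Q): scores a state-action pair."""
        def __init__(self, state_dim: int, action_dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                nn.Linear(64, 1),
            )

        def forward(self, s, a):
            return self.net(torch.cat([s, a], dim=-1))

    def actor_step(actor, critic, actor_opt, states):
        """Deterministic policy gradient step: ascend E[Q(s, pi(s))] w.r.t. actor parameters.
        Autograd applies the chain rule from the equation above automatically."""
        actor_opt.zero_grad()
        loss = -critic(states, actor(states)).mean()
        loss.backward()
        actor_opt.step()
        return loss.item()

    # Usage with the 3-dimensional VNE state vector and a small illustrative action space:
    actor, critic = Actor(3, 4), Critic(3, 4)
    opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
    batch = torch.rand(32, 3)   # 32 random states, for illustration only
    actor_step(actor, critic, opt, batch)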

Fig. 2 shows how DDPG works in the VNE problem. The actor-critic learning algorithm is used to represent the policy function independently of the value function. The policy function structure is known as the actor, and the value function structure is referred to as the critic. The actor produces an action given the current state of the environment, and the critic produces a TD (temporal-difference) error signal given the state and the resulting reward. If the critic is estimating the action-value function, it also needs the output of the actor. The output of the critic drives learning in both the actor and the critic.

In order to utilize DDPG to improve the VNE algorithm, we first define the state space, the action space, and the reward (a small sketch of these definitions follows the list below).

  • State Space: The state consists of three components: R_a is the percentage of allocated resources, R_x^v is the percentage of unused virtual resources, and R_x^s is the percentage of unused substrate resources. Formally, the state vector is S = (R_a, R_x^v, R_x^s).
  • Action Space: An action is defined as the choice to increase or decrease the resources allocated to a given virtual node or link, respectively.
  • Reward: The reward is the objective of the VNE problem, namely the total utility of all the resources.
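A small sketch of how these definitions might be encoded; the function names, the step-wise action encoding, and the example values are our own illustrative assumptions rather than fixed parts of the design.

    import numpy as np

    def build_state(allocated, total_substrate, unused_virtual, total_virtual,
                    unused_substrate) -> np.ndarray:
        """State vector S = (R_a, R_x^v, R_x^s) as percentages of the relevant totals."""
        r_a  = allocated / total_substrate
        r_xv = unused_virtual / total_virtual
        r_xs = unused_substrate / total_substrate
        return np.array([r_a, r_xv, r_xs], dtype=np.float32)

    def apply_action(allocation: np.ndarray, action_index: int, step: float = 1.0) -> np.ndarray:
        """Hypothetical action encoding: even indices increase, odd indices decrease,
        the resources allocated to virtual node/link number (action_index // 2)."""
        target = action_index // 2
        delta = step if action_index % 2 == 0 else -step
        allocation[target] = max(0.0, allocation[target] + delta)
        return allocation

    def reward(utilities) -> float:
        """Reward: total utility of all resources (the utility model itself is left abstract)."""
        return float(sum(utilities))

    print(build_state(allocated=60, total_substrate=100, unused_virtual=5,
                      total_virtual=20, unused_substrate=40))  # [0.6, 0.25, 0.4]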

The VNE problem is clearly a continuous control problem. Even though DDPG has been demonstrated to work well on quite a few continuous control tasks, it may still fail to deliver satisfactory performance here, for two reasons:

  1. The DDPG framework does not clearly specify how to explore; a simple random-noise-based method, or the exploration methods proposed for physical control problems, may not suit the VNE environment.
  2. DDPG uses a simple uniform sampling method for experience replay, which ignores the significance of the transition samples in the replay buffer.

To address these two issues, we will propose a new algorithm that optimizes DDPG, particularly for the VNE problem. We will use actor-critic-based prioritized experience replay, which employs a new method for quantifying the significance of samples while taking both the actor and critic networks into account. After improving DDPG, we will evaluate the model on multiple network environments.
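A minimal sketch of a prioritized replay buffer along the lines we intend to build on; the proportional sampling scheme below follows the standard prioritized-experience-replay idea, while the actor/critic-aware priority itself is left as a placeholder value supplied by the caller.

    import numpy as np

    class PrioritizedReplayBuffer:
        """Replay buffer that samples transitions in proportion to a priority score."""

        def __init__(self, capacity: int, alpha: float = 0.6):
            self.capacity = capacity
            self.alpha = alpha            # how strongly priorities bias sampling (0 = uniform)
            self.buffer, self.priorities = [], []

        def add(self, transition, priority: float):
            if len(self.buffer) >= self.capacity:   # evict the oldest transition when full
                self.buffer.pop(0)
                self.priorities.pop(0)
            self.buffer.append(transition)
            self.priorities.append(float(priority))

        def sample(self, batch_size: int):
            probs = np.array(self.priorities) ** self.alpha
            probs /= probs.sum()
            idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
            return [self.buffer[i] for i in idx], idx

        def update_priorities(self, indices, new_priorities):
            """Refresh priorities, e.g. after recomputing TD errors for the sampled batch."""
            for i, p in zip(indices, new_priorities):
                self.priorities[i] = float(p)

    # Usage: store (s, a, r, s') transitions with an initial priority, then sample a batch.
    buf = PrioritizedReplayBuffer(capacity=10000)
    buf.add(("s0", "a0", 1.0, "s1"), priority=1.0)
    buf.add(("s1", "a1", 0.5, "s2"), priority=2.0)
    batch, idx = buf.sample(batch_size=2)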

Contribution and Impacts

We believe our work can extract more features for each substrate node to form a richer feature matrix, and can verify the effectiveness of each feature. Meanwhile, we will build a more expressive policy network by increasing the number of layers or the number of neurons per layer. With our new DDPG-based model, the embedding will be conducted automatically through a training process on historical data rather than relying on hand-crafted rules. The computational complexity can be reduced, and resources can be allocated more effectively thanks to better prediction of future requests. The proposed work will have broad implications for network optimization.

References

  1. A. Fischer, J. Botero, M. Beck, H. De Meer, and X. Hesselbach, "Virtual network embedding: A survey," IEEE Communications Surveys & Tutorials, pp. 1-19, 2013.
  2. M. Yu, Y. Yi, J. Rexford, and M. Chiang, "Rethinking virtual network embedding: Substrate support for path splitting and migration," ACM SIGCOMM Comput. Commun. Rev., vol. 38, no. 2, pp. 17-29, 2008.
  3. X. Cheng, S. Su, Z. Zhang, H. Wang, F. Yang, Y. Luo, and J. Wang, "Virtual network embedding through topology-aware node ranking," ACM SIGCOMM Comput. Commun. Rev., vol. 41, no. 2, pp. 38-47, 2011.
  4. N. M. M. K. Chowdhury, M. R. Rahman, and R. Boutaba, "Virtual network embedding with coordinated node and link mapping," in Proceedings of IEEE INFOCOM, 2009, pp. 783-791.