It is a very challenging problem in Wireless Sensor Networks to provide privacy preservation during Data Aggregation. There are so many approaches for privacy preservation that are currently in use. Cluster Based Private Data Aggregation (CPDA) is the latest scheme that is generally followed by any wireless sensor network where privacy is needed. Cluster-based Private Data Aggregation (CPDA) uses clustering protocols and algebraic properties of polynomials. It has the advantage of less communication overhead. The goal of this paper is to and a new effective scheme to solve this problem.
In this paper, I am proposing a modified scheme (mCPDA) for privacy preserving data aggregation which is inspired by CPDA and consumes less resources than CPDA and made a comparison with CPDA in computational overhead and communication overhead.
As we know, a wireless sensor network is a network consisting of small sensors that sense the different attributes in their surroundings. Work of sensor is to collect some information from the surroundings and pass it to its neighbor and so to the destination .
As sensors are using wireless medium to send the information, the major problem is of power consumption. If the frequency of sending data is kept high, then the power consumption will be very high due to continuous sending of data. Sensors sending a bit of information continuously over sensor sending a bit of information after some manipulations over some period of time, is quite tedious and power consuming . It suggests a network processing that is used to minimize the raw data sent by the sensor nodes.
This approach is named as Data Aggregation. In many applications, we may only be concerned with some aggregate results of the readings over a period of time, that may be sum, average or max/min of data.
Now-a-days sensor network applications are increasing day by day so privacy of data sent by the sensors is one of the major problem imposed by wireless sensor network. Let’s take an example of an application that measure the household details such as how much water and electricity is used in a particular home . This information may be used for govt. agencies for resource planning of supply. But there is a counter-side of the application that it reveals the daily activities of the particular home also such as when all the family members are out of house. So, we must have to use such a scheme that collects the aggregated sensor readings while at the same time preserve data privacy.
In CPDA approach, a scheme named as Homomorphic encryption scheme is used to aggregate the data. This scheme is used for additive aggregation functions. Using this scheme, we can aggregate encrypted data without any need of decryption in the intermediate nodes. The motive of this dissertation is to minimize the complexity between collaborative data collection by wireless sensor network and data privacy. When there is no packet loss during data transmission, in both schemes CPDA and Modified CPDA (mCPDA), we get the accurate results and it guarantees that no individual sensor reading is disclosed to other sensors.
Here, I evaluate the schemes on the basis of computation and communication overheads. The modified scheme has lower computation overhead than CPDA. Furthermore, there is a comparison with CPDA in computation overhead and communication overhead.
Here, a sensor network is modelled as a connected graph G(V,E), where sensor nodes are represented as the set of vertices V and wireless links as the set of edges E. Then the number of sensor nodes is defined as|V| = N.
A data aggregation function is defined as:
y(t) = f(d_1 (t),d_2 (t),··· ,d_N (t))
where d_i (t) is the individual sensor reading at time t for node i. Generally, functions of f include sum, average, min, max and count.
In this paper, I am focusing on additive aggregation functions, that is, f(t)=?_(i=1)^N??d_i (t)?. As we know that additive aggregation is not too restrictive and moreover many other aggregation functions such as average and count can also be reduced to the additive aggregation function sum. Furthermore, some functions such as min and max, can also be approximated through additive functions. According to
max?(x_1,x_(2,)·,x_N )=lim?(k??)???(x_1^k+x_2^k+?+x_N^k)?^(1/k) ?
min?(x_1,x_(2,)·,x_N )=lim?(k?-?)???(x_1^k+x_2^k+?+x_N^k)?^(1/k) ?
we can ?nd out the max and min values by assigning k a large value.
Here I briefiy review a random key distribution mechanism proposed in . Here key distribution consists of three phases:
In the beginning, a key pool of K (Sufficiently large) keys and their corresponding identities are generated. Each sensor within the sensor network are given k keys that are randomly taken from the key pool. These k keys are called as the keyring for the particular sensor node. During the key discovery phase, every sensor node searches for a common key that it is holding with its neighbors by exchanging discovery messages. If two neighbors have the same key, then there is a secure link between two nodes. In the path-key establishment phase, a key (path key) is given to the pairs of neighboring sensor nodes that are not having common key but can be attached by two or more multi-hop secure links.
As we know that there are so many cases where we want to hide our data from individuals but required an aggregate or simply average of some analyzed data within a sensor network. This is must to build an effective scheme that can preserve the privacy and aggregate the data. Schemes must give accurate results when there is no packet loss within the network and it must take minimum time as much as possible. There is also desirable that the number of communication message transferred within the network are less. There are so many previous approaches that are currently in use. Two important schemes are Cluster based Private Data Aggregation (CPDA) and Slice-Mix-AggRegaTe (SMART). When there is no packet loss, in both CPDA and SMART, the sensor network can obtain a precise aggregation result while guaranteeing that no private sensor reading is released to other sensors. CPDA has the advantage of incurring less communication overhead while SMART has the advantage of incurring less computation overhead.
The scheme in includes selective distribution and revocation of keys to sensor nodes as well as node re-keying without substantial computation and communication capabilities. It relies on probabilistic key sharing among the nodes of a random graph and uses simple protocols for shared-key discovery and path-key establishment, and for key revocation, re-keying, and incremental addition of nodes. There is a wide and elaborative work in  for Secure multi-party computation.
It reduces the computational overhead at the cost of slightly increased communicational bandwidth consumption . Slice-Mix-Aggregate is a three-step scheme for preserving data privacy. Steps are:
As this scheme is easier to implement but SMART has the disadvantage of greater number of communication message transferred. So, there are more chances to lose of some packets during message exchange and hence less effective.
As the name suggests firstly it constructs clusters to perform intermediate aggregation. The cluster formation procedure is illustrated in gure 4. In the very beginning, a query server (let’s say Q) broadcasts a message say,hello message . When a sensor receives the hello message it elects itself a cluster leader on the basis of a predefined probability pc. If a node becomes cluster head in a cluster then it again forwards the hello message to its neighbors, the node waits for a certain time duration for receiving the hello message otherwise and joins the cluster by sending JOIN message to the cluster head. This procedure repeats again and again until all the nodes join some cluster.
The next step in CPDA is to aggregate the intermediate values among the cluster members. For simplification let a cluster contains three members: A,B and C and a,b,c are the private data held by A,B and C. Let
Figure 5 illustrates the message exchange among the three nodes to obtain the desired sum without releasing individual private data.
All cluster members share a common knowledge of non-zero numbers, referred to as seeds, x,y and z, which are distinct from each other (as shown in Figure 5(1)). Now A will calculate:
?_A^A=a+r_1^A x+r_2^A x^2
?_B^A=a+r_1^A y+r_2^A y^2
?_C^A=a+r_1^A z+r_2^A z^2
Where r_1^A & r_2^A are random numbers generated by A and known only to node A.
Similarly, node B and C calculate their values
?_A^B=b+r_1^B x+r_2^B x^2
?_B^B=b+r_1^B y+r_2^B y^2
?_C^B=b+r_1^B z+r_2^B z^2
?_A^C=c+r_1^C x+r_2^C x^2
?_B^C=c+r_1^C y+r_2^C y^2
?_C^C=c+r_1^C z+r_2^C z^2
Then node A encrypts ?_B^A and sends to B using the shared key between A and B. It also encrypts ?_C^A and sends to C using the sharing key between A and C (Figure 5(2)). Similarly, node B encrypts and sends ?_A^B to A and ?_C^B to C; node C encrypts and sends ?_A^C to A and ?_B^C to B.
Node A receives ?_A^B and ?_A^C, it has the knowledge of
?_A^A=a+r_1^A x+r_2^A x^2
?_A^B=b+r_1^B x+r_2^B x^2
?_A^C=c+r_1^C x+r_2^C x^2
Now node A calculates assembled value:
F_A = ?_A^A+ ?_A^B + ?_A^C
F_A=(a+b+c)+r_1 x+r_2 x^2
r_1=r_1^A+r_1^B+r_1^C and r_2=r_2^A+r_2^B+r_2^C
Now the cluster head A can calculate the final aggregated value (a+b+c) as x,y,z,F_A,F_B and F_C are known to A.
There is a simple and well-known technique for data aggregation, is to build a routing tree. Each cluster head sends the final sum within the cluster back to the query server through a routing tree rooted at the server or we can say converge all the intermediate sums to the server.
This approach proposed a modified scheme (mCPDA) for privacy preserving data aggregation which is inspired by CPDA. Using this scheme, we can preserve the data privacy with consuming less resources than in CPDA. As we can easily see that the most of the complexion in CPDA is due to the step Calculation Within the Cluster. If we reduce the complexity in this step then it will be a suitable approach that is secure and having less computational overhead. In a computation, number of Multiplications and Message Transferred decide the time complexity.
Consider a connected graph G(V,E), where V is the sensor nodes in the network and E is the edge between vertices. E_ij denotes the edge between node i and node j. The total number of nodes in the wireless sensor network is N. Here data aggregation function is
f(t) = f(m_1 (t),m_2 (t),··· ,m_N (t))
Where m(t)- sensor reading at time t.
Here also I am focusing upon additive properties of polynomials i.e. additive aggregation. We have already seen that other aggregation functions like average, max, min etc. can be derived from additive function.
For securing the data communication among sensor nodes, we need a key distribution mechanism. In this scheme I am going to follow the random key pre-distribution. There are three phases in this scheme:
There is a key-pool which has M keys. Every sensor node can store N keys in itself. For each sensor node N keys are randomly selected from the key pool. P_c is the probability that the two sensors have at least one same key.
Each sensor node sends out discovery messages to find out which neighbors share a common key with itself. If two neighbors share a common key, then a secure link is set up.
A path-key is assigned to pairs of sensors who do not have keys but can be connected by multi-hop secure links.
Let a seed k in each sensor node, which is different from each other. It must be given to each node before using it. We are using range of k is 1 to M, where M is the number of sensors in the network.
As in CPDA the very first step here is cluster construction. For this cluster formation we use the previous first approach. Where a query server broadcast hello message and all others on receiving message selects them self as the cluster head with the predefined probability or wait for the message to come over it. This process of cluster creation resists until all the nodes are present in some cluster.
Suppose a cluster contains n nodes, now select three sensor nodes A,B and C within the network. Then
These three nodes within the same cluster broadcast their seeds k_1,k_2 and k_3. So A,B and C share the common knowledge of k_1,k_2 and k_3. Then all nodes in the cluster randomly generate a positive number within 100.
Now node A will calculate:
V_A2=? k?_2 x + a_1;
V_A3=? k?_3 x + a_1;
Where x-private data kept in A, a_1-random number generated by A.
Similarly, node B calculate:
V_B2=? k?_2 y + a_2;
V_B3=? k?_3 y + a_2;
Where y-private data kept in B,a_2-random number generated by B.
Similarly, node C calculate:
V_C2=? k?_2 z + a_3;
V_C3=? k?_3 z + a_3;
Where z-private data kept in C, a_3-random number generated by C.
Now node A encrypts V_A2 and V_A3, then send them to node B and C respectively. Node B encrypts? V?_B3 and send it to node C. Similarly, node C encrypts V_C2 and send it to node B.
Now node B can calculate:
F_B = V_A2 +? V?_B2 + V_C2 = k_2 (x + y + z) +? a?_1 + ? a?_2+ ? a?_3;
F_C = V_A3 +? V?_B3 + V_C3 = k_3 (x + y + z) +? a?_1 + ? a?_2+ ? a?_3;
· Now, node B send F_B to node C. Since C has the knowledge of F_B, F_C, k_2 and k_3, so it can ?nd the sum of x,y and z without the knowledge of x and y.
Figure 6 shows the process of message exchange during the scheme.
Now we will have the aggregated result of three nodes in the cluster, now we will select two more nodes within the cluster with this aggregated result and again follow the same procedure. So, on and so forth, until all the data are gathered by the cluster head. By doing this we will have (n-1)/2 steps, where n is the number of nodes within the cluster.
As we know that in any mathematical computation, multiplication and communication take most of the time. So, we can judge the computational complexity by knowing the number of multiplications take place in each of the schemes. I am going to judge CPDA with my scheme mCPDA with the number of multiplications performed in each:
Let us suppose there are one Cluster head and n other nodes (members of cluster) in a cluster. Say S_0,? S?_1,? S?_2,? S?_3,…,? S?_n are the sensor nodes within the cluster, where ? S?_0 is Clusterhead. P_i are the Private data values for sensor node? S?_i and ? A?_i are Common shared values within the cluster (Non-zero number generated by each sensor node). Each ? S?_i generates n private numbers R_1^i,R_2^i,R_3^i,·, R_n^i.
Now each sensor calculates:
V_j^i=P_i+R_1^i A_j+R_2^i ??(A?_j)?^2+?+ R_n^i ??(A?_j)?^n ; where 0?j?n
Then ? S?_i will send V_j^ito ? S?_j.
After exchanging data, every sensor ? S?_j calculates a value? F?_j as follows:
F_j=?_(i=0)^n?V_j^i =P+R_1 A_j+R_2 ??(A?_j)?^2+?+R_n ??(A?_j)?^n
R_k=?_(i=0)^n??R_(k )^i for 1?k?n?
Then F_j broadcast S_j to the cluster head S_0 which calculates the sum P according to U=G^(-1) F, where G^(-1) is the inverse of the matrix:
Here F=??(F?_0,F_1,? F?_2,·,F_n)?^T and U=?(P,R_1,? R?_2,·,R_n)?^T
Now we can see how many multiplications are involved in the calculation as:
Total number of Exponentials =(n – 1)* n; this means
multiplications involved= [(n-1)*n]*n
Total number of Multiplications= n*n;
Total number of Additions?^*= n * n + n * n;
(* We can ignore this because Addition takes quite less time compared to multiplication)
And one n*n Matrix inverse; here multiplications involved=n(n-1)(4n+1)/6;
Division takes similar time as multiplication so we can take it in multiplication count. Hence total number of multiplications involved can be found by adding all multiplications.
Total number of multiplications involved in CPDA= (5n^3 + 3n^2 + 4n)/3
So we can conclude that CPDA takes number of multiplications in order of n^3 where n is the total number of nodes in the cluster.