An Energy Efficient and Scalable WSN with Enhanced Data Aggregation Accuracy

This paper introduces a method that combines K-means clustering, a genetic algorithm (GA), and Lempel-Ziv-Welch (LZW) compression to enhance the efficiency of data aggregation in wireless sensor networks (WSNs). The main goal of this research is to reduce energy consumption, improve network scalability, and enhance data aggregation accuracy. The GA optimizes the cluster formation process by selecting the cluster heads, while LZW compresses aggregated data to reduce transmission overhead. To further reduce network traffic, scheduling mechanisms are introduced that decrease the number of packets transmitted from sensors to cluster heads. The findings of this study contribute to advancing packet scheduling mechanisms for data aggregation in WSNs. Simulation results confirm the system’s effectiveness compared to other compression methods and to the non-compression scenarios relied upon in the LEACH, M-LEACH, multi-hop LEACH, and sLEACH approaches.


Introduction
Wireless sensor networks are used extensively in several domains, for instance in environmental monitoring, industrial automation, and healthcare. Nevertheless, collecting and analyzing data in these networks presents significant difficulties due to the large volume of data produced by the sensors. In recent years, academia has researched a range of methodologies aimed at enhancing the efficiency of the data collection stage [1]. The limited memory capacity, energy, and computational capabilities of sensor nodes restrict their ability to handle massive amounts of generated data. Compressing the data before transmission helps mitigate this disadvantage. Numerous data compression techniques have been proposed, mainly for image processing. The majority of them do not apply to sensor nodes due to issues with processing speed, energy consumption, and memory limitations. The solution proposed here takes the form of an adaptive compressed data aggregation approach that combines data compression techniques, such as LZW, with scheduling strategies.
However, improving data collection in WSNs remains a considerable challenge due to the fundamental restrictions of these networks. These restrictions include power constraints, limited transmission capacity, and the demand for power-efficient data compression processes [2]. Optimizing data-related operations in WSNs holds considerable value in different applications. In environmental monitoring, WSNs are used to gather data from varied sensing units, such as those measuring temperature, moisture, or contamination levels [3]. Wearable sensing devices play a crucial role in the healthcare industry by collecting important information for monitoring the health of patients and diagnosing specific illnesses. The optimization of manufacturing processes through industrial automation solutions relies heavily on sensing units. Therefore, maximizing the performance of data-related operations is not solely a matter of technological value, but a practical necessity in all those fields [4].
This paper describes a technique that relies on K-means clustering, GA, and LZW compression to enhance the effectiveness of data aggregation performed by sensor networks. The strategy clusters sensor nodes based on distance using K-means clustering, enhances the aggregation process by choosing cluster heads with the use of GA, and compresses the aggregated data with LZW to decrease transmission overhead. Additionally, scheduling mechanisms are presented to minimize network traffic. The paper also aims to analyze the complexity of the hybrid approach and evaluate its effectiveness in increasing the efficiency of data collection in sensor networks. It then evaluates the benefits and limitations of the chosen approach through a comparative analysis with alternative methodologies already in use, including LEACH, M-LEACH, and multi-hop LEACH.
Figure 1 shows the diagram of the proposed protocol, referred to as MEECRP. K-means clustering is used to group the nodes based on distance, thus promoting reliable data collection. This clustering technique arranges sensor nodes into specific clusters, thus improving data transmission and boosting network performance. Next, GA optimization is used to select cluster heads, which enhances the aggregation process, directs traffic, and allocates resources. GA assists in discovering suitable solutions to complicated optimization problems. Lastly, LZW compression is relied upon to minimize network traffic. This approach improves data transmission effectiveness, decreases energy consumption, and boosts the performance of sensor networks.
The performance of the proposed solution can be assessed using efficiency metrics that capture several aspects of WSN operation, such as:
• reduced power consumption, i.e. increased battery lifetime,
• network scalability, i.e. network size, coverage, and throughput,
• data aggregation accuracy, i.e. data aggregation efficiency, data integrity, and data compression ratio.
The paper presents a literature review in Section 2. In Section 3, the proposed model is described. In Section 4, simulation results are discussed, together with the challenges and limitations associated with implementing the proposed method. Finally, the work is concluded in Section 5.

Literature Review
Recent studies reviewed in this section focus on developing the K-means cluster head selection algorithm, data packet compression algorithms, as well as on combining more than one algorithm to improve energy consumption in WSNs. The authors of [5] proposed the LCDGRA protocol, a cluster-based WSN cooperative data-gathering and relaying method. A hybrid K-means clustering and network coding technique reduces data propagation costs and increases network performance. Simulations show that the proposed system outperforms related systems with regard to a number of metrics. An enhanced GA-based energy-efficient scheduling method for WSNs is presented in paper [6]. This method covers target points with a minimum number of active sensor nodes, while maintaining connectivity. A fitness function is introduced that balances activating a minimum number of sensor nodes, ensuring comprehensive coverage, sustaining connectivity, and selecting nodes with higher residual energy levels. The proposed method outperforms former algorithms in terms of energy efficiency, connectivity, and coverage. Study [7] discusses trajectory data analysis using classification and compression algorithms. The random forest classification algorithm is used, and the DP, TR, and SP compression methods are tested. The role of selecting proper threshold values in achieving reliable results is stressed in that paper as well. The tradeoff between compression rate and precision level is shown. DP offers the highest compression ratio but provides lower accuracy levels, while TR has the best precision. SP is ranked in the middle. The outcomes also depend on threshold values.
Paper [8] introduced LEACH-EN, a hierarchical clustering-based routing protocol for WSNs which uses TDMA scheduling and data compression techniques to enhance energy efficiency and network scalability. This protocol reduces collisions, control overhead, and idle listening, thus achieving faster data transmission rates and extending network lifetime.
In [9], a data aggregation scheme known as F-LEACH for IoT-enabled healthcare systems is proposed. The scheme utilizes fuzzy logic to convert qualitative data to quantitative information, implements complex nonlinear functions, and provides approximate solutions. Simulation results show that F-LEACH outperforms similar approaches by 5-20%. The authors of [10] discussed WSN energy consumption and network longevity. The research aimed to build an expanded K-means cluster head selection (CHS) algorithm that takes into consideration remaining energy, node density, and distance to the base station. The algorithm outperforms LEACH, Mod-LEACH, and TSILEACH in terms of packet delivery ratio, throughput, network lifetime, energy usage, and node count.
Study [11] introduced EEA-CFCHS, an energy-efficient cluster creation and cluster head selection algorithm for heterogeneous WSNs. The algorithm considers the energy depletion threshold for all WSN nodes. The proposed technique reduces energy consumption and increases network stability, extending its lifespan. According to simulations, the EEA-CFCHS algorithm prolongs the stability period by 42%, increases network lifespan by 62%, and extends network lifetime by 73.16% over ESRA and P-SEP protocols.
Papers [12]-[14] demonstrated the efficacy of data aggregation in reducing energy consumption in WSNs. This is achieved by minimizing the volume of data that needs to be sent between nodes. In this context, [15] proposed the idea of a low-energy adaptive clustering hierarchy (LEACH). This technique aims to balance energy use across the network by implementing a mechanism in which nodes periodically take on the role of cluster heads.
Article [16] expanded the application of the LEACH protocol to WSNs that encompass a mixture of solar-powered and battery-only nodes. The solar-aware low energy adaptive clustering hierarchy (sLEACH) algorithm prioritizes the selection of solar-powered nodes for the additional transmissions needed by cluster heads. The authors of [17] introduced a data aggregation strategy that integrates clustering with directed diffusion. This approach distinguishes and handles various types of data differently during the transmission process.
Paper [18] explained the distinctions among three clustering techniques: LEACH, the hybrid energy-efficient distributed clustering (HEED) method, and the distributed weight-based energy efficient hierarchical clustering (DWEHC) approach. The researchers also analyzed the impact of HEED and DWEHC on predictive data aggregation and data summarization processes. In [19], a gossip method is implemented for the purpose of data aggregation, where data from certain nodes is not communicated directly, but is rather inferred. The authors of [20] presented a novel aggregation policy suitable for scenarios where data transmission is required in a flat topology without clusters. The authors also studied the effects of the indicator degree-based power policy (SLPC) in unscheduled wireless networks.
Investigations presented in [21] took into account channel gains in addition to transmit and receive energy levels. The general FEP-optimal power allocation approach, as proposed in [22], aims to evaluate relay-assisted diversity communications by studying the frame error probability (FEP).
In the evaluation of the aforementioned systems, the authors focused on improving power performance by using solar energy for data aggregation and compression during intervals of energy depletion or surplus. In [23], the researchers focused on energy consumption-related difficulties and the network longevity of WSNs, and advocated a hybrid solution incorporating particle swarm optimization (PSO) and GA methodologies. To further enhance performance, the system also included a mobile sink to aggregate data from cluster heads.
Papers [24]-[26] made significant contributions in the field of energy-harvesting WSNs by developing models and approaches to estimate the availability of environmental energy and optimize its utilization by sensor nodes.

Proposed Model
The proposed method is divided into five steps and has been named the multi-hop energy efficient cluster-based routing protocol (MEECRP), as shown in Fig. 2.

Network Cluster Formation Using K-means Clustering
The K-means clustering algorithm is used to organize and balance sensor node clusters. It is a widely recognized unsupervised machine learning method that partitions data points into clusters based on their similarity. Within sensor networks, K-means clustering organizes sensor nodes into cohesive clusters, which facilitates effective management of data transmission. Sharing data across sensor nodes within a given cluster may enhance efficiency by minimizing redundancy and optimizing energy use. Numerous research investigations have employed an optimization technique to reduce the distance between the sensor nodes and the base station.
The K-means algorithm offers a simpler and more efficient approach. Several methods can be employed to determine the ideal or effective number of clusters K. These methods include the gap statistic, average silhouette, and elbow techniques. The elbow technique is a valuable strategy in this context and is given by [27]:
$$SSE = \sum_{k=1}^{K} \sum_{x_i \in c_k} \lVert x_i - c_k \rVert^2 \qquad (1)$$
where SSE represents the sum of squared errors, x_i is a sensor found in cluster k, and c_k is the k-th cluster, represented by its centroid. The SSE value is used to identify the ideal K number [28].
The primary objective of the K-means algorithm is to minimize the Euclidean distance, denoted as D, between the centroid of a cluster head (CH) and its cluster members (CM).
Variable K is a positive integer representing the total number of clusters. The primary concept underlying K-means is the establishment of K centroids, one for each cluster. In the initial phase, each of the n data points is assigned to its nearest centroid. Once all points have been assigned, this stage concludes and an initial grouping is complete. Each of the K centroids is then recalculated as the center of the cluster that emerged from the previous iteration. Subsequently, a new binding is established between the data points and their nearest new centroid. These two steps form a loop: the positions of the K centroids are iteratively adjusted until no further updates are required. The phases of K-means clustering are depicted in Fig. 3 [29].
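The assignment and update loop described above, together with the SSE used by the elbow technique, can be sketched as follows. This is a minimal illustration in Python rather than the authors' Matlab implementation; the function names `kmeans` and `sse`, the random initialization, and the sample coordinates are assumptions made for the example.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd-style K-means on 2-D node coordinates; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k randomly chosen nodes
    labels = []
    for _ in range(iters):
        # Assignment step: bind every node to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: recompute each centroid as the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(centroids[j])  # leave an empty cluster's centroid in place
        if updated == centroids:  # no centroid moved, so the loop has converged
            break
        centroids = updated
    return centroids, labels

def sse(points, centroids, labels):
    """Sum of squared distances between each node and its cluster centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Illustrative node positions in metres (made-up values).
nodes = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k in (1, 2, 3):
    cents, labs = kmeans(nodes, k)
    print(k, round(sse(nodes, cents, labs), 3))
```

Plotting the printed SSE against K and looking for the point where the curve flattens is the usual way to read off the elbow.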

Election of CH by Means of Probability Equation
The LEACH protocol designates a subset of nodes as CHs to control energy dissipation. CHs transmit group member data to the BS in one hop during the steady state phase. However, LEACH has its limitations, such as ignoring residual energy in the process of CH selection, which can result in low-energy nodes serving as CHs and in unbalanced clusters [30], [31]. The proposed protocol uses hierarchical clustering settings and transfers the data from the selected CHs that are elected from Eq. (2) only in the first round. These parameters are considered, in round one, to be initial values for the GA in subsequent rounds.
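Eq. (2) is not reproduced in this section. As an illustration only, the sketch below assumes it follows the standard LEACH threshold T(n) = P / (1 - P (r mod 1/P)), where P is the desired CH fraction and r is the round number; the function names and the `eligible` set (nodes that have not yet served as CH in the current epoch) are hypothetical.

```python
import random

def leach_threshold(p, r):
    """Standard LEACH threshold for a node still eligible in round r (assumed form of Eq. (2))."""
    period = round(1 / p)  # number of rounds in one election epoch
    return p / (1 - p * (r % period))

def elect_cluster_heads(node_ids, eligible, p, r, rng):
    """Each eligible node draws a random number and becomes CH if it falls below T(n)."""
    t = leach_threshold(p, r)
    return [n for n in node_ids if n in eligible and rng.random() < t]

rng = random.Random(42)
heads = elect_cluster_heads(range(100), set(range(100)), p=0.05, r=0, rng=rng)
print(len(heads), "cluster heads elected in round 0")
```

The threshold grows over an epoch, so nodes that have not yet served become increasingly likely to be elected; it contains no residual energy term, which is the limitation noted above.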

Optimum Cluster Election Using GA
GAs, which draw inspiration from the mechanisms of natural selection, provide a robust approach for optimizing routing in sensor networks [32]. The ability to adapt and modify routing techniques over time can result in enhanced data transfer efficiency. GAs can minimize energy usage and decrease the probability of network congestion by strategically picking the best routing options. In the proposed protocol, the optimal cluster head probability P_opt is obtained using the GA approach. Figure 4 illustrates how the cluster heads are chosen by integrating GA with our methodology. The fitness function used in CH selection takes into account network longevity, energy consumption, and load balancing when assessing each chromosome. The evaluation includes the determination of cluster head quantities, placements, distances to base stations, cluster proportions, and distribution of load among clusters. Figure 5 shows the algorithm developed for electing optimum clusters by using GA.
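The following sketch shows one way a GA could search for a CH configuration and derive a CH probability from it. It is not the algorithm of Fig. 5: the bit-string encoding (1 marks a CH), the distance-based cost used as an energy proxy, the truncation selection, and all parameter values are assumptions chosen to keep the example short.

```python
import math
import random

def cost(chrom, nodes, bs):
    """Hypothetical energy proxy: squared member-to-CH plus squared CH-to-BS distances."""
    heads = [nodes[i] for i, bit in enumerate(chrom) if bit]
    if not heads:
        return float("inf")  # a chromosome without any cluster head is invalid
    member = sum(min(math.dist(n, h) for h in heads) ** 2 for n in nodes)
    uplink = sum(math.dist(h, bs) ** 2 for h in heads)
    return member + uplink

def run_ga(nodes, bs, pop_size=30, generations=60, p_mut=0.02, seed=1):
    """Returns the best chromosome found and its CH fraction (an estimate of P_opt)."""
    rng = random.Random(seed)
    n = len(nodes)
    pop = [[int(rng.random() < 0.1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: cost(c, nodes, bs))
        elite = pop[: pop_size // 2]          # selection: keep the fitter half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)         # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each gene flips with probability p_mut.
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = elite + children
    best = min(pop, key=lambda c: cost(c, nodes, bs))
    return best, sum(best) / n

rng = random.Random(0)
nodes = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(40)]
best, p_opt = run_ga(nodes, bs=(20, 20))
print("cluster heads:", sum(best), "estimated P_opt:", p_opt)
```

A fitness function closer to the one described above would add terms for residual energy and load balance among clusters; those are omitted here because their exact form is not given in the text.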

Time and Event Scheduling Mechanism
Wireless sensor networks use time division multiple access (TDMA) scheduling in the LEACH protocol to improve energy efficiency and longevity. This method divides time into slots and allocates them to nodes for data transmission. In the proposed protocol, each cluster is assigned a cluster head that creates and manages a TDMA schedule for its members. The cluster head combines the data from its members and sends it to the base station (BS).
The setup proceeds as follows. Before clustering starts, each node determines its cluster head status. The BS receives messages containing CH status, node IDs, and locations. The BS uses a GA to find the optimal CH probability and sends an advertisement carrying this value to all nodes. During setup and in each round, each node decides its own CH role using the optimal probability received from the BS and a random number. CHs advertise their status to their neighbors. Non-cluster head nodes join the nearest cluster head based on signal strength. Cluster heads create and distribute a schedule for transmitting data to their members.
In the steady state phase, each node follows the schedule of its cluster head during each data collection cycle. Nodes send data to their cluster head during their own time slot. Cluster heads collect data from their members and send it to the base station. An important feature is that only active nodes (1/3) transmit data to the CH in the data collection step. The remaining nodes are idle, but periodically switch between active, idle, and sleep roles until the number of dead nodes in the network reaches 90% of their total number. When the number of dead nodes reaches 90%, a different mechanism is applied: the surviving nodes no longer follow the election mechanism to join the nearest cluster, but send their data directly to the base station.
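A cluster head's slot assignment and the rule that only one third of the members transmit in a given round can be sketched as below. The round-robin rotation, the slot length, and the function names are assumptions; the text does not specify how nodes rotate between the active, idle, and sleep roles.

```python
def tdma_schedule(members, slot_ms=10):
    """Map each cluster member to a (start, end) transmission slot in milliseconds."""
    return {m: (i * slot_ms, (i + 1) * slot_ms) for i, m in enumerate(members)}

def active_members(members, round_no, groups=3):
    """Assumed rotation: in each round, one of `groups` interleaved subsets is active."""
    return [m for i, m in enumerate(members) if i % groups == round_no % groups]

cluster = ["n1", "n2", "n3", "n4", "n5", "n6"]
for r in range(3):
    senders = active_members(cluster, r)
    print("round", r, tdma_schedule(senders))
```

Building the schedule only for the active subset keeps the TDMA frame short, which is one plausible reading of how the 1/3 rule reduces traffic towards the CH.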

LZW Data Compression
The Lempel-Ziv-Welch compression method decreases transmission overhead by shrinking the data size. It minimizes the amount of data sent over the network, and its effect is measured using the compression ratio, bandwidth utilization, energy consumption, transmission time, packet loss rate, and network throughput.
Data from member nodes is aggregated at the CH. The aggregation may involve averaging, summing, or choosing representative values. The aggregated data is then compressed using LZW, and the compressed data, together with data acquired from other network nodes, is placed in a queue in preparation for transmission. At regular intervals, the node decides whether to transmit the data in its queue. Following each iteration, a node determines its operational state for later iterations by considering an estimate of its remaining battery charge.
The active mode is selected when the node has adequate power to sustain sensing and transmitting data to the cluster head in the following cycle. In the sleep mode, the node remains powered throughout the session but is dedicated solely to the task of sensing: the wireless module is deactivated until the subsequent round. If a node depletes its power within a certain round, it becomes non-functional and sends no data in future rounds. To address this issue, nodes engage in a process of location confirmation before initiating the routing method, thereby eliminating dead nodes.
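The mode decision can be illustrated with a small function. The per-round sensing and transmission costs and the rule that a node falls back to sleep mode whenever it cannot afford a full active round are assumptions made for this sketch, not values taken from the paper.

```python
def next_mode(residual_j, sense_cost_j, tx_cost_j):
    """Choose a node's state for the next round from its estimated residual energy (joules)."""
    if residual_j <= 0:
        return "dead"    # depleted nodes are removed before routing starts
    if residual_j >= sense_cost_j + tx_cost_j:
        return "active"  # enough energy to sense and to transmit to the CH
    return "sleep"       # radio off until the next round, sensing only

for energy in (0.2, 0.005, 0.0):
    print(energy, next_mode(energy, sense_cost_j=0.001, tx_cost_j=0.01))
```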
Figure 6 illustrates the pseudocode for aggregating and compressing data in the proposed protocol. This hybrid strategy optimizes data transmission and storage by combining the two methods, aggregation and LZW compression, reducing the amount of data delivered to the sink. This reduces energy usage and extends the WSN's lifetime. The combination reduces the quantity of data and its redundancy, especially in sensor nodes that generate bulk data packets.
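As an illustration of the aggregate-then-compress step, the sketch below averages member readings at the CH and applies a textbook LZW coder to the resulting payload. It is not the pseudocode of Fig. 6: the averaging rule, the comma-separated payload format, and the sample readings are assumptions, and the codes are returned as a list of integers rather than packed into bits.

```python
def lzw_compress(data):
    """Textbook LZW: encode a bytes object as a list of integer codes."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    w, out = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                      # keep extending the current phrase
        else:
            out.append(table[w])        # emit the longest known phrase
            table[wc] = len(table)      # learn the new phrase
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes):
    """Inverse of lzw_compress; rebuilds the phrase table while decoding."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]           # phrase defined by the code being decoded
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)

def aggregate(readings):
    """Assumed aggregation rule: average of the member readings, one decimal place."""
    return round(sum(readings) / len(readings), 1)

# Made-up temperature readings collected by one CH over four rounds.
rounds = [[21.4, 21.6, 21.5], [21.5, 21.5, 21.5], [21.6, 21.4, 21.5], [21.5, 21.6, 21.4]]
payload = ",".join(str(aggregate(r)) for r in rounds).encode()
codes = lzw_compress(payload)
print(len(payload), "bytes ->", len(codes), "codes")
```

Repetitive sensor payloads, such as slowly varying temperatures, contain many repeated phrases, which is where LZW tends to give the largest savings.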

Communication Modes
Multi-hop WSN communication is used to send data from sensor nodes to the BS through relay nodes. This method extends network range and enhances communication capabilities. In a multi-hop WSN, compressed packets travel from a sensor node to the base station through the node's own cluster and then its CH.
In the proposed protocol, when the number of live nodes is < 10% of the total number of nodes or <= K (number of clusters), re-electing CHs and exchanging control messages would waste the remaining energy of the live nodes, so they send sensed data directly to the BS. Instead of being always active and sending data in each round, these nodes engage their sending mode when new events occur and when the new data differs from previous data. Such an approach is called event scheduling.
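The switch to direct transmission and the event scheduling rule can be sketched as follows. The class and function names are hypothetical, and the optional `tolerance` parameter is an assumption added so that small fluctuations need not count as new data.

```python
def use_direct_mode(alive, total, k):
    """True when the survivors should bypass CH election and send straight to the BS."""
    return alive < 0.1 * total or alive <= k

class EventScheduler:
    """Transmit only when a reading differs from the last value that was sent."""

    def __init__(self, tolerance=0.0):
        self.tolerance = tolerance  # assumed dead band; 0.0 means any change triggers a send
        self.last_sent = None

    def should_send(self, reading):
        if self.last_sent is None or abs(reading - self.last_sent) > self.tolerance:
            self.last_sent = reading
            return True
        return False

sched = EventScheduler()
for value in (21.0, 21.0, 21.5, 21.5):
    print(value, "send" if sched.should_send(value) else "skip")
```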

Performance Evaluation
Protocol simulations were conducted using Matlab. Table 1 shows the parameters of the simulation model. We deployed 100 nodes randomly, with the base station located at the corner (20,20), as shown in Fig. 7. The parameters for evaluating the offered solutions include energy consumption and the quantity of packets transmitted to the BS. The K-means algorithm is employed to create clusters after the deployment of the network, using the given parameters: an area of 100×100 m for case 1 and 300×300 m for case 2. The initial node energy level was set at 0.2 J. We simulated four packet sizes of 951, 2780, 4525, and 9159 bits, with a total of 100 nodes. Finally, in case 3, the proposed protocol was compared with another existing routing protocol. According to the findings presented in Fig. 8, the best network lifetime was achieved for K = 25. The energy usage per node demonstrates a positive correlation with the network's level of connectivity. This phenomenon is commonly observed as the importance of a node's connectedness becomes more prominent upon increasing the total number of nodes.

Simulation Results
For case 1 (100×100 m), Fig. 8 shows the network's lifetime after applying the proposed method to packets of several different sizes, while Fig. 9 presents the network lifetime for the proposed MEECRP approach, compared with the traditional LEACH protocol.
Applying LZW compression changes the size of the data packet used in the LEACH protocol and has a positive impact on the lifetime of the network, although the gain still depends on several factors, such as network configuration and node characteristics. Efficiency gains may be evaluated in terms of network lifetime and LZW data compression efficiency. Figure 10 illustrates the outcomes of several simulations in which the packet size (in bits) was varied, showing the extent of its impact on the amount of energy consumed. The proposed protocol is more efficient in terms of energy consumption, as the energy reserves stored on the network's nodes play a decisive role here. Because the nodes have limited power capacity, larger packets may result in faster power depletion and shorter network lifetime.
In the second case, the area of the deployment field was increased to 300×300 m, while the number of nodes remained unchanged (100 nodes). The initial energy level of each node was set at 0.2 J. The base station was placed in the corner (20,20) and the nodes were distributed randomly. The results of the simulation are shown in Fig. 11, where network lifetime for the proposed MEECRP is compared with the LEACH protocol.
Figure 12 shows the amount of energy consumed for the proposed MEECRP approach, versus the LEACH protocol.
The figure shows an improvement in the efficiency of the proposed protocol, although the size of the network and node deployment density affect the distances that the transmitted data needs to cover. If the network is large and densely populated, larger packets may result in longer transmission distances, thus increasing power consumption.
In the third case, we compared MEECRP with other routing protocols [36]. A total of 100 nodes were spread uniformly over an area of 100×100 m, and the packet size was set at 2000 bits in MEECRP.
In paper [36], the packet size is 200 bits and the initial energy level for each sensor is 0.5 J. During the simulation, we modified the network's topology according to the real behavior of sensor nodes. Total network traffic, including data and control messages, may affect the network's lifetime. With bigger packet sizes, traffic load increases. This can be seen in Figs. 13 and 14, where the number of packets exchanged in the network was reduced to 40 in the proposed MEECRP protocol, whereas in the LEACH protocol it equaled 100.
Even in the case of large data packets, the proposed MEECRP approach reduces the number of packets exchanged in the network, when compared with LEACH. Optimized network stability and lifetime are of utmost importance in maintaining high levels of performance. Network stability refers to the time from the initiation of the network to the first node death (FND), whereas network lifetime denotes the interval between FND and the eventual death of the last node (LND) within the network.
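Given the round in which each node dies, the two metrics follow directly from these definitions. The function name and the sample death rounds below are illustrative only and do not correspond to the simulation results.

```python
def stability_and_lifetime(death_rounds):
    """Return (stability, lifetime): rounds until FND, and rounds between FND and LND."""
    fnd = min(death_rounds)  # first node death
    lnd = max(death_rounds)  # last node death
    return fnd, lnd - fnd

# Hypothetical death rounds for a five-node network.
print(stability_and_lifetime([120, 340, 95, 410, 275]))
```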
To evaluate the proposed solution against other approaches suggested for WSNs used in IoT, such as LEACH, M-LEACH, multi-hop LEACH, and sLEACH, a fair comparison is conducted using the above-mentioned metrics.
Figure 15 shows the performance of the network relying on the proposed protocol, and compares it with solutions based on other routing protocols.The results indicate a significant improvement in network lifetime.

Conclusions
This research proposes a method to boost the data aggregation capabilities of sensor networks by incorporating GA, LZW compression, and K-means approaches. The main goal of the project is to consider the impact of this hybrid combination of techniques on metrics such as power consumption, network scalability, and data aggregation accuracy. The proposed technique groups the sensor nodes according to distance with the use of the K-means clustering method, which helps in collecting data. In addition, GA is utilized to enhance the aggregation procedure by choosing cluster heads. Furthermore, LZW is applied to compress the aggregated data in order to lower the transmission-related burden. To further reduce network traffic, scheduling mechanisms have been introduced that lead to a reduction in the number of packets sent from sensors to cluster heads. The results of this study will contribute to the development of packet scheduling mechanisms for data aggregation in WSNs, which, in turn, will enhance the efficiency and use of resources in WSNs. As compared to various existing approaches, such as LEACH, M-LEACH, multi-hop LEACH, and sLEACH, the proposed MEECRP approach boosts network lifetime by 100%, according to simulations. This improvement boosts the overall performance of sensor networks by ensuring much more efficient use of resources, enhancing data collection processes, and boosting network efficiency.
Additionally, the simulations reveal that, in contrast to non-compression methods, the proposed strategy effectively decreases power usage, raises network scalability, and increases the accuracy of data aggregation. By examining the effect of K-means clustering, GA optimization, LZW compression, and scheduling mechanisms on several factors, which include network scalability, energy efficiency, and data aggregation accuracy, the research sheds light on the possible benefits of incorporating those strategies in order to enhance packet scheduling in WSNs.

Fig. 5. Pseudocode for optimum cluster election with the use of GA.

Fig. 12. Energy consumed in the proposed MEECRP approach, compared with the LEACH protocol, for various packet sizes.