Submitted: 31 May 2022
Accepted: 07 Nov 2023
First published online: 29 Dec 2023
Avicenna J Environ Health Eng. 10(2):85-97. doi: 10.34172/ajehe.5283

Original Article

Performance of TANN, NARX, and GMDHT Models for Urban Water Demand Forecasting: A Case Study in a Residential Complex in Qom, Iran

Mostafa Rezaali 1, Reza Fouladi-Fard 2,3,*, Abdolreza Karimi 4

Author information:
1Department of Geography, University of Florida, Gainesville, Florida, United States
2Research Center for Environmental Pollutants, Department of Environmental Health Engineering, Qom University of Medical Sciences, Qom, Iran
3Environmental Health Research Center, School of Health and Nutrition, Lorestan University of Medical Sciences, Khorramabad, Iran
4Department of Civil Engineering, Qom University of Technology, Qom, Iran

*Corresponding author: Reza Fouladi-Fard, Email: rezafd@yahoo.com

Abstract

To keep the balance between demand and supply, methods based on average per capita consumption have traditionally been applied to predict water demand. More sophisticated approaches, such as linear regression and time series models, were later developed for this purpose. However, after the introduction of artificial neural networks (ANNs), this method found various applications in water supply management, especially in urban water demand prediction. In this study, multiple types of ANNs were evaluated for predicting the water demand of a residential complex in the city of Qom, Iran. The results indicated that the time series ANN (TANN), the nonlinear autoregressive network with exogenous inputs (NARX), the group method of data handling time series (GMDHT), and their wavelet counterparts (i.e., w-TANN and w-NARX) exhibited varying degrees of performance. Among these models, w-NARX performed the best (based on the average overall error), with a test set root mean squared error (RMSE) of 49.5 (m3/h) and R of 0.93, followed by the GMDHT model with a test set RMSE of 104 (m3/h) and R of 0.97, and w-TANN with a test set RMSE of 68.8 (m3/h) and R of 0.91. In addition, the feedback connection in NARX yielded an overall performance improvement over TANN.

Keywords: Recurrent artificial neural networks, Group method of data handling, Time series modeling, Urban water demand forecasting, Qom

Copyright and License Information

© 2023 The Author(s); Published by Hamadan University of Medical Sciences.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Please cite this article as follows: Rezaali M, Fouladi-Fard R, Karimi A. Performance of TANN, NARX, and GMDHT models for urban water demand forecasting: a case study in a residential complex in Qom, Iran. Avicenna J Environ Health Eng. 2023; 10(2):85-97. doi: 10.34172/ajehe.5283


1. Introduction

For many decades, satisfying consumer water demand with sufficient quality has been one of the major concerns for water supply companies and utilities. According to the literature, predicting water demand can help to provide users with quality water in adequate volumes at reasonable pressure (1). The challenge of water demand prediction is of particular interest in arid and semi-arid cities because of the water shortages that usually occur in dry seasons (2). In supervised learning, one of the most widely applied techniques for training an artificial neural network (ANN) is the back-propagation algorithm with feedforward networks. Maier and Dandy (3) argued that the geometry of ANNs is rarely addressed in the literature and that network parameters such as the learning rate, transfer function, and error function are often disregarded or only rarely considered. It is also worth noting that the words “learning” and “training” in ANNs are equivalent to the parameter estimation phase in statistical models. Water demand prediction can have many benefits, such as pressure management in the water distribution system and avoidance of water leakages (1). Ghiassi et al (4) utilized the DAN2 model to predict short-term, medium-term, and long-term urban water demand and compared the performance of DAN2 with that of ANN and autoregressive integrated moving average (ARIMA) models. The results indicated that the DAN2 model outperformed ARIMA and ANN. Adamowski and Karapataki (5) compared multiple linear regression (MLR) and ANN for peak urban water demand forecasting in Nicosia, evaluating different learning algorithms for training the ANN model. The results suggested that the ANN trained with the Levenberg–Marquardt algorithm provided more accurate predictions than MLR and the other ANN variants.

Over time, different ANN architectures have been developed; feedforward networks, for example, have drawn considerable attention since the early stages of the field. In this study, a nonlinear autoregressive network with exogenous inputs (NARX), NARX coupled with wavelet transform (w-NARX), a time-delayed ANN (TANN) and its wavelet version (w-TANN), and the group method of data handling time series (GMDHT) were trained to predict the daily water demand of the Mahdie Residential Complex (MRC) in Qom, Iran.


2. Methods

2.1. Study Area

Qom, the capital of Qom province and one of the most populous cities in Iran, is located in the center of the country and has a semi-arid climate (6-9). Historically, the water supply of Qom has been of great importance due to the city's arid environment and water scarcity. The first water distribution system of Qom was designed and constructed in 1964 by connecting four wells using 83 km of cast iron pipes along the Qomrood River and pumping the water into two elevated concrete tanks with a capacity of 2000 m3. As the population grew, the number of wells increased to 34. In 1994, the 15 Khordad Dam was built to ensure the supply of standard drinking water to consumers. On average, the city's water demand is estimated to be 4600 m3/s in summer, decreasing to 2900 m3/s in winter (10). Therefore, the water demand of the whole city can potentially be a function of climatic parameters. Fig. 1 presents an aerial view of the study area.

Fig. 1.

The Study Area


The MRC was chosen as a case study due to its recent construction, low leakage rates, and its status as a separate pressure zone within the city. The water distribution network of the MRC is distinctive in that hydraulic difficulties have a negligible impact on the urban water demand (UWD); consequently, the flow rate data closely represent the true UWD. According to the Ministry of Energy in Iran (2), the estimated daily per-capita water consumption of the MRC during the study period is around 288 L. This amount is approximately 68 L more than the national average and 12 L less than the consumption in Tehran, the capital of Iran. The complex encompasses an expansive green space spanning over 2 km2 and accommodates over 3000 inhabitants, a significant portion of whom are engaged in religious preaching and cultural advocacy.

2.2. Technical Properties of the Distribution Network

The main water supply of the city is provided by the Koucherey and 15 Khordad dams, with capacities of 207 and 165 MCM, respectively. Other water supply sources are groundwater, namely the Aliabad and Qomrood wells. The water of the MRC is supplied by a tank that is usually refilled every 24 hours.

2.3. Experimental Setup

The water flow data were obtained from the Water and Wastewater Company. The dataset was recorded every hour from May 13, 2016, to February 19, 2017, corresponding to 6768 hourly observations. Since there were only a few records of precipitation, rainfall occurrence and amount were not used as explanatory inputs of the model.

The extraction of useful information from a large dataset has been thoroughly investigated in the literature (11). Input variable selection is a significant challenge in any prediction study (12). To address this challenge, the iterative input selection (IIS) algorithm was used to select the most relevant inputs for describing the water demand dataset (13). This algorithm has proved capable of choosing appropriate, non-redundant inputs under several testing conditions (13).

2.4. Predictive Models

2.4.1. MLP Neural Network

One of the most widely applied ANN architectures is the multi-layer perceptron feedforward neural network (MFNN). This popularity may stem from the ability of the MFNN to approximate complex nonlinear system dynamics. It is a supervised learning method that estimates and adjusts weights and biases to approximate the output from the given inputs; in each epoch, it reduces the error between the estimated and observed output by minimizing a cost function. According to the literature, a three-layer MFNN (an input layer, one hidden layer, and an output layer) can approximate almost any system behavior (3,14).

Equation 1 and Equation 2 illustrate the mathematical basis of a three-layer MFNN (15).

Eq. (1)
$y_j(t) = f\big(s_j^h(t)\big) = f\Big(\sum_{i=0}^{n_I} W_{ij}^{h}\, x_i(t)\Big), \quad j = 1, \ldots, n_H$
Eq. (2)
$z_k(t) = f\big(s_k^{o}(t)\big) = f\Big(\sum_{j=0}^{n_H} W_{kj}^{o}\, y_j(t)\Big), \quad k = 1, \ldots, n_o$

where nH is the number of hidden layer nodes, no is the number of output layer nodes, nI is the number of input layer nodes, xi(t) is the input to node i of the input layer, yj(t) is the quantity computed by node j of the hidden layer, and zk(t) is the output computed by node k of the output layer. Wijh controls the strength of the connection between input node i and hidden node j, and Wkjo controls the strength of the connection between hidden node j and output node k. The input and hidden layer biases are set to x0 = 1 and y0 = 1 to permit adjustment of the mean level at each stage.
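For illustration, the following MATLAB sketch evaluates one forward pass of a three-layer MFNN according to Eqs. (1) and (2); the layer sizes, random weights, and the tansig/purelin pairing are placeholders, not the trained configuration reported later in this paper.

```matlab
% Forward pass of a three-layer MFNN (Eqs. 1 and 2). Weights are random
% placeholders, not the trained values from this study.
nI = 3; nH = 5; nO = 1;            % input, hidden, and output node counts
x  = rand(nI, 1);                  % one input pattern x_i(t)

Wh = rand(nH, nI + 1);             % hidden weights, first column multiplies the bias x0 = 1
Wo = rand(nO, nH + 1);             % output weights, first column multiplies the bias y0 = 1

y = tansig(Wh * [1; x]);           % Eq. (1): hidden activations y_j(t)
z = purelin(Wo * [1; y]);          % Eq. (2): network output z_k(t)
```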

2.4.1.1. TANN

The time series version of ANN (i.e., TANN) was used in this study. Hansen and Nelson (16) suggested that due to the ability of ANNs to deal with non-linear cyclic patterns, they are good alternatives to handle time-series variations (3). Considering a stationary neural network, a function to describe the relationship between inputs and outputs can be written as Equation 3:

Eq. (3)
$y = f(x_1, x_2, x_3, \ldots, x_n)$

where xn denotes the independent variables and y is the dependent variable. This type of ANN is considered functionally equivalent to nonlinear regression models. However, to allow an ANN to forecast beyond the range of the data, another function is needed that predicts future values from past observations used as inputs. This is mathematically expressed in Equation 4:

Eq. (4)
$y_{t+1} = f(y_t, y_{t-1}, \ldots, y_{t-n})$

where yt is the observation at time t. The number of past time steps used in the training process is called the lag or tapped delay.
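A minimal MATLAB sketch of the TANN of Eq. (4), using the Deep Learning Toolbox function timedelaynet, is given below. The demand series is synthetic, and the lag range (1 to 14), single hidden neuron, block data division, and Levenberg-Marquardt training mirror the settings described in Section 2.5 without claiming to reproduce the authors' exact script or results.

```matlab
% Time series ANN (TANN): tapped-delay feedforward network per Eq. (4).
% The demand series below is synthetic; replace it with the observed data.
demand = 100 + 10*sin((1:283)') + 5*randn(283, 1);   % placeholder daily demand
T = con2seq(demand');                                % target as a sequence (cell array)

net = timedelaynet(1:14, 1);                         % lags 1..14, one hidden neuron
net.trainFcn  = 'trainlm';                           % Levenberg-Marquardt training
net.divideFcn = 'divideblock';                       % keep the temporal order (70/15/15)
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;

[Xs, Xi, Ai, Ts] = preparets(net, T, T);             % shift the series by the tapped delays
net = train(net, Xs, Ts, Xi, Ai);
Y   = net(Xs, Xi, Ai);
fprintf('Overall MSE = %.2f\n', perform(net, Ts, Y));
```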

2.4.2. NARX

Historically, moving average (MA), autoregressive (AR), and autoregressive moving average (ARMA) models have been widely used by researchers and scientists. However, due to its linear nature, ARMA may not adequately fit nonlinear time series (17,18). With the emergence of feedforward neural networks, it was found that by connecting neurons to the next layer, the previous layer, the same layer, and even to themselves (recurrent neural networks), a network can implicitly model dynamical system properties (3). NARX neural networks have been shown to learn system behavior more efficiently; compared to other network geometries, they generally converge faster and generalize better (19). NARX is a type of recurrent neural network that can capture both linear and nonlinear relationships between input and output data over time, and it is particularly useful for dynamic systems where the current output depends not only on past inputs and outputs but also on exogenous (external) inputs (20). The scheme of the NARX neural network with a feedback connection is shown in Fig. 2.

Fig. 2.

A NARX Network With a Feedback Connection
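The sketch below builds the open-loop NARX of Fig. 2 in MATLAB and closes the loop for multi-step forecasting. The exogenous drivers (temperature and wind speed) and the demand series are synthetic placeholders, and the delay and hidden-size settings are assumptions taken from Section 2.5 rather than the authors' exact configuration.

```matlab
% NARX network with exogenous inputs (Fig. 2): output feedback plus external
% drivers. All series are synthetic placeholders for demand, temperature, and
% wind speed; delays and hidden size are assumptions, not the exact settings.
n = 283;
demand = 100 + 10*sin((1:n)') + 5*randn(n, 1);
temp   = 25 + 8*sin((1:n)'/30) + randn(n, 1);
wind   = 3 + randn(n, 1);

X = con2seq([temp'; wind']);          % exogenous inputs u(t)
T = con2seq(demand');                 % target y(t)

net = narxnet(1:14, 1:14, 1);         % input delays, feedback delays, one hidden neuron
net.trainFcn  = 'trainlm';
net.divideFcn = 'divideblock';        % preserve the temporal order of the data

[Xs, Xi, Ai, Ts] = preparets(net, X, {}, T);
net = train(net, Xs, Ts, Xi, Ai);     % open-loop (series-parallel) training

netc = closeloop(net);                % closed loop for multi-step-ahead forecasting
[Xc, Xic, Aic, Tc] = preparets(netc, X, {}, T);
Yc = netc(Xc, Xic, Aic);
```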


2.4.3. Wavelet Analysis

Wavelet analysis (WA) is a mathematical tool for extracting specific information from data that are otherwise hard to interpret because of their chaotic structure, such as noisy images, high-frequency signals, and perturbed time series like the Mackey-Glass chaotic series.

WA has been applied successfully to many problems, including but not limited to signal de-noising, image processing, and time series decomposition (21). The coupled form of wavelet analysis and neural networks has been widely used in different applications. However, many of these studies used wavelet transform (WT) functions incorrectly, for example by using future data when decomposing the time series, by selecting the decomposition level and wavelet filters inappropriately, or by partitioning the data incorrectly (22,23). In summary, it is essential to carefully select the type of WT, the wavelet filter and scaling, the level of decomposition, and the data partitioning.

2.4.4. GMDHT

The GMDH algorithm belongs to the category of heuristic self-organizing methods used in GMDH neural networks. Introduced by Ivakhnenko (24), GMDH is a technique for constructing an extremely high-order regression-type polynomial. This approach can establish higher-level relationships based on the inputs and outputs of the system being analyzed (25). In this study, the GMDH algorithm was implemented in MATLAB based on the parameters defined by Ivakhnenko (26). Fig. 3 illustrates the general process of eliminating the functions that describe the system dynamics less efficiently.

Fig. 3.

The General Process of the GMDH Neural Network


GMDH, or multiple nonlinear regression, can be used for several categories of problems, such as identification of physical laws, approximation of multidimensional problems, and pattern recognition (27).

A GMDH neural network fits Equation 5 (i.e., a nonlinear polynomial regression with inputs X and output Y), where a is the intercept, β are the coefficients, and k is the number of inputs or observations.

Eq. (5)
$Y = a + \beta_1 X_i + \beta_2 X_j + \beta_3 X_i^2 + \beta_4 X_j^2 + \ldots + \beta_k X_i X_j$

Although GMDH and multiple nonlinear regression (MNLR) both rely on nonlinear regression, they differ in important ways. GMDH uses Equation 5 to fit a regression to the target dataset, whereas MNLR can use different types of nonlinear regression. GMDH is also a self-organizing method that applies the basic idea of a natural selection algorithm, while MNLR does not; for this reason, GMDH is also known as a polynomial neural network (28).
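As a concrete illustration of Eq. (5), the sketch below fits a single two-input GMDH neuron by ordinary least squares in plain MATLAB. It shows only the building block of a GMDH network, not the full self-organizing layer construction and pruning performed by the GMDHT tool used in this study; all data are placeholders.

```matlab
% Fit one GMDH neuron, Y = a + b1*Xi + b2*Xj + b3*Xi^2 + b4*Xj^2 + b5*Xi*Xj
% (Eq. 5), by ordinary least squares. In a full GMDH network many such neurons
% are fitted and the weakest are discarded layer by layer (Fig. 3).
Xi = rand(200, 1);  Xj = rand(200, 1);                 % placeholder inputs
Y  = 2 + Xi - 0.5*Xj + 0.3*Xi.^2 + 0.1*Xi.*Xj + 0.05*randn(200, 1);

A     = [ones(200, 1), Xi, Xj, Xi.^2, Xj.^2, Xi.*Xj];  % design matrix of Eq. (5)
coeff = A \ Y;                                         % [a; b1; b2; b3; b4; b5]
Yhat  = A * coeff;
fprintf('Training RMSE = %.4f\n', sqrt(mean((Y - Yhat).^2)));
```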

2.5. Model Development

Although there is no consensus on the optimal network architecture of ANNs, the importance of the architecture for model performance has never been questioned. Defining an optimal network architecture is one of the most challenging tasks in the modeling process. A reason for this might be that the performance of neural networks is highly problem-dependent (3); therefore, it is hard to define a single architecture as a global solution. However, there are some general guidelines for achieving better performance with a specific ANN-based model. In the following subsections, the detailed description and structure of all applied models are discussed.

2.5.1. Model Geometry

2.5.1.1. Model Inputs, Data Division, and Choice of Lags

Maximum daily temperature, wind speed, and cloud cover were considered as inputs of the ANN models. Before being used as inputs, these parameters underwent a one-sample Kolmogorov-Smirnov test to examine their distributions. For all parameters, the test rejected the null hypothesis at the 5% significance level, meaning that the data do not follow a Gaussian distribution. Thus, Spearman correlation analysis was performed to investigate the relationship between these parameters and the water demand data.
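A minimal sketch of this normality check and rank-correlation screening is given below; the series are placeholders, and standardizing before kstest (which tests against a standard normal) is an assumption about the workflow rather than the authors' exact script.

```matlab
% Normality check and rank correlation for a candidate input (placeholder data).
% kstest compares against a standard normal, so the series is standardized first.
tempMax = 25 + 8*randn(283, 1);                 % placeholder maximum daily temperature
demand  = 100 + 10*randn(283, 1);               % placeholder daily water demand

z = (tempMax - mean(tempMax)) / std(tempMax);   % standardize before the KS test
h = kstest(z);                                  % h = 1: reject normality at the 5% level

rho = corr(tempMax, demand, 'Type', 'Spearman');% rank correlation with demand
fprintf('KS reject = %d, Spearman rho = %.3f\n', h, rho);
```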

The data were divided into three subsets: 70% for training, 15% for validation, and 15% for testing. Due to the temporal dependencies of time series prediction, the dataset was not shuffled before training (1,29); therefore, the original sequence of the data was preserved.

Before being used as inputs for training, all data were normalized to a specific range based on the domain of the transfer function used. All models in this paper were scripted in MATLAB (version 2017a). Equation 6 gives the normalization formula used in this study:

Eq. (6)
$y = (y_{\max} - y_{\min}) \times \dfrac{x - x_{\min}}{x_{\max} - x_{\min}} + y_{\min}$

where ymax is the maximum value for each row of y, ymin is the minimum value for each row of y, x is an N-by-Q matrix, xmin is the minimum value for each row of x, and xmax is the maximum value for each row of x.
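Equation 6 corresponds to MATLAB's row-wise mapminmax scaling, sketched below for the [-0.9, 0.9] range used with tansig; the input matrix is a placeholder.

```matlab
% Row-wise min-max scaling of Eq. (6), as implemented by MATLAB's mapminmax.
% x is a placeholder N-by-Q matrix; [-0.9, 0.9] is the range used with tansig
% ([0.1, 0.9] would be used with logsig).
x = rand(3, 100);                        % N-by-Q matrix of raw inputs
[y, ps] = mapminmax(x, -0.9, 0.9);       % scale each row to [-0.9, 0.9]

xBack = mapminmax('reverse', y, ps);     % undo the scaling, e.g. for reporting outputs
```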

2.5.1.2. Number of Hidden Layers and Nodes

Since the number of hidden layers and neurons is highly problem-dependent, there is no universal number of layers or neurons for modeling system dynamics. Too many neurons can cause overtraining and produce unreliable results due to the model's inability to generalize.

In this study, the number of layers and neurons was defined by a trial and error process. It was found that a network with one hidden layer and one neuron can approximate the input data efficiently.
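A sketch of this trial-and-error search, assuming selection by the lowest validation MSE, is shown below; the candidate range and the synthetic demand series are illustrative only.

```matlab
% Trial-and-error selection of the hidden layer size: train candidate TANN
% networks and keep the one with the lowest validation MSE. The demand series
% is a synthetic placeholder and the candidate range 1:10 is illustrative.
demand = 100 + 10*sin((1:283)') + 5*randn(283, 1);
T = con2seq(demand');

bestMSE = Inf;
for nHidden = 1:10
    net = timedelaynet(1:14, nHidden);
    net.trainFcn  = 'trainlm';
    net.divideFcn = 'divideblock';                % contiguous 70/15/15 division (default ratios)
    [Xs, Xi, Ai, Ts] = preparets(net, T, T);
    [net, tr] = train(net, Xs, Ts, Xi, Ai);
    if tr.best_vperf < bestMSE                    % validation MSE at the best epoch
        bestMSE = tr.best_vperf;  bestNet = net;  bestSize = nHidden;
    end
end
fprintf('Selected %d hidden neuron(s), validation MSE = %.2f\n', bestSize, bestMSE);
```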

2.5.1.3. Choice of Transfer (Activation) Function

Generally, the selection of the transfer function depends, to some extent, on how noisy and how nonlinear the data are (3). Since weight initialization in ANN-based models is random, selecting the best model usually requires running multiple simulations and saving the model and its results for each run. To this end, the models were scripted so that in each simulation the training, validation, and test performance, the outputs and targets, and the NARX, wavelet-NARX (w-NARX), TANN, and wavelet-TANN (w-TANN) models were saved to disk in a space-delimited text file. The number of simulations was chosen based on computational limitations; after 30 simulations, the best model was selected. This process covered four network configurations, with hyperbolic tangent sigmoid and symmetric saturating linear, log-sigmoid and linear, hyperbolic tangent sigmoid and linear, and log-sigmoid and symmetric saturating linear transfer functions in the hidden and output layers, respectively.

In this study, to conform to the domain of the transfer function (30) and to avoid its saturation (31), the models with the tansig transfer function were evaluated with inputs scaled to [-0.9, 0.9], and the models with the logsig transfer function with inputs scaled to [0.1, 0.9]. Fig. 4 shows the transfer functions applied in this study to evaluate network performance.

Fig. 4.

Transfer Functions Used to Evaluate Overall Network Performance
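The simulation loop described in Section 2.5.1.3 might be organized as in the following sketch, which trains each of the four transfer-function pairs 30 times and logs the training, validation, and test errors to a space-delimited text file. The file name, the synthetic series, and the use of a TANN here are placeholders rather than the authors' actual script.

```matlab
% Evaluate the four hidden/output transfer-function pairs over 30 random
% initializations and log each run to a space-delimited text file.
combos = {'tansig','satlins'; 'logsig','purelin'; 'tansig','purelin'; 'logsig','satlins'};
demand = 100 + 10*sin((1:283)') + 5*randn(283, 1);   % placeholder series
T = con2seq(demand');

results = [];
for c = 1:size(combos, 1)
    for run = 1:30
        net = timedelaynet(1:14, 1);
        net.layers{1}.transferFcn = combos{c, 1};    % hidden layer
        net.layers{2}.transferFcn = combos{c, 2};    % output layer
        net.trainFcn  = 'trainlm';
        net.divideFcn = 'divideblock';
        [Xs, Xi, Ai, Ts] = preparets(net, T, T);
        [net, tr] = train(net, Xs, Ts, Xi, Ai);
        results = [results; c, run, tr.best_perf, tr.best_vperf, tr.best_tperf]; %#ok<AGROW>
    end
end
dlmwrite('tann_runs.txt', results, 'delimiter', ' ');   % hypothetical log file name
```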


2.5.1.4. Choice of Optimization Method

For both the TANN and NARX models, the Levenberg–Marquardt algorithm (LMA) was selected for training. This algorithm is based on the classic Newton method with some modifications; its behavior changes with the distance from a local minimum of the error, which allows it to converge faster toward the global minimum and to escape local minima. LMA has several advantages over other algorithms such as gradient descent with momentum (GD) and conjugate gradient (CG) (32-35).

2.5.2. Wavelet NARX and ANN

The framework used in this study is based on the wavelet data-driven forecasting framework (WDDFF) introduced by Quilty and Adamowski (22). The following subsections describe the procedure used to apply the WDDFF.

2.5.2.1. Selection of WT

Two types of WT widely used for hydrological prediction do not rely on future data for decomposition: the maximal overlap discrete wavelet transform (MODWT) and the à trous (AT) algorithm. Since AT can be used to preprocess both target and input data, it was adopted in this study (22).

2.5.2.2. Selection of Decomposition Level and Wavelet Filters

Considering the number of data available, the maximum decomposition level was used based on boundary-affected coefficients (BAC) calculated by Equation 7:

Eq. (7)
$BAC = (2^L - 1) \times (S - 1) + 1$

where L is the maximum level of decomposition and S is the number of scaling coefficients. As the water demand data were available for 283 days and the boundary-affected coefficients could not be used in training, selecting appropriate values of L and S was very important. It was found that a Daubechies (db) filter of scale 3 at decomposition level 3 provides sufficient data (283 - 36 = 247 records) to train the network efficiently. The demand time series was decomposed by two filters: a low-pass filter to obtain the approximation and a high-pass filter to obtain the details of the signal. The data flow through the w-NARX and w-TANN models is shown in Fig. 5.

Fig. 5.

Data Flow in the W-NARX and W-TANN


Before the data were used, 36 records of each lagged dataset (lags 1 to 14) were eliminated to remove the BAC. After normalization to the range of -1 to 1, the data were fed into the NARX model as inputs.
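The wavelet preprocessing chain of Fig. 5 can be sketched as follows. MODWT with a db3 filter at level 3 is used here as a stand-in for the à trous transform applied by the authors (both avoid using future data), and the synthetic demand series, the removal of the 36 boundary-affected samples, and the [-1, 1] scaling follow the description above under these assumptions.

```matlab
% Wavelet preprocessing for the w-NARX/w-TANN pipeline (Fig. 5). MODWT with a
% db3 filter at level 3 stands in for the a trous transform used by the
% authors (both avoid future data); the demand series is a placeholder.
demand = 100 + 10*sin((1:283)') + 5*randn(283, 1);

L = 3;  S = 6;                               % decomposition level; db3 has 6 scaling coefficients
BAC = (2^L - 1) * (S - 1) + 1;               % Eq. (7): 36 boundary-affected coefficients

W = modwt(demand, 'db3', L);                 % (L+1)-by-283 matrix: details D1..D3 plus approximation A3
W = W(:, BAC+1:end);                         % drop the 36 boundary-affected samples
demandTrim = demand(BAC+1:end)';             % align the target (283 - 36 = 247 values)

Xw = mapminmax(W, -1, 1);                    % normalize each subseries to [-1, 1]
Tn = mapminmax(demandTrim, -1, 1);
X  = con2seq(Xw);  T = con2seq(Tn);          % feed into narxnet/timedelaynet as before
```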

2.5.3. GMDH

The input parameters for training the model included the demand data, temperature, cloud cover, and wind speed over the previous two weeks. To apply GMDH regression to the data, the Time Series Prediction tool using GMDH in MATLAB was used (36). The number of neurons was optimized in a stepwise manner, adding one neuron at a time to find the model with the best performance. Similarly, the best delay and threshold values were chosen by trial and error.

2.5.3.1. Data Division

The data were divided into two groups: training and testing. The least-squares algorithm was used to train the model on 70% of the input data, and the remaining 30% was used for testing.

2.5.4. Performance Evaluation Methods

All of the models underwent an identical performance evaluation process. Equations 8 and 9 were used to evaluate the performance of each model.

Eq. (8)
$RMSE = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(O_i - P_i)^2}$
Eq. (9)
$MSE = \dfrac{1}{n}\sum_{i=1}^{n}(O_i - P_i)^2$

where n is the number of test cases, Oi is the true value of the target variable for case i, and Pi is the corresponding model prediction for the same case.
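For reference, Eqs. (8) and (9) reduce to two lines of MATLAB; the observed and predicted vectors below are placeholders.

```matlab
% RMSE and MSE of Eqs. (8) and (9) for observed (O) and predicted (P) vectors.
O = [410; 395; 430; 388];                    % observed demand, placeholder values
P = [402; 401; 418; 395];                    % model predictions, placeholder values

mseVal  = mean((O - P).^2);                  % Eq. (9)
rmseVal = sqrt(mseVal);                      % Eq. (8)
fprintf('MSE = %.2f, RMSE = %.2f\n', mseVal, rmseVal);
```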


3. Results and Discussion

According to the IIS algorithm, maximum daily temperature, wind speed, mean daily cloud cover, holidays, and rainfall amount were selected as input variables. However, cloud cover was subsequently excluded as a potential input for the proposed models (Fig. 6).

Fig. 6.

IIS Algorithm Output


Cross-correlation analysis was employed to determine the temporal associations between the input variables and the water demand data; this approach has been used extensively for determining input delays (37-43). Fig. 7 shows the cross-correlation plot of the input variables and water demand data. Accordingly, rainfall amount at time (t), wind speed at time (t), temperature at time (t-3), and holidays at time (t-14) had the highest correlations with the water demand observations. Similarly, Rezaali et al (2) found significant correlations of temperature, holidays, and wind speed with water demand. Prasad et al (44) also implemented IIS for streamflow forecasting and reported similar results. Both of these commonalities demonstrate the association between water flow rate and meteorological and calendar inputs.

Fig. 7.

Cross-correlation Plot of the Input Variables
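A sketch of the lag-selection step using normalized cross-correlation is given below (Signal Processing Toolbox); the two series and the 14-day maximum lag are placeholders, and picking the lag with the largest absolute cross-correlation is the assumed selection rule.

```matlab
% Lag selection via normalized cross-correlation (cf. Fig. 7). Both series are
% placeholders; the lag with the largest absolute cross-correlation within a
% 14-day window is taken as the input delay.
demand = 100 + 10*sin((1:283)') + 5*randn(283, 1);
temp   = 25 + 8*sin(((1:283)' - 3)/30) + randn(283, 1);

maxLag = 14;
[c, lags] = xcorr(demand - mean(demand), temp - mean(temp), maxLag, 'coeff');
[~, idx]  = max(abs(c));
fprintf('Selected lag for this input: %d days\n', lags(idx));
```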


3.1. Model Performance

As Tables 1 and 2 suggest, the TANN models performed better than the NARX models in the training phase. However, in the validation and test phases, NARX performance was, on average, better than that of TANN.


Table 1. Performance of the Best TANN Models With Different Combinations of Transfer Functions in the Hidden and Output Layers
Hidden Layer   Output Layer   Train (MSE)   Validation (MSE)   Test (MSE)   R2 Across All Data
Logsig   Purelin   5.35E+04   1.34E+05   2.05E+05   7.38E-01
Tansig   Satlins   8.54E+04   9.46E+04   1.46E+05   7.21E-01
Logsig   Satlins   1.49E+04a   2.91E+05   3.14E+05   7.11E-01
Tansig   Purelin   5.47E+04   1.75E+05   1.94E+05   7.23E-01

a = indicates the lowest error in train, validation, and test, and the highest correlation coefficient.


Table 2. Performance of the Best NARX Models With Different Combinations of Transfer Functions in the Hidden and Output Layers
Hidden Layer   Output Layer   Train (MSE)   Validation (MSE)   Test (MSE)   R2 Across All Data
Logsig   Purelin   8.88E+04   6.42E+04   1.49E+05   7.24E-01
Tansig   Satlins   8.10E+04a   7.37E+04   1.45E+05   7.34E-01
Logsig   Satlins   9.46E+04   8.79E+04   1.07E+05   7.16E-01
Tansig   Purelin   1.08E+05   6.01E+04   1.09E+05   7.01E-01

a = indicates the lowest error in train, validation, and test, and the highest correlation coefficient.

The results of wavelet coupled networks in each combination after 30 runs are provided in Table 3.


Table 3. Performance of the Best w-NARX Models With Different Combinations of Transfer Functions in the Hidden and Output Layers
Hidden Layer   Output Layer   Train (MSE)   Validation (MSE)   Test (MSE)   R2 Across All Data   MAPE
Logsig   Purelin   2.66E+03   2.50E+03   2.83E+03   8.72E-01   12.85%
Tansig   Satlins   2.76E+03   2.39E+03   2.82E+03   8.70E-01   12.92%
Logsig   Satlins   2.57E+03   2.28E+03   2.45E+03   8.80E-01   12.15%
Tansig   Purelin   2.81E+03   2.16E+03   2.65E+03   8.70E-01   12.76%

a = indicates the lowest error in train, validation, and test, and the highest correlation coefficient.

Although all of the w-NARX models performed similarly, the tansig-satlins model performed best. This network also had the best average train, validation, and test performance among other combinations of transfer functions.

To provide a basis for comparing the results of this study with those of other studies, the w-TANN model was also investigated with different combinations of transfer functions. Table 4 summarizes these combinations. Note that the best model was selected from 30 simulations for each transfer function combination.


Table 4. Performance of the Best w-TANN Models With Different Combinations of Transfer Functions in the Hidden and Output Layers
Hidden Layer   Output Layer   Train (MSE)   Validation (MSE)   Test (MSE)   R2 Across All Data
Logsig   Purelin   3.14E+03   5.55E+03   6.32E+03   8.10E-01
Tansig   Satlins   3.38E+03   4.20E+03a   4.41E+03a   8.24E-01
Logsig   Satlins   3.04E+03   4.74E+03   5.29E+03   8.26E-01
Tansig   Purelin   2.86E+03a   4.85E+03   4.74E+03   8.35E-01

a = indicates the lowest error in train, validation, and test, and the highest correlation coefficient.

As Table 4 suggests, the model with tansig and purelin transfer functions in the hidden and output layers performed better than the other combinations. Moreover, comparing Tables 3 and 4 makes it evident that the w-NARX models performed considerably better than the w-TANN models. The reason for this could be the feedback connection in NARX models, which may help the model learn idiosyncrasies, and hence consumer behavior, better than TANN does.

Heidari et al (45) found that NARX outperformed other models in predicting the spreading dynamics of different droplets on various substrates. In a similar case study of water demand prediction at 24-hour and 1-week horizons, Bata et al (46) reported that a NARX model with correlated exogenous parameters reduced the error by 30% on average compared with a single-input model. They also found that the length of the NARX training set was negatively correlated with model performance for the above-mentioned lead times. This accords with the findings of the current study and is mainly attributable to overfitting and the feedback connection in NARX models.

The configuration of the GMDHT model, namely the selection pressure, lags, and train-to-test ratio, was determined by trial and error. Compared to ANN-based time series models, GMDHT requires less time to fit a nonlinear regression to the target dataset. In this research, the GMDHT method was also more consistent than the ANN models, which could be due to the random weight initialization and other complexities of ANN-based models. The results of the best GMDHT models with different configurations are provided in Table 5.


Table 5. Best GMDHT Model Output Configuration Results
Model Num.   Selection Pressure   Max. Num. Layers   Number of Neurons   Train (MSE)   Test (MSE)   R2 Across All Data
1   0.6   3   5   3.19E+04a   1.09E+04   9.33E-01
2   0.7   3   8   3.83E+04   1.06E+04   9.19E-01
3   0.8   4   12   4.19E+04   1.47E+04   9.03E-01

a = indicates the lowest error in train, validation, and test, and the highest correlation coefficient.

According to this table, the best GMDHT model (model number 1) was selected based on the training and test errors and the correlation coefficient. Among the ANN-based models, w-NARX and some configurations of w-TANN performed better than the GMDHT model; however, GMDHT easily outperformed the TANN and NARX models. The outputs of the TANN, NARX, w-TANN, w-NARX, and GMDHT models are shown in Figs. 8 to 12, respectively. Ebtehaj et al (47) found that a GMDH model outperformed a feed-forward neural network and existing nonlinear regression models for discharge coefficient estimation. The agreement between the current findings and those of Ebtehaj et al (47) stems from the fact that GMDH models, owing to their structure, tend to avoid overfitting and stalling in local error minima.

Fig. 8.

Observations vs. Best TANN Model Output with Sigmoidal Transfer Function in the Hidden Layer and Linear Transfer Function in the Output Layer


Fig. 9.

Observations vs. Best NARX Model Output with Hyperbolic Tangent Sigmoid Transfer Function in the Hidden Layer and Symmetric Saturating Linear Transfer Function in the Output Layer


Fig. 10.

Observations vs. Best w-TANN Model Output with Hyperbolic Tangent Sigmoid Transfer Function in the Hidden Layer and Linear Transfer Function in the Output Layer


Fig. 11.

Observations vs. Best w-NARX Model Output with Hyperbolic Tangent Sigmoid Transfer Function in the Hidden Layer and Symmetric Saturating Linear Transfer Function in the Output Layer


Fig. 12.

Observations vs. Prediction of the Best GMDHT Model Output with Five Neurons and Three Layers


As shown in these figures, w-TANN and w-NARX performed better than the other models, and errors occurred far less frequently in w-NARX than in w-TANN. Comparing TANN/w-TANN with NARX/w-NARX indicates that coupling WT with NARX improves network performance about twice as much as coupling it with TANN. Additionally, the GMDHT model was more accurate than both NARX and TANN; GMDHT might therefore be a good alternative when wavelet analysis is not preferred.

Utilizing NARX models (the best-performing model family in this research) for predicting urban water demand offers a multitude of applications and advantages. These models facilitate precise demand forecasting by capturing intricate relationships between factors such as weather, population growth, and industrial activities. They enable real-time monitoring, enhancing the ability of water utilities to adapt to changing demand patterns. NARX models support effective water resource management, demand response programs, leak detection, and climate change adaptation (5,48). Additionally, they contribute to sustainability planning, optimize infrastructure investments, and promote data-driven decision-making, leading to energy savings and improved overall efficiency of urban water systems (2).


4. Conclusion

In this study, the potential of using WA for urban water demand prediction with ANN-based models, namely TANN, NARX, and GMDHT, was addressed. Among the models examined, TANN had the lowest performance in terms of MSE and RMSE, while w-NARX was the most accurate. The results of the GMDHT method were also promising, especially compared to TANN and NARX. Coupling WT with NARX improved performance about twice as much as coupling WT with TANN.

The results of this research emphasize that model structure plays an important role in model performance; prominent examples are the feedback connection in NARX models and the self-organizing principle of GMDH models, both of which were investigated in this study. This experimental comparison indicates that NARX neural networks can be considered a viable alternative in future studies. The main advantage of NARX over TANN is the feedback connection, which may help the network learn idiosyncrasies in the data more effectively.


Acknowledgments

The authors would like to thank the Qom Water and Wastewater Company and the Iran Meteorological Organization for providing the data for this research. This study was approved by Qom University of Medical Sciences (IR.MUQ.REC.1398.046, approval date: 2019-06-11).


Authors’ Contribution

Conceptualization: Reza Fouladi-Fard, Mostafa Rezaali, Abdolreza Karimi.

Data curation: Mostafa Rezaali, Reza Fouladi-Fard, Abdolreza Karimi.

Formal Analysis: Mostafa Rezaali, Abdolreza Karimi.

Funding acquisition: Reza Fouladi-Fard.

Investigation: Mostafa Rezaali, Abdolreza Karimi, Reza Fouladi-Fard.

Methodology: Mostafa Rezaali.

Project administration: Reza Fouladi-Fard, Abdolreza Karimi.

Resources: Reza Fouladi-Fard, Abdolreza Karimi.

Software: Mostafa Rezaali.

Supervision: Reza Fouladi-Fard, Abdolreza Karimi.

Validation: Mostafa Rezaali.

Visualization: Mostafa Rezaali.

Writing–original draft: Mostafa Rezaali.

Writing–review & editing: Reza Fouladi-Fard, Mostafa Rezaali, Abdolreza Karimi.


Competing Interests

The authors declare that they have no competing interests.


Funding

This study received financial support from the Research Center for Environmental Pollutants, Qom University of Medical Sciences (grant number: 981011).


References

  1. Herrera M, Torgo L, Izquierdo J, Pérez-García R. Predictive models for forecasting hourly urban water demand. J Hydrol 2010; 387(1-2):141-50. doi: 10.1016/j.jhydrol.2010.04.005
  2. Rezaali M, Quilty J, Karimi A. Probabilistic urban water demand forecasting using wavelet-based machine learning models. J Hydrol 2021; 600:126358. doi: 10.1016/j.jhydrol.2021.126358
  3. Maier HR, Dandy GC. Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environ Model Softw 2000; 15(1):101-24. doi: 10.1016/s1364-8152(99)00007-9
  4. Ghiassi M, Zimbra DK, Saidane H. Urban water demand forecasting with a dynamic artificial neural network model. J Water Resour Plan Manag 2008; 134(2):138-46. doi: 10.1061/(asce)0733-9496(2008)134:2(138)
  5. Adamowski J, Karapataki C. Comparison of multivariate regression and artificial neural networks for peak urban water-demand forecasting: evaluation of different ANN learning algorithms. J Hydrol Eng 2010; 15(10):729-43. doi: 10.1061/(asce)he.1943-5584.0000245
  6. Mojarrad H, Fouladi Fard R, Rezaali M, Heidari H, Izanloo H, Mohammadbeigi A. Spatial trends, health risk assessment and ozone formation potential linked to BTEX. Hum Ecol Risk Assess 2020; 26(10):2836-57. doi: 10.1080/10807039.2019.1688640
  7. Fouladi Fard R, Naddafi K, Hassanvand MS, Khazaei M, Rahmani F. Trends of metals enrichment in deposited particulate matter at semi-arid area of Iran. Environ Sci Pollut Res 2018; 25(19):18737-51. doi: 10.1007/s11356-018-2033-z
  8. Khazaei M, Mahvi AH, Fouladi Fard R, Izanloo H, Yavari Z, Tashayoei HR. Dental caries prevalence among schoolchildren in urban and rural areas of Qom province, central part of Iran. Middle East J Sci Res 2013; 18(5):584-91. doi: 10.5829/idosi.mejsr.2013.18.5.81155
  9. Fouladi Fard R, Hosseini MR, Faraji M, Omidi Oskoue A. Building characteristics and sick building syndrome among primary school students. Sri Lanka J Child Health 2018; 47(4):332-7.
  10. Qom News. 2017. [Persian]. Available from: http://www.qomnews.ir/news/45162/%D8%AA%D8%A7%D9%85%DB%8C%D9%86-85-%D8%AF%D8%B1%D8%B5%D8%AF-%D8%A2%D8%A8-%D8%B4%D8%B1%D8%A8-%D8%B4%D9%87%D8%B1-%D9%82%D9%85-%D8%B3%D8%B1%D8%B4%D8%A7%D8%AE%D9%87-%D9%87%D8%A7%DB%8C-%D8%AF%D8%B2.
  11. Rezaali M, Karimi A, Moghadam Yekta N, Fouladi Fard R. Identification of temporal and spatial patterns of river water quality parameters using NLPCA and multivariate statistical techniques. Int J Environ Sci Technol 2020; 17(5):2977-94. doi: 10.1007/s13762-019-02572
  12. George EI. The variable selection problem. J Am Stat Assoc 2000; 95(452):1304-8. doi: 10.1080/01621459.2000.10474336
  13. Galelli S, Castelletti A. Tree-based iterative input variable selection for hydrological modeling. Water Resour Res 2013; 49(7):4295-310. doi: 10.1002/wrcr.20339
  14. Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Netw 1989; 2(5):359-66. doi: 10.1016/0893-6080(89)90020-8
  15. Govindaraju RS, Rao AR. Artificial Neural Networks in Hydrology. Springer Science & Business Media; 2013.
  16. Hansen JV, Nelson RD. Neural networks and traditional time series methods: a synergistic combination in state economic forecasts. IEEE Trans Neural Netw 1997; 8(4):863-73. doi: 10.1109/72.595884
  17. Pollock DS, Green RC, Nguyen T. Handbook of Time Series Analysis, Signal Processing, and Dynamics. Academic Press; 1999.
  18. Chan RWK, Yuen JKK, Lee EWM, Arashpour M. Application of nonlinear-autoregressive-exogenous model to predict the hysteretic behaviour of passive control systems. Eng Struct 2015; 85:1-10. doi: 10.1016/j.engstruct.2014.12.007
  19. Lin T, Horne BG, Tino P, Giles CL. Learning long-term dependencies in NARX recurrent neural networks. IEEE Trans Neural Netw 1996; 7(6):1329-38. doi: 10.1109/72.548162
  20. Xie H, Tang H, Liao YH. Time series prediction based on NARX neural networks: an advanced approach. In: 2009 International Conference on Machine Learning and Cybernetics. Hebei: IEEE; 2009. doi: 10.1109/icmlc.2009.5212326
  21. Alexandridis AK, Zapranis AD. Wavelet neural networks: a practical guide. Neural Netw 2013; 42:1-27. doi: 10.1016/j.neunet.2013.01.008
  22. Quilty J, Adamowski J. Addressing the incorrect usage of wavelet-based hydrological and water resources forecasting models for real-world applications with best practices and a new forecasting framework. J Hydrol 2018; 563:336-53. doi: 10.1016/j.jhydrol.2018.05.003
  23. Rezaali M, Fouladi Fard R, Mojarad H, Sorooshian A, Mahdinia M, Mirzaei N. A wavelet-based random forest approach for indoor BTEX spatiotemporal modeling and health risk assessment. Environ Sci Pollut Res 2021; 28(18):22522-35. doi: 10.1007/s11356-020-12298-3
  24. Ivakhnenko AG. Heuristic self-organization in problems of engineering cybernetics. Automatica 1970; 6(2):207-19. doi: 10.1016/0005-1098(70)90092-0
  25. Farlow SJ. The GMDH algorithm of Ivakhnenko. Am Stat 1981; 35(4):210-5. doi: 10.1080/00031305.1981.10479358
  26. Ivakhnenko AG. Polynomial theory of complex systems. IEEE Trans Syst Man Cybern 1971; SMC-1(4):364-78. doi: 10.1109/tsmc.1971.4308320
  27. Ivakhnenko AG, Ivakhnenko GA. The review of problems solvable by algorithms of the group method of data handling (GMDH). Pattern Recognition and Image Analysis 1995; 5(4):527-35.
  28. Yarpiz. Group Method of Data Handling (GMDH) in MATLAB. 2018. Available from: http://yarpiz.com/263/ypml113-gmdh.
  29. Duerr I, Merrill HR, Wang C, Bai R, Boyer M, Dukes MD. Forecasting urban household water demand with statistical and machine learning methods using large space-time data: a comparative study. Environ Model Softw 2018; 102:29-38. doi: 10.1016/j.envsoft.2018.01.002
  30. Olden JD, Jackson DA. Illuminating the “black box”: a randomization approach for understanding variable contributions in artificial neural networks. Ecol Modell 2002; 154(1-2):135-50. doi: 10.1016/s0304-3800(02)00064-9
  31. Basheer IA, Hajmeer M. Artificial neural networks: fundamentals, computing, design, and application. J Microbiol Methods 2000; 43(1):3-31. doi: 10.1016/s0167-7012(00)00201-3
  32. Hagan MT, Menhaj MB. Training feedforward networks with the Marquardt algorithm. IEEE Trans Neural Netw 1994; 5(6):989-93. doi: 10.1109/72.329697
  33. Mukherjee I, Routroy S. Comparing the performance of neural networks developed by using Levenberg–Marquardt and Quasi-Newton with the gradient descent algorithm for modelling a multiple response grinding process. Expert Syst Appl 2012; 39(3):2397-407. doi: 10.1016/j.eswa.2011.08.087
  34. Kermani BG, Schiffman SS, Nagle HT. Performance of the Levenberg–Marquardt neural network training method in electronic nose applications. Sens Actuators B Chem 2005; 110(1):13-22. doi: 10.1016/j.snb.2005.01.008
  35. Cigizoglu HK, Kişi Ö. Flow prediction by three back propagation techniques using K-fold partitioning of neural network training data. Hydrol Res 2005; 36(1):49-64.
  36. Yarpiz. Time-Series Prediction using GMDH in MATLAB. MATLAB Central File Exchange; 2015. Available from: https://www.mathworks.com/matlabcentral/fileexchange/52972-time-series-prediction-using-gmdh-in-matlab.
  37. Joo CN, Koo JY, Yu MJ. Application of short-term water demand prediction model to Seoul. Water Sci Technol 2002; 46(6-7):255-61. doi: 10.2166/wst.2002.0687
  38. Firat M, Turan ME, Yurdusev MA. Comparative analysis of fuzzy inference systems for water consumption time series prediction. J Hydrol 2009; 374(3-4):235-41. doi: 10.1016/j.jhydrol.2009.06.013
  39. Wu CL, Chau KW, Li YS. Predicting monthly streamflow using data-driven models coupled with data-preprocessing techniques. Water Resour Res 2009; 45(8):1-23. doi: 10.1029/2007wr006737
  40. Uddameri V, Singaraju S, Hernandez EA. Is standardized precipitation index (SPI) a useful indicator to forecast groundwater droughts?—Insights from a karst aquifer. J Am Water Resour Assoc 2019; 55(1):70-88.
  41. Alhamshry A, Fenta AA, Yasuda H, Shimizu K, Kawai T. Prediction of summer rainfall over the source region of the Blue Nile by using teleconnections based on sea surface temperatures. Theor Appl Climatol 2019; 137(3):3077-87. doi: 10.1007/s00704-019-02796-x
  42. Cebrián AC, Abaurrea J, Asín J, Segarra E. Dynamic regression model for hourly river level forecasting under risk situations: an application to the Ebro river. Water Resour Manag 2019; 33(2):523-37. doi: 10.1007/s11269-018-2114-2
  43. Rajaee T, Ravansalar M, Adamowski JF, Deo RC. A new approach to predict daily pH in rivers based on the “à trous” redundant wavelet transform algorithm. Water Air Soil Pollut 2018; 229(3):85. doi: 10.1007/s11270-018-3715-3
  44. Prasad R, Deo RC, Li Y, Maraseni T. Input selection and performance optimization of ANN-based streamflow forecasts in the drought-prone Murray Darling Basin region using IIS and MODWT algorithm. Atmos Res 2017; 197:42-63. doi: 10.1016/j.atmosres.2017.06.014
  45. Heidari E, Daeichian A, Sobati MA, Movahedirad S. Prediction of the droplet spreading dynamics on a solid substrate at irregular sampling intervals: Nonlinear Auto-Regressive eXogenous Artificial Neural Network approach (NARX-ANN). Chem Eng Res Des 2020; 156:263-72. doi: 10.1016/j.cherd.2020.01.033
  46. Bata MT, Carriveau R, Ting DS. Short-term water demand forecasting using nonlinear autoregressive artificial neural networks. J Water Resour Plan Manag 2020; 146(3):04020008. doi: 10.1061/(asce)wr.1943-5452.0001165
  47. Ebtehaj I, Bonakdari H, Zaji AH, Azimi H, Khoshbin F. GMDH-type neural network approach for modeling the discharge coefficient of rectangular sharp-crested side weirs. Eng Sci Technol Int J 2015; 18(4):746-57. doi: 10.1016/j.jestch.2015.04.012
  48. Adamowski J, Fung Chan H, Prasher SO, Ozga-Zielinski B, Sliusarieva A. Comparison of multiple linear and nonlinear regression, autoregressive integrated moving average, artificial neural network, and wavelet artificial neural network methods for urban water demand forecasting in Montreal, Canada. Water Resour Res 2012; 48(1):1-14. doi: 10.1029/2010wr009945