Document Type : Original
Authors
1 Department of Physical Education and Sports Sciences, Isfahan (Khorasgan) Branch, Islamic Azad University, Isfahan, Iran
2 Department of Physical Education and Sports Sciences, Esfahan (Khorasgan) Branch, Islamic Azad University, Esfahan, Iran
In today's competitive world, customers are one of the most important assets for any organization and play a crucial role in enhancing market competitiveness and organizational performance (Bi, 2019). Amidst intense market competition, customers can easily make specific choices among multiple products or service providers. Studies show that the cost of acquiring a new customer is often higher than the cost of retaining an existing one. If a company maintains a good relationship with its customers over time, it will gain more profit from its current clientele. Therefore, to maintain market advantages, determining how to utilize existing customer resources and prevent the loss of current customers has become both important and necessary for companies (Xiahou & Harada, 2022).
Customers, in addition to their significance in physical markets, are also vital in electronic commercial markets. In other words, customers in e-commerce are the driving force behind the dynamics of online trade. Despite the considerable increase in online businesses compared to previous years, some users still lack the necessary trust in online shopping and prefer to purchase their required services through traditional means. Consequently, attracting positive customer feedback and motivating managers will lead to indirect marketing of their products. Thus, retaining and strengthening customer loyalty is a strategic challenge for organizations aiming to maintain and develop their competitive position in the market. Companies that focus not only on short-term sales but also on long-term customer satisfaction by providing valuable and differentiated products and services will naturally achieve greater market penetration and cultivate more loyal customers compared to their competitors (Mohammadi Javadi & Noorbakhsh, 2018). In this regard, there are two fundamental approaches that companies can use to attract and retain customers. The first approach is the "unfocused" approach. In this method, a company seeks to improve the quality of its product and relies on mass advertising to reduce customer churn. The second approach is the "focused" approach, in which the company targets its marketing campaigns toward customers who are more likely to churn. This approach can be further divided based on how customers are targeted. Marketing managers can establish favorable and long-term relationships with customers by identifying and predicting changes in customer behavior. Understanding these changes can help managers create effective advertising campaigns (Haghighatnia et al., 2018). 
Given the increasing competition in various industries, predicting customer behavior and understanding the factors influencing their attraction and retention is crucial for the success of businesses. This is also true in the sports and recreation industry, where sports complexes face the challenge of attracting and retaining customers. Therefore, companies need to evaluate the value of their customers, segment them based on their values, and develop strategies for each segment to acquire and retain profitable customers (Cheng & Lin, 2015).
In today's competitive and challenging markets, identifying potential customer churn and providing early warning indicators of problems that could lead to customer loss is essential (Cheng et al., 2019). Analyses of customer behavior indicate that an expert system for early detection of customer churn can help businesses address issues before they escalate. In the realm of predicting customer churn, Raeisi and Sajedi (2020) state that customer churn is a crucial criterion for evaluating a growing business, and companies need to predict it effectively to retain their customers. They also assert that the Gradient Boosted Trees method, with an accuracy of 90.86%, is highly accurate compared to other methods and can significantly benefit these types of businesses. Furthermore, some researchers believe that a customer's tendency to churn and switch in e-commerce depends on the following factors: the value paid for the first order, the number of items purchased, shipping costs, the categories of purchased products, customer demographics, and customer location. However, the tendency of customers to switch is not influenced by factors such as population density in the customer's area, the division into rural and urban areas, or the quantitative analysis of the first purchase (Matuszelański & Kopczewska, 2022). Amajuoyi et al. (2024) pointed to the wider application of predictive analysis to increase customer loyalty and expand sports businesses. In their research on customer retention under increasing competition, Suhanda et al. (2022) emphasized the direct relationship between customer satisfaction and retention rate: if a customer is not satisfied, the retention rate decreases.
If a company fails to meet customer expectations, the consequences are serious, namely the transfer of customers to other services; factors such as service, price, value for money, satisfaction, and trust affect customer retention. De Caigny et al. (2018) proposed a new combined classification method for predicting customer churn based on logistic regression and decision trees. In this study, a new combined algorithm called the LogitLeaf model was suggested for improved data classification. The core idea behind the LogitLeaf model is that building different models on segments of the data, rather than on the entire dataset, results in better predictive performance while maintaining the interpretability of the models constructed on the leaves. The LogitLeaf model consists of two stages: the partitioning stage and the prediction stage. In the first stage, customer segments are identified using decision rules, and in the second stage, a model is created for each leaf of the tree. This new combined approach is compared with decision trees, logistic regression, random forests, and logistic model trees in terms of predictive performance and interpretability. The area under the receiver operating characteristic curve and lift charts are used to measure predictive performance; on these measures, the LogitLeaf model significantly outperforms logistic regression and decision trees, while performing at least as well as the other models.
In the study by Bamdad and Hatami (2017), it is stated that various types of neural networks can be employed for predicting customer churn, and data mining techniques are effective for this purpose. Additionally, the multilayer perceptron neural network model demonstrates higher accuracy compared to composite models. Moradzadeh and Khodayari (2017) also utilized decision tree and data mining methods to investigate customer churn in electronic banking services. They employed three algorithms—C&R TREE, QUEST TREE, and CHILD TREE—to predict customer churn, identify patterns leading to churn, and determine the most significant influencing factors. The researchers found that the C&R TREE algorithm outperformed the other algorithms in predicting customer churn. Furthermore, based on the decision trees formed and the percentage of churn at each node, rules leading to customer churn can be uncovered. According to the research findings, five crucial predictors of customer churn are occupation, branch level, education, current balance, and investment type. Banks should pay particular attention to these factors. The research results of Ali Beignejad and Haj Mohammadi (2019) on predicting customer churn in social networks using data mining analysis (including decision trees, neural networks, and regression) indicated that the proposed approach by the researchers demonstrated high usability, and their combined method achieved greater accuracy compared to other techniques. Regarding the predictive model of customer churn in online businesses, the research by Rostami Fard and Sajoudi Shijani (2021) utilized an aggregative classifier based on neural networks, with neural networks as the base learner. They incorporated classification algorithms such as decision trees, neural networks, naive Bayes, and support vector machines. Their proposed method showed superior performance, achieving an accuracy of 97.86% in predicting customer churn compared to other techniques. 
Aldosary & Alrashdan (2021) conducted a study on predicting gym membership churn using artificial neural networks, focusing on the concept of psychological habit formation in the fitness industry. The researchers aimed to develop a model that predicts gym membership churn using multilayer perceptron artificial neural networks with back propagation, emphasizing feature selection. Their results demonstrated that integrating the concept of psychological habit formation significantly enhanced the effectiveness of the neural network model in fitness maintenance strategies, achieving high predictive performance with accuracy, sensitivity, and specificity of 92.1%, 89.1%, and 93.8%, respectively. Xiahou et al. (2022) studied predicting customer churn in e-commerce using K-means clustering and support vector machine techniques. They divided customers into three groups and identified the main customer segments. The research findings revealed that each predictive indicator significantly improved after customer segmentation, highlighting the importance of using the K-means clustering algorithm. Furthermore, the results indicated that the accuracy of support vector machine predictions was higher than that of logistic regression predictions.
Techniques for predicting customer acquisition and retention can be utilized to identify customers who may be at risk of churn. Marketing strategies can be enhanced based on these predicted outcomes (Pondel, 2021). In the sports industry, customers play a vital role as the core of sports venues and as the driving force behind their success. Despite various challenges, national customer satisfaction indices have improved in most countries, and international satisfaction metrics have also developed globally. In Iran, providing quality services has consistently been a top priority for customers, reflecting their strong loyalty to high-quality offerings (Nasr Esfahani et al., 2024). Consequently, in an environment where customers are informed and possess the power of choice, neglecting their needs is no longer feasible. Currently, the market power balance is shifting toward customers; therefore, successful centers will be those that offer services that satisfy them. Retaining loyal customers over the long term is more beneficial than attracting new ones to replace those who have severed ties with sports facilities, particularly pools (Mohammadpoori et al., 2015).
In recent years, the water sports and recreation industry in Iran, especially in major cities like Isfahan, has witnessed significant growth. With the increase in competition among sports pools, attracting and retaining customers has become a fundamental challenge for the managers of these centers. Therefore, finding effective and scientific solutions to improve customer experience and increase their loyalty is essential in this field. Previous studies indicate that, despite the importance of customer attraction and retention and the use of machine learning-based methods (learning methods that rely on training and testing data to produce results without explicit programming; Patel & Prajapati, 2018), there is still a gap in scientific research: no precise approach has been applied that reduces feature dimensions and achieves high accuracy in predicting the behavior of customers of sports pools in Isfahan. The present research, aimed at predicting the attraction and retention of customers of sports pools in Isfahan, is therefore of particular importance. The use of data mining algorithms, especially decision trees, can significantly help in identifying patterns and factors influencing customer choice and loyalty. These tools can more accurately analyze customer needs and behaviors and, as a result, provide powerful models for predicting future customer behaviors. Additionally, the lack of sufficient research in this area and the scarcity of up-to-date and reliable data on the sports industry in Isfahan make this research even more necessary. On the other hand, the results of this research can help managers of sports pools design their marketing and service strategies in such a way that they not only attract new customers but also retain existing ones.
Ultimately, in addition to helping improve the economic performance of sports pools, this research can lead to increased customer satisfaction and the enhancement of sports services in society. Therefore, given the significant benefits of estimating the attraction and retention of customers of sports pools in Isfahan, this study aims to provide a fundamental tool for identifying the demographic profile of customers by utilizing a data mining approach, particularly decision trees, in which each node represents a feature, each link represents a decision (rule), and each leaf represents a result (a class value or a continuous value). On this basis, a model for evaluating the attraction and retention of customers of sports pools in Isfahan based on their behavior is presented.
Table 1- Features used in the research
| Row | Variable Name | Variable Type | Role in the predictive model |
|---|---|---|---|
| 1 | Options for using pool services (PO) | Discrete | Input |
| 2 | Average monthly account recharge charge for using the pool (MM) | Discrete | Input |
| 3 | Highest amount used (HP) | Discrete | Input |
| 4 | Amount of ticket purchase (BA) | Discrete | Input |
| 5 | Number of ticket purchases (BN) | Discrete | Input |
| 6 | Volume of ticket purchases (BV) | Discrete | Input |
| 7 | Customer satisfaction level (CS) | Discrete | Input |
| 8 | Discount received (DR) | Discrete | Input |
| 9 | Customer income (CI) | Discrete | Input |
| 10 | Number of different services purchased (BD) | Discrete | Input |
| 11 | Repeat ticket purchase for one pool (BR) | Discrete | Input |
| 12 | Customer communication (CT) | Continuous | Input |
| 13 | Service level (SL) | Continuous | Input |
| 14 | Service quality level (QL) | Continuous | Input |
| 15 | Type of service used (TS) | Continuous | Input |
| 16 | Monthly services (MS) | Continuous | Input |
| 17 | Time of service use (DT) | Discrete | Input |
| 18 | Number of referrals to friends (FD) | Discrete | Input |
| 19 | Duration of presence in the pool (TD) | Discrete | Input |
| 20 | Number of times using the pool's café and restaurant (CD) | Discrete | Input |
| 21 | Swimming competitions (SM) | Discrete | Input |
| 22 | Free lessons (FL) | Discrete | Input |
| - | Churn | Binomial (0, 1) | Target variable (label) |
The process of data preparation and cleansing is vital for machine learning, ensuring data accuracy and consistency. It involves several steps to transform raw data into a usable format for model training. The process includes data cleansing, encoding, normalization, and feature selection, each addressing specific data issues and preparing it for analysis. This is crucial for building effective models and making accurate predictions:
Part 1: Preprocessing: When the values of dataset features are in different domains, the likelihood of errors in the results increases. Normalization is the process of adjusting the data of a statistical population to a similar domain. In the proposed model, normalization is performed using the following formula. Normalization Formula: The standard form of normalization places all data between the interval d1 to d2. The formula for normalization is given by:
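$$x' = d_1 + \frac{(x - x_{\min})(d_2 - d_1)}{x_{\max} - x_{\min}} \qquad (1)$$
where $x$ is the original value of a feature, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of that feature, and $x'$ is the normalized value.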
According to the data, d1= 0 and d2=+1. In other words, using this relationship, all data fall within the range [0,1]. In a dataset, there may be missing values for some records. The data in a dataset must be complete and free of missing or incomplete values when entered into an algorithm. Additionally, cases where values are likely assigned incorrectly to the features of a record should be corrected; if they cannot be corrected, they should be removed from the dataset. Unfortunately, the dataset does contain missing values. In this study, the maximum possible value method has been used to handle these missing values. In the maximum possible value method, among the acceptable values for a specific feature, the maximum value is chosen for substitution (Farhang Far et al., 2008).
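The two preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB code: the function names and the sample values are hypothetical, and the "maximum possible value" method is read here as substituting the largest observed value of the feature, which is an assumption about the intended rule.

```python
from typing import List, Optional


def impute_with_max(column: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the largest observed value.

    Assumption: this is one reading of the "maximum possible value" method.
    """
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = max(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column: List[float], d1: float = 0.0, d2: float = 1.0) -> List[float]:
    """Linearly map a column onto the interval [d1, d2]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; map every value to the lower bound.
        return [d1 for _ in column]
    return [d1 + (v - lo) * (d2 - d1) / (hi - lo) for v in column]


# Hypothetical feature column with one missing entry.
raw = [20.0, None, 50.0, 80.0]
filled = impute_with_max(raw)
scaled = min_max_scale(filled)
print(filled)  # [20.0, 80.0, 50.0, 80.0]
print(scaled)  # [0.0, 1.0, 0.5, 1.0]
```

With d1 = 0 and d2 = 1 the smallest value of each feature maps to 0 and the largest to 1, so all features end up on the same scale before feature selection.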
Part 2: Feature Selection: After the dataset of sports pool customers has been read and preprocessed, the grasshopper optimization algorithm is set up. In this algorithm, N is the number of grasshopper colonies and D is the number of decision variables or dimensions of the optimization problem. Therefore, the grasshopper optimization algorithm is represented by an N×D matrix in which each row corresponds to a possible solution to the optimization problem. In the proposed model, N is the number of records in the dataset and D is the number of features. The population of grasshopper colonies, each consisting of a number of grasshoppers responsible for exploring the objective, is defined according to equation (2). In the proposed model, the grasshopper algorithm therefore consists of N colonies, each made up of a number of grasshoppers (features), and each colony is defined by the D features present in the database.
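$$X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,D} \\ \vdots & \ddots & \vdots \\ x_{N,1} & \cdots & x_{N,D} \end{bmatrix} \qquad (2)$$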
In this set, each row represents a possible solution in the solution space. Each category of grasshoppers consists of a group of attacker grasshoppers that are considered as elements of a solution. All attacker grasshoppers in a category are considered as a general unit that moves towards a suitable location with abundant resources. If a category of grasshoppers reaches an ideal position, an optimal solution is obtained. The evaluation of each category of grasshoppers is calculated based on the objective function according to equation (3).
(3)
Equation (3) gives the fitness of the i-th category in terms of the value of the objective function for that category, which is calculated based on the distance criterion. The parameters worst and best are the values for the worst and best grasshopper categories relative to the prey. In the proposed model, the grasshopper algorithm should be transformed from a continuous to a discrete state. This is because the dataset values are discrete and in the range of 0 and 1. Therefore, each grasshopper in a category is defined according to equation (4). The population of categories is encoded with D features. To convert numbers to binary, the sigmoid function in equation (5) is used. The output of the sigmoid function itself is not 0 or 1 but a real number between zero and one, which is then mapped to a binary value.
(4)
$$S(x) = \frac{1}{1 + e^{-x}} \qquad (5)$$
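As an illustration of the continuous-to-binary step, the sketch below applies the standard sigmoid and then draws a 0/1 value for each feature. The stochastic thresholding rule (keep a feature when the sigmoid output exceeds a uniform random number) is a common choice in binary swarm optimizers and is an assumption here, as are the function names and the sample position vector.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    """Map a real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position: List[float], rng: random.Random) -> List[int]:
    """Turn a continuous position into a 0/1 feature mask.

    Assumed rule: feature k is kept when sigmoid(x_k) exceeds a fresh
    uniform random number drawn from [0, 1).
    """
    return [1 if sigmoid(x) > rng.random() else 0 for x in position]


rng = random.Random(0)                    # fixed seed for repeatability
position = [-6.0, -0.5, 0.0, 0.5, 6.0]    # hypothetical continuous values
mask = binarize(position, rng)
print([round(sigmoid(x), 4) for x in position])
print(mask)
```

Strongly negative components are almost always switched off and strongly positive ones almost always switched on, while values near zero are selected about half of the time.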
In the proposed model, a subset of features is selected using an optimization algorithm to achieve the optimal value. The fitness function for feature selection from each category is defined according to equation (6). In equation (6), |n| is the total number of features, and |S| is the number of selected features. The parameter accuracy is the classification accuracy expressed as a percentage, and δ and ρ are constants equal to 99 and 1, respectively.
(6)
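The exact form of equation (6) is not reproduced here, so the following is only a sketch of one commonly used weighted fitness for wrapper feature selection: a large weight on accuracy plus a small reward for discarding features. The additive form, the function name, and the use of accuracy as a fraction are assumptions; only the symbols |n|, |S|, δ = 99, and ρ = 1 come from the text.

```python
from typing import Sequence


def selection_fitness(accuracy: float, mask: Sequence[int],
                      delta: float = 99.0, rho: float = 1.0) -> float:
    """Score a feature subset; higher is better (assumed weighted form).

    accuracy: classification accuracy in [0, 1] obtained with the subset.
    mask:     0/1 flags, one per feature; |n| = len(mask), |S| = sum(mask).
    """
    n = len(mask)
    s = sum(mask)
    if n == 0 or s == 0:
        raise ValueError("at least one feature must be selected")
    # Accuracy dominates; the second term rewards smaller subsets.
    return delta * accuracy + rho * (n - s) / n


# Two hypothetical subsets of 22 features with the same accuracy.
seven = [1] * 7 + [0] * 15
nine = [1] * 9 + [0] * 13
print(selection_fitness(0.90, seven) > selection_fitness(0.90, nine))  # True
```

Under this form, two subsets with equal accuracy are ranked by size, so the optimizer prefers the one with fewer features.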
High-dimensional datasets, despite the opportunities they bring, create many computational challenges. One of the problems with high-dimensional data is that in most cases, not all data features are vital for finding the hidden knowledge in the data. The main idea in feature selection is to eliminate a subset of input features that carry little information. Therefore, feature selection is used to reduce the feature space and increase the efficiency of classification. Feature selection not only improves the accuracy and efficiency of classification, but also enhances the interpretability of the results.
The Grasshopper Optimization Algorithm: Grasshoppers, despite being solitary in nature, form large groups that can be destructive to agriculture. Their unique group behavior is exhibited in both larval and adult stages, with larvae swarming and feeding on plants, and adults migrating in groups. This behavior has inspired optimization algorithms, such as the grasshopper optimization algorithm, which mimics their search and movement strategies according to equation (7):
$$X_i = S_i + G_i + A_i \qquad (7)$$
In equation (7), $X_i$ is the position of the $i$-th grasshopper, $S_i$ is the social interaction, $G_i$ is the gravitational force on the $i$-th grasshopper, and $A_i$ is the horizontal wind force. To provide random behavior, equation (7) can be rewritten as $X_i = r_1 S_i + r_2 G_i + r_3 A_i$, where $r_1$, $r_2$, and $r_3$ are random numbers in the interval [0,1]. The component $S_i$ is defined by equation (8):
$$S_i = \sum_{j=1,\, j \neq i}^{N} s(d_{ij})\, \hat{d}_{ij} \qquad (8)$$
Here $N$ is the number of grasshoppers, $d_{ij}$ is the distance between the $i$-th and $j$-th grasshoppers, calculated as $d_{ij} = |X_j - X_i|$, $s$ is a function that defines the strength of the social forces as shown in equation (9), and $\hat{d}_{ij} = (X_j - X_i)/d_{ij}$ is the unit vector from grasshopper $i$ to grasshopper $j$.
The function $s$ that defines the social force is calculated by equation (9):
$$s(r) = f e^{-r/l} - e^{-r} \qquad (9)$$
where $f$ is the attraction intensity and $l$ is the attraction length scale.
The parameters $l$ and $f$ in equation (9) influence the social behavior of artificial grasshoppers. Figure 1 illustrates the effects of varying these parameters, revealing significant changes in behavior across different zones. The attraction and repulsion zones exhibit particularly sensitive responses to specific parameter values (for example, $l = 1$ or $f = 1$).
Figure 1. Behavior of the function s when changing l and f
The G component is calculated by equation (10):
$$G_i = -g\, \hat{e}_g \qquad (10)$$
where $g$ is the gravitational constant and $\hat{e}_g$ is the unit vector pointing toward the center of the earth.
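The social and gravitational components can be sketched as follows, using the standard grasshopper optimization formulation in which the social force is an attraction term minus a repulsion term. The parameter values f = 0.5 and l = 1.5, the choice of the last coordinate axis as the gravity direction, and the three-point swarm are illustrative assumptions, not values taken from this study.

```python
import math
from typing import List

Vector = List[float]


def s(r: float, f: float = 0.5, l: float = 1.5) -> float:
    """Social force strength at distance r: attraction minus repulsion."""
    return f * math.exp(-r / l) - math.exp(-r)


def social_interaction(i: int, swarm: List[Vector]) -> Vector:
    """Sum of s(d_ij) times the unit vector from grasshopper i to each j != i."""
    dim = len(swarm[i])
    total = [0.0] * dim
    for j, other in enumerate(swarm):
        if j == i:
            continue
        diff = [other[k] - swarm[i][k] for k in range(dim)]
        d_ij = math.sqrt(sum(c * c for c in diff))
        if d_ij == 0.0:
            continue  # coincident points have no defined direction
        strength = s(d_ij)
        for k in range(dim):
            total[k] += strength * diff[k] / d_ij
    return total


def gravity(g: float, dim: int) -> Vector:
    """Gravity term -g * e_g, with e_g assumed to lie along the last axis."""
    e_g = [0.0] * (dim - 1) + [1.0]
    return [-g * c for c in e_g]


swarm = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]]  # hypothetical positions
print(s(1.0) < 0.0)   # True: close neighbours repel
print(s(4.0) > 0.0)   # True: distant neighbours attract
print(social_interaction(0, swarm))
```

With these illustrative parameters the force changes sign at a distance of roughly 2.08, which separates the repulsion zone from the attraction zone.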
Part 3: Classification: For classification, the dataset is first divided into two parts: training (80% of samples) and testing (20% of samples). The training data is used to build the model, and the testing data is used to evaluate it: the model assigns a label to each test record, thus identifying its class. For this purpose, the decision tree algorithm is used, as discussed below.
Decision tree classifiers are a widely recognized and effective technique for data classification. This method is favored for its simplicity and accuracy, making it a popular choice in machine learning, image processing, and pattern recognition (Charbuty & Abdulazeez, 2021).
Several decision tree approaches exist. Among the most important are Iterative Dichotomiser 3 (ID3), its successor C4.5, Classification And Regression Tree (CART), CHi-squared Automatic Interaction Detector (CHAID), Multivariate Adaptive Regression Splines (MARS), Generalized, Unbiased, Interaction Detection and Estimation (GUIDE), Conditional Inference Trees (CTREE), Classification Rule with Unbiased Interaction Selection and Estimation (CRUISE), and Quick, Unbiased and Efficient Statistical Tree (QUEST).
The most important criterion in evaluating the performance of decision trees is the entropy criterion. Entropy measures the impurity or randomness of a dataset. For a two-class target such as churn, the value of entropy is always between 0 and 1. A value of 0 means the subset is pure, while a value of 1 means the classes are evenly mixed, so the closer the value is to 0, the better. This index is calculated based on equation (11):
$$E(S) = -\sum_{i=1}^{c} p_i \log_2 p_i \qquad (11)$$
where $p_i$ is the proportion of samples in the subset $S$ that take the $i$-th value of the characteristic, and $c$ is the number of distinct values (Charbuty & Abdulazeez, 2021).
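The entropy criterion, and the way a decision tree uses it to rank candidate splits, can be illustrated with a short sketch. The function names and the toy labels are hypothetical; information gain is the usual entropy-based split score and is shown here as an example rather than as the exact criterion used in this study.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    result = 0.0
    for count in Counter(labels).values():
        p = count / total
        result -= p * math.log2(p)
    return result


def information_gain(labels: Sequence[Hashable],
                     groups: List[Sequence[Hashable]]) -> float:
    """Entropy reduction when `labels` is partitioned into `groups`."""
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted


churn = [0, 0, 1, 1]                                 # toy churn labels
print(entropy(churn))                                # 1.0 (evenly mixed)
print(entropy([0, 0, 0, 0]))                         # 0.0 (pure)
print(information_gain(churn, [[0, 0], [1, 1]]))     # 1.0 (perfect split)
```

A split that separates the two classes completely removes all impurity, which is why the tree prefers features with the highest gain at each node.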
Part 4: Evaluation Criteria: This study employs five metrics (accuracy, precision, recall, F-measure, and error rate) to assess the effectiveness of a classification algorithm. These criteria provide insights into the algorithm's performance, particularly its ability to correctly classify samples and its reliability in assigning labels (Kamel et al., 2019):
TP: Number of positive samples correctly identified as positive.
TN: Number of negative samples correctly identified as negative.
FP: Number of negative samples incorrectly identified as positive.
FN: Number of positive samples incorrectly identified as negative.
- Accuracy: It is the ratio of correctly classified samples to all samples.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (12)$$
- Precision: It is the ratio of correctly classified positive samples to the total number of samples that have been diagnosed as positive. It should be noted that some of the samples that have been diagnosed as positive are wrong and are counted in FP.
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (13)$$
- Recall: It is the ratio of correctly classified positive samples to all available positive samples.
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (14)$$
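As a worked check of these criteria, the sketch below computes them from the four confusion-matrix counts reported for the grasshopper-decision tree model in Table 2 (TP = 284, TN = 116, FP = 20, FN = 20). The function name is hypothetical; the F-measure is taken as the harmonic mean of precision and recall and the error rate as one minus accuracy, which are the standard definitions.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute the five evaluation criteria from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0 or tp == 0:
        raise ValueError("counts must include at least one true positive")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "error_rate": 1.0 - accuracy,
    }


metrics = classification_metrics(tp=284, tn=116, fp=20, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
# accuracy: 0.9091
# precision: 0.9342
# recall: 0.9342
# f_measure: 0.9342
# error_rate: 0.0909
```

Because FP and FN are equal in this case, precision, recall, and the F-measure coincide.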
This text introduces the need for a predictive model to forecast customer behavior in sports facilities, particularly in Isfahan's sports pools, to aid economic and management decisions. The research aims to develop and compare hybrid algorithms for accurate predictions until 2024, emphasizing the importance of such models in investment and management strategies.
Predicting the Attraction and Retention of Customers in Sports Pools: All results were obtained by programming in MATLAB 2021a on a system with a Core i5 processor and 8 GB of RAM.
Data Preprocessing: One of the most important stages of a data mining method is data preprocessing. In fact, preprocessing determines the results that will be achieved, and its significance is such that it can lead to either the best or the weakest outcomes. Therefore, in this study, preprocessing was conducted thoroughly according to established articles and in a principled manner, which includes the following steps:
Data Splitting: In this study, 70% of the data is used for training the model, while the remaining 30% is used to test it. The split is completely random, so that every record has the same chance of being used for either task. A MATLAB function can automatically create random indices and place the data corresponding to these indices in their respective matrices. Since the final dataset has 54,000 rows, 70% of this amount equals 37,800. Therefore, 37,800 data points are used to create the classification model, while the remaining 16,200 data points (30% of the original data) are used to test the model.
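The random split can be sketched with shuffled row indices. This is an illustrative Python equivalent, not the MATLAB routine used in the study; the function name and the seed are assumptions.

```python
import random
from typing import List, Tuple


def split_indices(n_rows: int, train_fraction: float,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and cut them into training and testing parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = round(n_rows * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(54_000, 0.70)
print(len(train_idx), len(test_idx))  # 37800 16200
```

Every row index lands in exactly one of the two parts, so no record is used for both training and testing.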
Creating Classification: The research employed decision tree and K-Nearest Neighbors classification methods, with feature selection via the grasshopper and SA algorithms. The fusion of the Grasshopper Optimization Algorithm and decision tree (GOA-DT) achieved an accuracy of 90.9091% and an error rate of 0.0909 (Table 2, Table 7, Figure 2). This hybrid method outperformed the SA-DT algorithm (87.9545%, Table 6), predicting customer churn with fewer features.
The KNN algorithm was also used for classification; the results are presented in Table 4. Combined with the grasshopper algorithm, it achieves an accuracy of 83.8636% (Figure 3), which is lower than the 90.9091% of the grasshopper-decision tree (GOA-DT) method. The GOA-KNN combination selects 9 features, whereas SA-KNN selects 6 features with an accuracy of 82.9545% (Table 6). Tables 3-7 list the selected features and compare the accuracy, number of selected features, and error rates of the algorithms. The GOA-DT algorithm is deemed superior for feature selection and customer behavior analysis.
Table 2- The results of applying the proposed algorithm (grasshopper-decision tree)
| Parameter | Amount |
|---|---|
| TP | 284 |
| TN | 116 |
| FP | 20 |
| FN | 20 |
| Precision | 93.4211% |
| Recall | 93.4211% |
| Accuracy | 90.9091% |
| F-Measure | 93.4211% |
Figure 2. Comparison of the proposed algorithm's performance with the SA-DT algorithm
Table 3- Features selected by both GOA and SA algorithms in combination with the decision tree
| GOA Algorithm | SA Algorithm |
|---|---|
| Average monthly account recharge charge for using the pool (MM) | Average monthly account recharge charge for using the pool (MM) |
| Number of ticket purchases (BN) | Amount of ticket purchase (BA) |
| Customer satisfaction level (CS) | Customer satisfaction level (CS) |
| Discount received (DR) | Discount received (DR) |
| Service quality level (QL) | Customer communication (CT) |
| Monthly services (MS) | Service level (SL) |
| Free lessons (FL) | Number of referrals to friends (FD) |
| - | Free lessons (FL) |
Table 4- Results of applying the K-nearest neighbor algorithm to the grasshopper algorithm
| Parameter | Amount |
|---|---|
| TP | 260 |
| TN | 109 |
| FP | 53 |
| FN | 18 |
| Precision | 83.0671% |
| Recall | 93.5252% |
| Accuracy | 83.8636% |
| F-Measure | 87.9865% |
Figure 3. Comparison of the performance of the GOA-KNN algorithm with the SA-KNN algorithm
Table 5- Features selected by both GOA and SA algorithms in combination with K-nearest neighbors
| GOA Algorithm | SA Algorithm |
|---|---|
| Average monthly account recharge charge for using the pool (MM) | Average monthly account recharge charge for using the pool (MM) |
| Number of ticket purchases (BN) | Customer satisfaction level (CS) |
| Customer satisfaction level (CS) | Discount received (DR) |
| Discount received (DR) | Customer communication (CT) |
| Service quality level (QL) | Service level (SL) |
| Monthly services (MS) | Free lessons (FL) |
| Time of service use (DT) | - |
| Number of times using the pool's café and restaurant (CD) | - |
| Free lessons (FL) | - |
Table 6- Comparison of the accuracy of the proposed algorithm with the algorithms used
| Algorithm | Accuracy (%) | Number of features selected |
|---|---|---|
| GOA-DT | 90.9091 | 7 |
| GOA-KNN | 83.8636 | 9 |
| SA-DT | 87.9545 | 8 |
| SA-KNN | 82.9545 | 6 |
Table 7- Error rates obtained from grasshopper-based algorithms
| Algorithm | Error |
|---|---|
| GOA-DT | 0.0909 |
| GOA-KNN | 0.1614 |
In addition, as shown in Table 8, the results of the study on non-churn customers in sports pools, aimed at investigating their churn in the years 2023 and 2024, are presented. It should be noted that a threshold of 0.6 has been established for estimating the likelihood of customer churn; this means that customers whose predicted target segment is equal to or greater than 0.6 are more likely to churn, while customers whose target value is less than 0.6 are considered less likely to churn. Consequently, values of 0 and 1 are assigned for non-churn and churn, respectively, as the target column in the original data also contained binary values of 1 and 0. Therefore, customers labeled with 1 are expected to churn by 2024.
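The thresholding rule described above amounts to a single comparison per customer. The sketch below is illustrative only: the function name and the sample scores are hypothetical, and only the 0.6 threshold comes from the text.

```python
from typing import List

CHURN_THRESHOLD = 0.6  # threshold stated in the text


def label_churn(scores: List[float], threshold: float = CHURN_THRESHOLD) -> List[int]:
    """Assign 1 (churn) when the predicted value reaches the threshold, else 0."""
    return [1 if score >= threshold else 0 for score in scores]


scores = [0.15, 0.60, 0.59, 1.00]  # hypothetical predicted values
print(label_churn(scores))  # [0, 1, 0, 1]
```

A value exactly equal to the threshold is labeled as churn, matching the "equal to or greater than 0.6" wording.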
Table 8- Customer Churn Forecast until 2024
| Customer number | 2023 | 2024 | Churn/Non-churn | Customer number | 2023 | 2024 | Churn/Non-churn |
|---|---|---|---|---|---|---|---|
| 1 | 0.1604 | 0.5562 | 0 | 66 | 1.0000 | 1.0000 | 1 |
| 2 | 0.3872 | 0.7346 | 0 | 67 | 1.0000 | 1.0000 | 1 |
| 3 | 0.8775 | 0.7786 | 0 | 68 | 0.3065 | 0.8307 | 0 |
| 4 | 0.3546 | 0.2641 | 0 | 69 | 0.3390 | 0.3728 | 0 |
| 5 | 0.3268 | 0.8677 | 0 | 70 | 0.4348 | 0.2883 | 0 |
| 6 | 0.2793 | 0.7018 | 0 | 71 | 0.0897 | 0.8551 | 0 |
| 7 | 0.7859 | 0.7311 | 0 | 72 | 0.3326 | 0.7494 | 0 |
| 8 | 0.9035 | 0.7230 | 0 | 73 | 0.6220 | 0.9614 | 0 |
| 9 | 0.5324 | 0.4963 | 0 | 74 | 0.1302 | 0.9586 | 0 |
| 10 | 0.8510 | 0.8276 | 0 | 75 | 0.3873 | 0.2544 | 0 |
| 11 | 0.9227 | 0.5211 | 0 | 76 | 0.8628 | 0.2063 | 0 |
| 12 | 0.1829 | 0.2342 | 0 | 77 | 0.6146 | 0.3018 | 0 |
| 13 | 0.6263 | 0.3391 | 0 | 78 | 0.0683 | 0.1866 | 0 |
| 14 | 0.2910 | 0.6451 | 0 | 79 | 0.8564 | 0.3754 | 0 |
| 15 | 0.1162 | 0.4851 | 0 | 80 | 0.4584 | 0.3669 | 0 |
| 16 | 0.5764 | 0.0465 | 0 | 81 | 0.1755 | 0.6384 | 0 |
| 17 | 1.0000 | 1.0000 | 1 | 82 | 0.6705 | 0.4601 | 0 |
| 18 | 0.8833 | 0.4219 | 0 | 83 | 0.4764 | 0.0503 | 0 |
| 19 | 0.6700 | 0.5678 | 0 | 84 | 0.1426 | 0.5441 | 0 |
| 20 | 0.3363 | 0.7636 | 0 | 85 | 0.3142 | 0.8035 | 0 |
| 21 | 0.2701 | 0.6063 | 0 | 86 | 0.3536 | 0.9538 | 0 |
| 22 | 0.3013 | 0.8621 | 0 | 87 | 0.0891 | 0.8649 | 0 |
| 23 | 1.0000 | 1.0000 | 1 | 88 | 0.9415 | 0.4259 | 0 |
| 24 | 1.0000 | 1.0000 | 1 | 89 | 0.6721 | 0.4429 | 0 |
| 25 | 0.4798 | 0.7985 | 0 | 90 | 0.6560 | 0.2851 | 0 |
| 26 | 0.6083 | 0.5813 | 0 | 91 | 0.5142 | 0.3472 | 0 |
| 27 | 0.0902 | 0.8847 | 0 | 92 | 0.1040 | 0.3086 | 0 |
| 28 | 0.6688 | 0.8260 | 0 | 93 | 0.2288 | 0.6348 | 0 |
| 29 | 0.2012 | 0.5130 | 0 | 94 | 0.6462 | 0.9975 | 0 |
| 30 | 0.7881 | 0.5670 | 0 | 95 | 0.3403 | 0.7781 | 0 |
| 31 | 0.4157 | 0.7573 | 0 | 96 | 0.2889 | 0.7639 | 0 |
| 32 | 0.4848 | 0.9752 | 0 | 97 | 0.1388 | 0.9550 | 0 |
| 33 | 0.8537 | 0.3275 | 0 | 98 | 0.3276 | 0.8500 | 0 |
| 34 | 0.9241 | 0.7061 | 0 | 99 | 0.0876 | 0.8165 | 0 |
| 35 | 0.2874 | 0.3274 | 0 | 100 | 0.2538 | 0.0553 | 0 |
| 36 | 0.9333 | 0.0894 | 0 | 101 | 0.4184 | 0.1916 | 0 |
| 37 | 0.8538 | 0.2192 | 0 | 102 | 0.8761 | 0.1558 | 0 |
| 38 | 1.0000 | 1.0000 | 1 | 103 | 0.6549 | 0.7151 | 0 |
| 39 | 0.5410 | 0.5196 | 0 | 104 | 0.6132 | 0.6031 | 0 |
| 40 | 0.9562 | 0.7111 | 0 | 105 | 0.3883 | 0.7802 | 0 |
| 41 | 0.0387 | 0.8336 | 0 | 106 | 0.8792 | 0.3254 | 0 |
| 42 | 0.1917 | 0.7742 | 0 | 107 | 0.5760 | 0.7127 | 0 |
| 43 | 0.7653 | 0.9505 | 0 | 108 | 1.0000 | 1.0000 | 1 |
| 44 | 0.3834 | 0.3014 | 0 | 109 | 0.7182 | 0.8477 | 0 |
| 45 | 0.5108 | 0.8054 | 0 | 110 | 0.4621 | 0.2626 | 0 |
| 46 | 0.3176 | 0.3221 | 0 | 111 | 1.0000 | 1.0000 | 1 |
| 47 | 0.9832 | 0.5790 | 0 | 112 | 1.0000 | 1.0000 | 1 |
| 48 | 0.9016 | 0.5813 | 0 | 113 | 0.9078 | 0.8909 | 0 |
| 49 | 0.9194 | 0.3573 | 0 | 114 | 0.5197 | 0.9784 | 0 |
| 50 | 0.6245 | 0.5157 | 0 | 115 | 0.2219 | 0.8588 | 0 |
| 51 | 0.6781 | 0.6426 | 0 | 116 | 0.3921 | 0.7270 | 0 |
| 52 | 0.1862 | 0.4231 | 0 | 117 | 0.1789 | 0.3416 | 0 |
| 53 | 0.2739 | 0.1485 | 0 | 118 | 0.1269 | 0.3603 | 0 |
| 54 | 0.3554 | 0.8720 | 0 | 119 | 0.9537 | 0.8123 | 0 |
| 55 | 1.0000 | 1.0000 | 1 | 120 | 0.1152 | 0.4318 | 0 |
| 56 | 0.5465 | 0.3009 | 0 | 121 | 1.0000 | 1.0000 | 1 |
| 57 | 0.6917 | 0.1154 | 0 | 122 | 1.0000 | 1.0000 | 1 |
0 |
0.0430 |
0.7022 |
123 |
1 |
1.0000 |
1.0000 |
58 |
0 |
0.4792 |
0.2339 |
124 |
0 |
0.5487 |
0.3066 |
59 |
0 |
0.0711 |
0.9802 |
125 |
0 |
0.1793 |
0.1296 |
60 |
0 |
0.9311 |
0.1521 |
126 |
0 |
0.0795 |
0.8714 |
61 |
0 |
0.1322 |
0.2684 |
127 |
0 |
0.9549 |
0.6772 |
62 |
0 |
0.7155 |
0.7124 |
128 |
0 |
0.5721 |
0.5042 |
63 |
0 |
0.4292 |
0.7571 |
129 |
0 |
0.6538 |
0.8346 |
64 |
|
|
|
|
1 |
1.0000 |
1.0000 |
65 |
In this study, a hybrid method based on the grasshopper optimization algorithm (GOA) was used to improve the accuracy of customer behavior estimation and prediction systems while reducing the number of features. First, the data obtained from the database were normalized after the preprocessing stages. Next, the DT and KNN algorithms, each combined with SA and GOA, were tested and evaluated on the data. To assess the proposed method, criteria such as accuracy, detection rate, sensitivity, and performance rate were used, and each of these metrics was obtained for the algorithm under evaluation. The findings can be summarized by returning to the research questions of this study. How can a decision tree-based model be presented to predict the level of attraction and retention of customers in sports pools in Isfahan City? Implementing the decision tree method requires the collection of relevant data. Since the decision tree is one of the most common data mining methods, we collected data on 54,000 customers of Isfahan sports pools over six years, from 2018 to 2023. This allowed us to estimate customer behavior through the successive stages of the algorithm, such as labeling, classification, and prediction.
In addition, because the decision tree is well suited to binary classification (0 and 1), the target variable in the dataset was coded as 0 (non-churn) or 1 (churn) so that the decision tree method could be applied directly.
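The preprocessing steps mentioned above (normalization and binary coding of the target) might look like the following sketch. The text does not state which normalization was used; min-max scaling to [0, 1] is an assumption here, and the function names, status strings, and sample values are hypothetical.

```python
def min_max_normalize(column):
    """Scale a numeric column to [0, 1] (assumed scheme; the source does not specify one)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def encode_target(statuses):
    """Map customer status to the binary coding used in the text: 0 = non-churn, 1 = churn."""
    mapping = {"non-churn": 0, "churn": 1}
    return [mapping[status] for status in statuses]


# Hypothetical feature values and statuses for three customers.
monthly_recharge = [10.0, 20.0, 40.0]
status = ["non-churn", "churn", "non-churn"]

print(min_max_normalize(monthly_recharge))
print(encode_target(status))
```

Scaling every feature to a common range matters mainly for distance-based classifiers such as KNN; decision trees are largely insensitive to it.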
What features are effective in attracting and retaining customers of sports pools in Isfahan City? Based on a review of the literature and an evaluation of expert opinions, a total of 22 features were identified as significant for attracting and retaining customers of sports pools in Isfahan City. Applying the GOA to select the most effective features showed that 7 features are ultimately recognized as key indicators in this context: average monthly account recharge, pool usage, number of ticket purchases, customer satisfaction level, received discounts, service quality level, monthly services, and free training.
Can the proposed approach achieve better accuracy than previous methods? The results showed that the proposed method achieved an accuracy of 90.9091%, higher than that of the other methods examined.
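For reference, accuracy and sensitivity, two of the evaluation criteria named in this study, can be computed from true and predicted labels as in the sketch below. It uses the standard definitions (accuracy = correct predictions / all predictions; sensitivity = true positives / actual positives, with churn as the positive class). The labels are hypothetical and are not the study's data; "detection rate" and "performance rate" are left out because their definitions are not given here.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), treating 1 (churn) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Share of all customers whose label was predicted correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(y_true, y_pred):
    """Share of actual churners that the model identified as churners."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical labels for eight customers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(accuracy(y_true, y_pred))     # 0.75
print(sensitivity(y_true, y_pred))
```

When churners are a small minority, accuracy alone can look high even if few churners are detected, which is why sensitivity is usually reported alongside it.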
Therefore, in today's economic world, accurate and timely information is invaluable for owners, investors, creditors, and other stakeholders who need to make informed financial decisions. With advances in technology, simple customer behavior prediction models can now be used in all sports centers and complexes. Straightforward yet powerful prediction tools can help owners avoid bankruptcy and take the actions needed to reduce customer churn and improve retention. Such tools can also help investors select optimal investment portfolios by informing them about the past, present, and future of these centers. Predicting customer churn in sports centers is thus a crucial issue in financial decision-making within this sector. Given the effects and consequences of this phenomenon at both micro and macro levels of society, a variety of tools and models have been developed nationally and internationally, differing in their methods and predictor variables.
In this study, the performance of algorithms in predicting customer churn or non-churn in Isfahan sports clubs over a six-year period was evaluated using ensemble learning techniques and combinations of classification algorithms in data mining. The collected data cover 54,000 customers and include 22 initial features. Two optimization algorithms, GOA and SA, were used for feature selection, while the classification algorithms DT and KNN were used for classification and behavior recognition. The results indicated that the combined GOA-DT algorithm, with an accuracy of 90.9091%, outperformed the SA-DT algorithm, which achieved an accuracy of 87.9545%. Notably, GOA-DT selected only 7 of the 22 features for identifying customer churn, whereas SA-DT selected 8 features. In the subsequent step, the KNN algorithm was used in place of DT. Combined with GOA and SA, KNN ultimately achieved an accuracy of 83.8636% with GOA, using 9 selected features.
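One common way to pair an optimizer with a classifier for feature selection is a wrapper scheme, in which the optimizer searches over feature subsets and each subset is scored by the classifier's accuracy. The sketch below illustrates that idea under several assumptions that are not confirmed by the text: SA is taken to mean simulated annealing, the classifier is a 1-nearest-neighbour rule scored by leave-one-out accuracy, the per-feature penalty of 0.01 is arbitrary, and the data are synthetic. It is not the study's implementation, and GOA is not shown.

```python
import math
import random


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the selected features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = math.inf, None
        for k, xk in enumerate(X):
            if k == i:
                continue
            dist = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small penalty per selected feature (hypothetical weighting)."""
    return loo_1nn_accuracy(X, y, mask) - penalty * sum(mask)


def anneal_features(X, y, steps=200, t0=1.0, cooling=0.97, seed=0):
    """Simulated annealing over binary feature masks, starting from all features."""
    rng = random.Random(seed)
    n = len(X[0])
    current = [1] * n
    cur_fit = fitness(X, y, current)
    best, best_fit = current[:], cur_fit
    temp = t0
    for _ in range(steps):
        cand = current[:]
        j = rng.randrange(n)
        cand[j] = 1 - cand[j]  # flip one feature in or out of the subset
        cand_fit = fitness(X, y, cand)
        delta = cand_fit - cur_fit
        # Always accept improvements; accept worse moves with a temperature-dependent probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_fit = cand, cand_fit
            if cur_fit > best_fit:
                best, best_fit = current[:], cur_fit
        temp *= cooling
    return best, best_fit


# Synthetic data: feature 0 tracks the label, features 1 and 2 are noise.
data_rng = random.Random(1)
X, y = [], []
for i in range(40):
    label = i % 2
    X.append([label + data_rng.gauss(0, 0.1), data_rng.random(), data_rng.random()])
    y.append(label)

mask, score = anneal_features(X, y)
print(mask, round(score, 3))
```

The penalty term is what pushes the search toward smaller subsets: two masks with equal accuracy are ranked by how few features they use, which mirrors the study's interest in reducing 22 features to a handful.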
Based on these findings, it can be concluded that the combined algorithm GOA-DT demonstrates the best performance, utilizing an average of 7 features: Monthly account recharge, Pool usage, Number of ticket purchases, Customer satisfaction level, Received discounts, Service quality level, Monthly services, Free training. These features can effectively be used to estimate customer behavior.
Table 8 presents the results for non-churning customers of sports pools, examining their predicted churn in 2023 and 2024. A threshold of 0.6 was set for estimating the likelihood of customer churn: customers whose predicted target value is equal to or greater than 0.6 are considered more likely to churn, while those whose value is below 0.6 are assumed to be less likely to churn. Accordingly, values of 0 and 1 are assigned to non-churn and churn, respectively, matching the binary target column in the original data (1 for churn, 0 for non-churn). As indicated in Table 8, customers labeled 1 are therefore predicted to churn by 2024. Sports pool managers can use this information to make decisions aimed at mitigating the negative impacts of customer churn. This issue is particularly significant under current circumstances, which include restrictions arising from epidemics, crises, and unforeseen events. In addition, families have become more cautious about spending time in sports environments such as pools, and the financial capacity of families who use recreational sports facilities has declined, reducing their tendency to use sports pools compared with the past. As a result, investors and managers of sports pools have faced increased losses, with some facilities experiencing a significant drop in customers and higher churn rates.
In this context, practical suggestions such as encouraging monthly account recharges can be beneficial, as recharging often reflects customer satisfaction with pool services. Strengthening the motivation to recharge each month by offering additional incentives and by improving ticket prices relative to the services offered could yield financial benefits. Another feature that influences customer behavior is the number of ticket purchases, particularly among customers who use pool services infrequently. Discount offers, such as one free ticket for every five purchased, can attract more customers to the pool. Customer satisfaction is also a critical feature that all management systems strive to enhance. Specialized customer relationship management (CRM) teams can develop attractive solutions to raise customer satisfaction levels. Free training sessions, such as swimming technique and diving lessons, can further increase customer motivation. Additionally, providing monthly services as part of the CRM system can enhance motivation and help long-time users introduce new customers to the pool. Lastly, service quality is directly related to customer satisfaction; prioritizing service quality can help distinguish a pool from its competitors in the area. Attention to these suggestions by managers and officials of recreational sports pools could therefore be key to the success of these venues.
One of the main limitations of data mining-based research is the presence of incomplete or inconsistent data. In the case of sports pools in the city of Isfahan, information on customers, surveys, or customer behavior may not have been fully collected, which can affect the accuracy of predictive models. To overcome the problem of incomplete data, a regular and systematic data collection system can be established that includes periodic customer surveys and records of customer behavior at different times. The data should also be updated continuously so that changes are reflected accurately and promptly.
The authors would like to thank the Isfahan (Khorasgan) Branch of Islamic Azad University.
The authors declare that there is no conflict of interest regarding the publication of this paper.
The authors declare that this research was done with the financial support of Isfahan (Khorasgan) Branch Islamic Azad University.