
Abstract

The initial stage in the software development life cycle is defining requirements, a phase that holds significant importance and is now a fundamental task in the field of software engineering. The main aim of this study was to explore the effectiveness of using one-way analysis of variance (ANOVA) as a feature selection method with machine learning (ML) algorithms to considerably reduce the number of features when classifying software requirements into functional requirements (FRs) and nonfunctional requirements (NFRs), as well as into the subcategories of NFRs. The primary motivation for this research was that no previous study had comprehensively examined one-way ANOVA as a means of enhancing performance in classifying software requirements. Hence, various experiments were conducted to investigate the effects of one-way ANOVA and the selection of important features on the performance of different supervised ML classifiers on the PROMISE_exp dataset. The support vector machine achieved the best F1 score with one-way ANOVA across all experiments, attaining 96.69% in the classification of FRs and NFRs and 62.33% in that of the 11 subcategories of NFRs. As further evidence, one-way ANOVA achieved the highest F1 score compared to other studies on the classification of FRs and NFRs.

 

 

1.  Introduction

Text classification refers to organizing text documents into classes based on characteristics belonging to each text. It is broadly seen as a supervised learning task, defined as identifying the classes of new documents based on a specified training corpus of already labeled (identified) documents [1]. Text classification is used in several domains, including spam identification and news categorization. The concept may seem simple, and with a small number of documents, it is possible to analyze each document manually and obtain an idea of the class to which it belongs. However, the activity becomes more challenging when the number of documents to be classified increases to several hundred thousand or a million. Document classification is a generic problem not limited to text alone; it can also be extended to other items, such as music, images, videos, and other media [2].

The task of software requirement classification consists of specifying the category to which a given software requirement belongs [3]. Requirements fall into two classes: functional requirements (FRs), which describe the services, behavior, or functions that a system provides, and nonfunctional requirements (NFRs), which include the attributes (e.g., quality, usability, security, and privacy) or restrictions of the application to be developed or of the software development process [4]. Even when software requirements are well known and well described, it remains a challenge to automatically classify requirements written in natural language into FRs and the subcategories of NFRs. According to Abad et al. [5], this is largely because stakeholders, as well as requirements engineers, use different terminologies and sentence structures to describe the same kinds of requirements. The high inconsistency in requirements elicitation makes automated classification more error prone, so the problem is finding optimal ways to realize a good automated classification. Furthermore, such automatic classification is needed because manual classification of software requirements is time-consuming, especially in large projects with a huge number of requirements [6].

In machine learning (ML), text classification still poses many challenges to be overcome, including the considerable number of features involved (very high-dimensional feature spaces) [7]. This complex problem can be addressed by removing redundant or noisy features that do not contain important information related to a specific task. Dimensionality reduction can be achieved by selecting a subset of features from the full set of features [8] in a technique called feature selection (FS). One-way analysis of variance (ANOVA) is used as an FS filter to measure the impact of a feature on a target class in ML tasks. One-way ANOVA has proven effective in improving results in different text classification problems [9], [10].

To the best of our knowledge, one-way ANOVA has not yet been studied as an FS technique in the context of the software requirement classification problem, in contrast to other text classification problems. Accordingly, in this paper, one-way ANOVA is implemented to reduce the high-dimensionality problem and simultaneously improve the results of a classifier. The proposed methodology provides a valuable opportunity to investigate the effectiveness of one-way ANOVA across different ML models, such as support vector machine (SVM), k-nearest neighbor (k-NN), and naive Bayes (NB) [11]. These comparisons demonstrate the best combination of algorithm and one-way ANOVA for classifying software requirements. Hence, the paper serves as a reference for future work in the ML community that uses one-way ANOVA as an FS method for classifying software requirements.

The remainder of the paper is organized as follows. Section 2 presents a literature review relevant to the software requirement classification problem using ML and introduces one-way ANOVA as an FS method. The methodology used to achieve the paper’s aim is illustrated in Section 3. Section 4 reveals and discusses the results obtained. Finally, the conclusion is reported in Section 5.

2.    Related Work and Background

There are two purposes of this section: to discuss state-of-the-art studies that have employed ML algorithms in automatically classifying software requirements and to describe one-way ANOVA as a filter to reduce high dimensionality.

The ML approach has recently gained popularity among researchers in text classification due to its accuracy for automated text mining. A significant number of works have adopted ML algorithms to classify software requirements into appropriate classes.

Canedo et al. [12] compared different feature construction and selection techniques and ML algorithms for the software requirement classification problem. The authors provided a framework to classify requirements into FRs and NFRs as a binary classification problem and into 11 classes of NFRs as a multiclass problem. The experiments were conducted on the PROMISE_exp dataset with techniques such as bag of words (BoW), term frequency–inverse document frequency (TF-IDF), and chi-squared (CHI2), combined with logistic regression (LR), SVM, multinomial naive Bayes (MNB), and k-NN as ML algorithms. The combination of TF-IDF and LR produced the best performance measures, with an F1 score of 91% for the binary classification and 74% for the 11 classes of NFRs.

Kurtanović et al. [13] automatically classified requirements as FRs and NFRs in the RE17 dataset using supervised ML. They also identified four classes of NFRs, namely usability, security, operational, and performance requirements. To select the informative features, a scoring classifier was constructed as an ensemble of tree classifiers. Under- and over-sampling strategies handled the imbalanced classes in the dataset, and the classifiers, based on the SVM algorithm, were cross-validated using precision, recall, and F1 score. The authors achieved precision and recall of up to ~92% for automatically identifying FRs and NFRs. For the identification of specific NFRs, they achieved the highest precision and recall for security and performance NFRs, with ~92% precision and ~90% recall.

Jindal et al. [14] employed the J48 decision tree method to classify descriptions into authentication, access control, cryptography-encryption, or data integrity as security requirements. The Info-Gain measure was used to select features for classifying requirements into one of four security class requirements. The best result was 83% in terms of the Receiver Operating Characteristic (ROC) curve.

Quba et al. [15] used BoW with SVM and k-NN to classify requirements into two categories (NFRs and FRs); these algorithms were evaluated on the PROMISE_exp database. The authors found that the use of BoW with the SVM algorithm had better performance measures, with F1 scores of 90% for the binary classification (FR or NFR) and 66% for the 11 subcategories of NFRs (availability, performance, security, etc.).

According to the latest review of ML algorithms for the classification of NFRs [11], published in 2019, only two FS methods had been employed: information gain [14], [16], [17] and CHI2 [18].

Based on the aforementioned discussion, and to the best of our knowledge, no study has used the ML approach with one-way ANOVA for the software requirement classification problem.

 

2.1  Analysis of Variance

ANOVA is a statistical method that decides whether the mean values of two or more groups differ [19]. It uses a probability distribution to measure variance. In statistics, the probability value (p-value) is the probability of obtaining test results at least as extreme as those actually observed, assuming that the null hypothesis (H0), which posits no difference, is correct. Therefore, p-values are used as an indicator of how inconsistent the data are with a particular statistical model. The p-value serves as a criterion for rejecting H0 when compared with the significance level. The significance level (α) is the probability of rejecting the null hypothesis when it is true. For example, an α of 0.05 implies a 5% risk of concluding that a difference exists when there is no actual difference. Hence, a smaller p-value implies stronger evidence in support of the alternative hypothesis (H1) [20].

One-way ANOVA calculates a score for every feature and then selects the features with the highest scores. A feature affects the target class if there is a difference between groups in terms of variance, which is the average of the squared differences from the mean. This leads to rejecting H0, which states that all group means are equivalent, and accepting H1, which is the opposite of H0. Deciding on relevant features using one-way ANOVA requires determining the threshold at which each feature is evaluated individually in terms of its correlation with the classes. Using one-way ANOVA as an FS filter helps measure the impact of a feature on a target class. Consequently, each feature receives an F-value and a p-value as a score or weight, according to which the important features are determined. A higher F-value indicates a feature that impacts the class and is considered relevant. Moreover, a feature with a p-value lower than the significance level (e.g., 0.05) is recognized as important. Some studies have used percentages to choose the features with the highest F-values, which are later forwarded to ML classifiers [21]. Other studies have used the p-value to determine the important features for the target classes [22], [10].
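To make this scoring concrete, the minimal sketch below (with synthetic numbers) splits a single feature's values by class and computes the F-value and p-value discussed above; SciPy's f_oneway is one common implementation, and the data are invented for illustration.

```python
# One-way ANOVA score for a single feature, illustrated with SciPy.
# The values are synthetic stand-ins for one term's TF-IDF weights.
from scipy.stats import f_oneway

feature_in_frs = [0.90, 0.80, 0.85, 0.95]   # the term's weights in FR samples
feature_in_nfrs = [0.10, 0.20, 0.15, 0.05]  # the same term's weights in NFRs

f_value, p_value = f_oneway(feature_in_frs, feature_in_nfrs)
# A large F-value and a small p-value indicate that the group means differ,
# so the feature would be kept as relevant to the class.
print(f_value, p_value)
```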

 

3.  Methodology

This section presents a solution for classifying software requirements using one-way ANOVA. Figure 1 shows the framework for classifying software requirements into FRs and NFRs, with NFRs then categorized into appropriate classes, such as security, availability, and usability.

Figure 1:  The framework for the proposed methodology.

As shown in Figure 1, the main phases are text normalization, feature extraction, FS, and model evaluation. The Python programming language is used to implement the methodology. These stages were carried out separately for the binary classification of software requirements into FRs and NFRs and for the multiclass classification of NFRs. The details of the framework’s steps are described in the following subsections.

  • Dataset

The PROMISE_exp [3] dataset was used to perform the proposed methodology. This public dataset extends the PROMISE repository (http://promise.site.uottawa.ca/SERepository/), which was inspired by the UCI Machine Learning Repository and created to encourage repeatable, verifiable, refutable, and/or improvable predictive software engineering models. The dataset consists of a labeled set of 444 FRs and 525 NFRs, the latter subclassified into 11 different types. Figure 2 shows the number of requirements for each NFR class in the dataset.

Figure 2: Distribution of NFR classes in the dataset.
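For illustration, a hypothetical loading sketch is shown below; the file name and column names are assumptions and should be adapted to the actual export of the PROMISE_exp repository.

```python
# Illustrative only: load the PROMISE_exp dataset from a local CSV.
# "promise_exp.csv", "RequirementText", and "class" are assumed names.
import pandas as pd

df = pd.read_csv("promise_exp.csv")
texts = df["RequirementText"].tolist()   # the requirement sentences
labels = df["class"].tolist()            # "F" or one of the 11 NFR types

# Binary task: collapse every NFR subtype into a single "NFR" label.
binary_labels = ["FR" if c == "F" else "NFR" for c in labels]
```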

 

  • Text Normalization

Text normalization is defined as a process consisting of a series of steps that should be followed to wrangle, clean, and standardize textual data into a form that can be consumed by natural language processing and analytics systems as input [2]. One of these steps is tokenization, which consists of dividing a text into a list of tokens, which can be sentences or individual words, depending on the researcher’s choice. Besides tokenization, several other techniques are part of the normalization process, such as case conversion, spelling correction, the removal of irrelevant words and unnecessary terms, stemming, and lemmatization.

In this paper, the normalization process is applied to the dataset containing the software requirements: the documents in the corpus are tokenized, the texts are converted to lowercase, irrelevant words are removed, and the words are reduced to their meaningful base forms through lemmatization.
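The paper does not name a specific library for these steps; the sketch below shows one possible implementation with NLTK, which is an assumption on our part.

```python
# A minimal normalization sketch with NLTK: tokenization, lowercasing,
# stop-word removal, and lemmatization, mirroring the steps above.
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

for resource in ("punkt", "punkt_tab", "stopwords", "wordnet"):
    nltk.download(resource, quiet=True)  # fetch models/corpora if missing

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def normalize(requirement: str) -> str:
    tokens = word_tokenize(requirement.lower())          # tokenize + lowercase
    tokens = [t for t in tokens if t.isalpha()]          # drop punctuation/digits
    tokens = [t for t in tokens if t not in stop_words]  # remove stop words
    return " ".join(lemmatizer.lemmatize(t) for t in tokens)

print(normalize("The system SHALL respond to all queries within 2 seconds."))
```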

  • Feature Extraction

After the text normalization stage has been implemented, the textual content is ready to be converted into a numeric representation that can be understood by ML classifiers. A text-based system needs a suitable representation of a text according to the kind of task to be performed [23]. In textual data, converting words into a set of vectors is called feature extraction, an important stage in representing text for ML classifiers because it converts unstructured data into structured data. There are numerous methods of converting text into a manageable vector representation, such as TF-IDF, BoW, and word embedding.

In this paper, TF-IDF is used in the experiments. This is a useful method of text representation when the frequency of words is an indicator of important terms. TF-IDF can be calculated as follows:

$\text{TF-IDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t)$,                                                 (1)

where the first part of the equation contains the term frequency (TF). TF refers to the number of occurrences of a word in a given requirement and is defined as follows:

$\text{TF}(t, d) = \dfrac{n_{t,d}}{\sum_{k} n_{k,d}}$,                                   (2)

where $n_{t,d}$ is the number of occurrences of term $t$ in requirement $d$ and the denominator is the total number of terms in $d$. The inverse document frequency (IDF), the other part of the TF-IDF calculation, provides higher weights for rare words and lower values for common words. The formula is as follows:

$\text{IDF}(t) = \log\dfrac{N}{df_t}$,                                      (3)

where $N$ is the total number of requirements in the corpus and $df_t$ is the number of requirements containing term $t$.

The n-gram range is set to 1–3 to grant the FS method (a subsequent stage) more feature forms (unigrams, bigrams, and trigrams).
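As a concrete sketch, the TF-IDF representation with this n-gram range could be computed with scikit-learn's TfidfVectorizer; the library choice and the two sample requirements are assumptions made for illustration.

```python
# TF-IDF feature extraction with unigrams, bigrams, and trigrams.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "system shall respond query within second",   # normalized requirements
    "user data shall be encrypted at rest",
]
vectorizer = TfidfVectorizer(ngram_range=(1, 3))  # n-gram range of 1-3
X = vectorizer.fit_transform(docs)
print(X.shape)  # (number of requirements, number of n-gram features)
```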

 

  • Feature Selection

As a filter, one-way ANOVA performs an analysis of variance for each feature, whereby the class variable is explained by the feature. Two one-way ANOVA methods, one based on the F-value and the other on the p-value, can be used to statistically select the important features.

In the first of these methods, the features are selected according to their F-values and based on a given percentile (m) of the original number of features. Only the top-scoring features are used to train the ML classifiers. In this study, the value of m ranges from 10% to 40% in order to select the value of m that produces the best classification result.

The second method depends on the p-values of the one-way ANOVA, which determine the features relevant to the classification task by comparison with the significance level. If the p-value of a feature is less than the significance level, the feature is kept for further processing; otherwise, it is discarded. The significance level (α) is usually set at 0.05 [22]. In the proposed methodology, different values of α are tried (ranging from 0.5 down to 0.01), and the one that yields the best results is retained.

One of the most remarkable differences between these methods is that the F-score-based method requires the percentage of features to be specified, while the other method relies on the p-value condition for selecting features. Moreover, the p-value criterion filters features more stringently than the percentage method, so the set of features chosen by the p-value may be a subset of the feature set determined by the percentage method.
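Both selection variants can be sketched as follows, assuming a scikit-learn implementation; the matrix X and labels y are random stand-ins for the TF-IDF features and requirement classes.

```python
# The two one-way ANOVA selection methods described above.
import numpy as np
from sklearn.feature_selection import SelectPercentile, f_classif

rng = np.random.default_rng(0)
X = rng.random((100, 500))            # stand-in for the TF-IDF matrix
y = rng.integers(0, 2, size=100)      # stand-in for FR/NFR labels

# Method 1: keep the top m% of features ranked by ANOVA F-value.
selector = SelectPercentile(f_classif, percentile=12)  # e.g., m = 12
X_top = selector.fit_transform(X, y)

# Method 2: keep only the features whose p-value falls below alpha.
f_values, p_values = f_classif(X, y)
alpha = 0.05
X_sig = X[:, p_values < alpha]

print(X_top.shape, X_sig.shape)
```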

  • Cross-Validation Method

Three ML classifiers (SVM, NB, and k-NN) are implemented and employed in different experiments to discover the effectiveness of one-way ANOVA in terms of classifier performance.

SVM works by constructing the optimal hyperplane (decision surface) in the training phase to separate the data with maximum generalization ability. NB is a probabilistic algorithm that provides a probability distribution over output classes. The k-NN classifier is a case-based learning algorithm that uses a distance or similarity function for pairs of observations, such as Euclidean distance or cosine similarity measures.
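As a sketch, the three classifiers could be instantiated in scikit-learn as follows; MultinomialNB is our assumption for the NB variant (a common choice for term-frequency features), and all hyperparameters are placeholders that the grid search described below would tune.

```python
# The three classifiers compared in this study (scikit-learn versions).
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier

classifiers = {
    "SVM": SVC(),                    # optimal separating hyperplane
    "NB": MultinomialNB(),           # probabilistic, suited to word features
    "k-NN": KNeighborsClassifier(),  # distance-based, case-based learning
}
```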

One of the most popular methods for tuning hyperparameters is cross-validation (CV) [24]. In a 10-fold CV, ten values of the performance measure are calculated for each hyperparameter setting, one per fold. The mean test performance is then computed for each setting, and the highest average serves as the final performance metric for the model.

The grid search algorithm is essentially an optimization procedure that selects the hyperparameter values of a specific problem or algorithm that achieve the best results. This algorithm is used in this study with a 10-fold CV to choose the best hyperparameter values for each classifier.
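A hedged sketch of this tuning step is given below; the parameter grid is illustrative rather than the one used in the study, and X_top and y come from the feature selection sketch above.

```python
# Grid search over SVM hyperparameters with 10-fold cross-validation.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}  # illustrative
search = GridSearchCV(SVC(), param_grid, cv=10, scoring="f1_macro")
search.fit(X_top, y)  # X_top and y from the feature selection sketch
print(search.best_params_, search.best_score_)
```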

 

  • Model Evaluation

Evaluation metrics are primarily used to evaluate the performance of a classifier. The performance is verified through mathematical formulas that compare the predictions obtained by a model with the actual values in the database.

Precision is the proportion of the instances predicted as positive that are truly positive. It is calculated as the ratio of the number of true positives (TP) to the total number of predicted positives (TP + FP):

$\text{Precision} = \dfrac{TP}{TP + FP}$.                                                  (4)

A precision equal to 1 indicates that every instance the classifier labeled as positive was classified correctly.

Recall is the proportion of positive instances detected correctly by the classifier. It is calculated as the number of times a class is correctly predicted (TP) divided by the number of times that class actually appears in the test data (TP + FN):

$\text{Recall} = \dfrac{TP}{TP + FN}$.                                                    (5)

It is often convenient to combine precision and recall into a single metric called the F1 score (also known as the F-measure), particularly when a simple way to compare two classifiers is needed [25].

The F1 score is the harmonic mean of precision and recall. Whereas the regular mean treats all values equally, the harmonic mean gives much more weight to low values. As a result, the classifier will only get a high F1 score if both recall and precision are high [25].

$F_1 = 2 \times \dfrac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$.                                                       (6)

The F1 score is used to compare the different experiments because the PROMISE_exp dataset is imbalanced; hence, it is the suitable measure in this case.
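These metrics can be computed directly with scikit-learn, as in the toy sketch below; the labels are invented purely for illustration.

```python
# Precision, recall, and F1 score, matching Equations (4)-(6).
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = ["FR", "NFR", "NFR", "FR", "NFR"]   # invented ground truth
y_pred = ["FR", "NFR", "FR", "FR", "NFR"]    # invented predictions

print(precision_score(y_true, y_pred, pos_label="FR"))  # TP / (TP + FP)
print(recall_score(y_true, y_pred, pos_label="FR"))     # TP / (TP + FN)
print(f1_score(y_true, y_pred, pos_label="FR"))         # harmonic mean
```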

 

4.  Results and Discussion

To analyze the impact of one-way ANOVA on the performance of ML algorithms in classifying software requirements into the appropriate class, experiments with different scenarios were conducted to determine the conditions under which the algorithms perform best, as shown in Table 1. The results of the experiments were evaluated in a comparative analysis of the performance of the classification algorithms, with the F1 score used to identify the best algorithm and feature construction method.

Table 1: The experimental results (Selection: Baseline = all features; m = percentage of top-scoring features by F-value; α = p-value significance threshold).

Task  Algorithm  Selection   #Features  F1 score
1     SVM        Baseline    14685      81.09%
1     SVM        m = 39%     5727       92.33%
1     SVM        α = 0.35    6740       96.69%
1     NB         Baseline    14685      81.95%
1     NB         m = 16%     2349       89.81%
1     NB         α = 0.22    2090       89.98%
1     k-NN       Baseline    14685      80.04%
1     k-NN       m = 10%     1468       76.24%
1     k-NN       α = 0.11    664        78.26%
2     SVM        Baseline    9465       37.18%
2     SVM        m = 12%     1135       62.33%
2     SVM        α = 0.02    988        61.91%
2     NB         Baseline    9465       40.97%
2     NB         m = 30%     2840       51.3%
2     NB         α = 0.46    2875       52.01%
2     k-NN       Baseline    9465       41.57%
2     k-NN       m = 10%     947        37.43%
2     k-NN       α = 0.02    988        38.36%

 

Table 1 shows that SVM attained the best performance among the algorithms after using one-way ANOVA in both tasks. NB also improved on its baseline results in both tasks after using one-way ANOVA. On the other hand, using one-way ANOVA with k-NN had a negative impact on the results of both tasks.

For the first task, the highest F1 score (96.69%) was produced with 6740 features, where α = 0.35. In the second task, the best F1 score (62.33%) was achieved by SVM with 1135 features, representing the top-scoring 12% of features by F-value.

To compare this study’s results with those of other research works, they were set against the results of [12] and [15], which used the same dataset, as shown in Table 2.


Table 2: The results of the proposed solution compared to the best state-of-the-art results on the same dataset.

Paper                        Task 1 (FR and NFR)  Task 2 (11 classes of NFRs)
Canedo et al. (2020) [12]    91%                  74%
Quba et al. (2021) [15]      90%                  66%
The current study            96.69%               62.33%

 

In the first task, using one-way ANOVA as the FS method improved on the results of Canedo et al. [12] and Quba et al. [15]. However, in the multiclass classification task (the second task), Canedo et al. [12] outperformed all other studies. Overall, these results provide strong evidence for the feasibility of using one-way ANOVA as an FS method, especially with SVM, in the classification of software requirements into FRs and NFRs.

 

5.  Conclusion

This study implemented the automatic classification of software requirements using a supervised learning approach with one-way ANOVA as an FS method. Different experimental scenarios were conducted to study the effectiveness of one-way ANOVA in classifying software requirements. SVM achieved the best results on the PROMISE_exp dataset in both the binary (FRs and NFRs) and multiclass (subcategories of NFRs) classification tasks, with F1 scores of 96.69% and 62.33%, respectively. Moreover, in the binary classification task, one-way ANOVA with SVM improved on the results of previous studies.

In the future, we plan to apply one-way ANOVA as the FS method to different datasets. At a broader level, the number of features could be reduced further by applying a wrapper or embedded method after one-way ANOVA in a hybrid FS approach, or by combining one-way ANOVA with a different feature extraction method.

 

 

References

[1]        J. Lilleberg, Y. Zhu, and Y. Zhang, “Support Vector Machines and Word2vec for Text Classification with Semantic Features,” in Proceedings of 2015 IEEE 14th International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2015, 2015.

[2]        D. Sarkar, Text Analytics with Python. 2019.

[3]        M. Lima, V. Valle, E. Costa, F. Lira, and B. Gadelha, “Software Engineering Repositories: Expanding the PROMISE Database,” in ACM International Conference Proceeding Series, 2019.

[4]        J. Zubcoff, I. Garrigós, S. Casteleyn, J. N. Mazón, J. A. Aguilar, and F. Gomariz-Castillo, “Evaluating Different i*-Based Approaches for Selecting Functional Requirements while Balancing and Optimizing Non-functional Requirements: A Controlled Experiment,” Inf. Softw. Technol., vol. 106, 2019.

[5]        Z. S. H. Abad, O. Karras, P. Ghazi, M. Glinz, G. Ruhe, and K. Schneider, “What Works Better? A Study of Classifying Requirements,” in Proceedings of the 2017 IEEE 25th International Requirements Engineering Conference, RE 2017, 2017.

[6]        R. Navarro-Almanza, R. Juárez-Ramírez, and G. Licea, “Towards Supporting Software Engineering Using Deep Learning: A Case of Software Requirements Classification,” in Proceedings of the 2017 5th International Conference in Software Engineering Research and Innovation, CONISOFT 2017, 2018.

[7]        Y. Yang and J. O. Pedersen, “A comparative study on feature selection in Text Categorization,” in Proceedings of the Fourteenth International Conference on Machine Learning (ICML ’97), 1997, pp. 412–420.

[8]        A. G. K. Janecek, W. N. Gansterer, M. A. Demel, and G. F. Ecker, “On the Relationship Between Feature Selection and Classification Accuracy,” in Proceedings of New Challenges for Feature Selection in Data Mining and Knowledge Discovery, 2008, vol. 4, pp. 90–105.

[9]        M. Alassaf and A. M. Qamar, “Improving sentiment analysis of Arabic tweets by one-way ANOVA,” J. King Saud Univ. – Comput. Inf. Sci., vol. 34, no. 6, 2022.

[10]      N. O. F. Elssied, O. Ibrahim, and A. H. Osman, “A Novel Feature Selection Based on One-Way ANOVA F-Test for E-Mail Spam Classification,” Res. J. Appl. Sci. Eng. Technol., vol. 7, no. 3, pp. 625–638, 2014.

[11]      M. Binkhonain and L. Zhao, “A Review of Machine Learning Algorithms for Identification and Classification of Non-functional Requirements,” Expert Systems with Applications: X, vol. 1. 2019.

[12]      E. D. Canedo and B. C. Mendes, “Software Requirements Classification Using Machine Learning Algorithms,” Entropy, vol. 22, no. 9, 2020.

[13]      Z. Kurtanović and W. Maalej, “Automatically Classifying Functional and Non-functional Requirements Using Supervised Machine Learning,” in Proceedings of the 2017 IEEE 25th International Requirements Engineering Conference, RE 2017, 2017.

[14]      R. Jindal, R. Malhotra, and A. Jain, “Automated Classification of Security Requirements,” in 2016 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2016, 2016.

[15]      G. Y. Quba, H. Al Qaisi, A. Althunibat, and S. Alzu’Bi, “Software Requirements Classification using Machine Learning Algorithms,” in 2021 International Conference on Information Technology, ICIT 2021 – Proceedings, 2021.

[16]      W. Zhang, Y. Yang, Q. Wang, and F. Shu, “An Empirical Study on Classification of Non-functional Requirements,” in SEKE 2011 – Proceedings of the 23rd International Conference on Software Engineering and Knowledge Engineering, 2011.

[17]      M. Riaz, J. King, J. Slankas, and L. Williams, “Hidden in Plain Sight: Automatically Identifying Security Requirements from Natural Language Artifacts,” in 2014 IEEE 22nd International Requirements Engineering Conference, RE 2014 – Proceedings, 2014.

[18]      M. Lu and P. Liang, “Automatic Classification of Non-functional Requirements from Augmented App User Reviews,” in ACM International Conference Proceeding Series, 2017, Part F128635.

[19]      L. Ståhle and S. Wold, “Analysis of Variance (ANOVA),” Chemom. Intell. Lab. Syst., vol. 6, no. 4, pp. 259–272, 1989.

[20]      R. L. Wasserstein and N. A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” Am. Stat., vol. 70, no. 2, pp. 129–133, Apr. 2016.

[21]      A. Grünauer and M. Vincze, “Using Dimension Reduction to Improve the Classification of High-dimensional Data,” arXiv Prepr. arXiv1505.06907, 2015.

[22]      M. O. Arowolo, S. O. Abdulsalam, Y. K. Saheed, and M. D. Salawu, “A Feature Selection Based on One-Way Anova for Microarray Data Classification,” Al-Hikmah J. Pure Appl. Sci., vol. 3, no. 2016, pp. 30–35, 2018.

[23]      D. D. Lewis, “An Evaluation of Phrasal and Clustered Representations Task on a Text Categorization,” in Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 96), 1992, pp. 298–309.

[24]      J. Wainer and G. Cawley, “Nested Cross-validation when Selecting Classifiers is Overzealous for Most Practical Applications,” arXiv Prepr. arXiv1809.09446, pp. 1–9, 2018.

[25]      A. Géron, Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. 2019.

