Text/Keyword Classification/Sort/Prediction, Train/Test, e.g. YouTube Spam
Integrated Circuits - An Online Book - http://www.globalsino.com/ICs/
=================================================================================
In the keyword analysis with a supervised machine learning approach by Kurian et al. [1], a total of 15,000 incidents were manually classified: descriptive labels, actual and potential risk scores, and consequence labels (environment, finance, health/safety, and reputation) were applied to each incident. Supervised machine learning with the Linear Support Vector Classifier (Linear SVC) was then used to predict labels for incidents, since this classifier provided the highest accuracy. The incident reports were divided into training and test data: the algorithm was trained on the training data and then predicted labels for the test data. The result was a machine learning algorithm that could apply labels to incidents with 75–90% accuracy (depending on the label); the outputs were used to develop risk matrices and to analyze trends in incidents.
Such machine learning can reduce human bias, and the method allowed for consistent reporting of incidents. However, some incident reports lacked the detail required for classification, and bias could not be removed completely because a supervised learning model relies on manually labeled training data. Additional keyword analysis can be applied to increase the accuracy of the machine learning classification. This research points to significant changes to the current system of incident reporting.
The objectives of the research by Kurian et al. [1], with the methodology shown in Figure 4511a, were to:
Figure 4511b shows the steps of the supervised machine learning methodology in detail. The reports covering the 15,000 incidents were used to train a machine learning algorithm that, in conjunction with keyword analysis, predicts class labels for new incident reports. In the multi-step process in 2.2, machine learning and keyword analysis were applied to the incident reports; a supervised machine learning algorithm (page4323) was used to classify the incident reports in this step.
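The train/test procedure described above can be sketched as follows. This is a minimal illustration, not the implementation of Kurian et al.: it assumes scikit-learn is installed, uses TF-IDF features (the source does not state which text features were used), and the twelve incident reports, the two consequence labels chosen, and the helper name `train_and_evaluate` are all hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical, hand-written incident reports (NOT data from the cited study).
reports = [
    ("Diesel spill from a haul truck reached the drainage ditch", "environment"),
    ("Tailings pipeline leak released process water to the ground", "environment"),
    ("Hydraulic oil spill on the soil near the shovel", "environment"),
    ("Dust emission exceeded the limit at the crusher", "environment"),
    ("Small glycol leak contaminated the soil beside the pump house", "environment"),
    ("Wildlife contact with process water at the tailings pond", "environment"),
    ("Worker slipped on ice and injured an ankle", "health/safety"),
    ("Operator cut a hand while changing a blade", "health/safety"),
    ("Employee struck by a falling tool, minor head injury", "health/safety"),
    ("Worker strained the back while lifting a pump", "health/safety"),
    ("Technician burned a finger on a hot steam line", "health/safety"),
    ("Driver injured a knee stepping down from the truck", "health/safety"),
]
texts = [text for text, _ in reports]
labels = [label for _, label in reports]


def train_and_evaluate(texts, labels, test_size=0.25, seed=0):
    """Split the reports, fit TF-IDF + Linear SVC on the training part,
    and predict labels for the held-out test part."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    model = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())
    model.fit(x_train, y_train)
    predicted = model.predict(x_test)
    rows = list(zip(x_test, y_test, predicted))
    return model, accuracy_score(y_test, predicted), rows


model, accuracy, rows = train_and_evaluate(texts, labels)
print(f"test accuracy: {accuracy:.2f}")
for text, actual, predicted in rows:
    print(f"{actual:>13} -> {predicted:<13} | {text}")
```

With so few reports the accuracy figure is not meaningful; the point is only the order of operations: label manually, split, fit on the training data, and score on the test data.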
After the accuracies of the machine learning classification were determined, Natural Language Processing (NLP) (page4323) was used to analyze keywords. Keyword analysis can be completed by lemmatizing all the words found in the incident database [1], e.g. "run", "running", and "ran" are all reverted to "run". A counter can then be used to identify and tally the lemmatized words, which were then arranged from most frequent to least frequent. Stop words, punctuation, names of individuals, etc. were removed, and the keywords that could be used to classify incidents were selected for inclusion in a customized library. A customized library can be created with two variables in the analysis:
The labels and keywords stored in the customized library were then matched to statements that could be used to analyze and evaluate events. Combining machine learning with a "manual" keyword approach is intended to increase accuracy and to ensure that the generated statements accurately describe any incident. To some extent, the keyword analysis was also used as a buffer to compensate for misclassification by the machine learning algorithm.
============================================
Prediction of YouTube spam: code:
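As a minimal sketch (not the page's original code), the keyword steps described above can be applied to YouTube comments: lemmatize, drop stop words and punctuation, tally with a counter, then match a customized library of labels and keywords. Everything named here is an assumption made for illustration: the `LEMMAS` table stands in for a real lemmatizer from an NLP library, and `STOP_WORDS`, `KEYWORD_LIBRARY`, the example comments, and the "spam"/"ham" labels are hypothetical.

```python
import re
from collections import Counter

# Toy lemma table standing in for a real lemmatizer (assumption).
LEMMAS = {
    "running": "run", "ran": "run",
    "subscribed": "subscribe", "subscribing": "subscribe",
    "videos": "video", "channels": "channel",
}
# Small illustrative stop-word list (assumption).
STOP_WORDS = {"a", "the", "my", "to", "and", "this", "is", "i",
              "for", "out", "me", "please", "you", "it"}


def lemmatize(text):
    """Lowercase, strip punctuation, drop stop words, map words to lemmas."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in STOP_WORDS]


def keyword_counts(comments):
    """Tally lemmatized words, most frequent first."""
    counter = Counter()
    for comment in comments:
        counter.update(lemmatize(comment))
    return counter.most_common()


# Hypothetical comments that were manually labeled as spam.
spam_comments = [
    "Please subscribe to my channel!",
    "Check out my channel for free videos",
    "Subscribing to me? Click this link",
    "I subscribed, check my channel too",
]
top_keywords = keyword_counts(spam_comments)
print(top_keywords[:3])

# Customized library: label -> keywords hand-picked from the tally above.
KEYWORD_LIBRARY = {"spam": {"subscribe", "channel", "check", "click", "free"}}


def predict_label(comment, library=KEYWORD_LIBRARY, default="ham"):
    """Return the label whose keywords match the comment most often."""
    words = set(lemmatize(comment))
    best_label, best_hits = default, 0
    for label, keywords in library.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label


print(predict_label("Subscribed! Check out my channel"))
print(predict_label("I love this song"))
```

A keyword matcher like this is deliberately simple; in the workflow described above it would complement a trained classifier such as Linear SVC rather than replace it.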
[1] Daniel Kurian, Fereshteh Sattari, Lianne Lefsrud, Yongsheng Ma, Using machine learning and keyword analysis to analyze incidents and reduce risk in oil sands operations, Safety Science, 130(2020), 104873.