Libraries Used to Convert Incident Documents into Numerical Vectors
- Python for Integrated Circuits -
- An Online Book -
http://www.globalsino.com/ICs/



Table 4316. Libraries used to convert incident documents into numerical vectors (Natural Language Processing (NLP)).
| | spaCy | Natural Language Toolkit (NLTK) |
|---|---|---|
| Toolkit | NLP | NLP |
| Use | Simpler to use | |
| Differences | Adopts an object-oriented approach | Works as a string processing library |
| | More efficient when working with words, e.g. identifying keywords | Performs better than spaCy when analyzing sentences [1] |

Many features are available for pre-processing text data, including tokenization and lemmatization, which were used to transform the words in the incident database to their canonical form. For example, the words "run", "running", and "ran" would all be reduced to "run" [2].

Typical application: keyword analysis in the research reported in [2].
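
The tokenization and lemmatization step, followed by conversion into numerical vectors, can be illustrated with a minimal pure-Python sketch. It is not the method of [2] and does not call spaCy or NLTK: the lemma table `LEMMAS`, the function names, and the two incident texts are all hypothetical, chosen only to show the idea of mapping "run", "running", and "ran" to one canonical form before counting words.

```python
import re
from collections import Counter

# Hypothetical hand-written lemma table, for illustration only.
# Libraries such as spaCy or NLTK derive lemmas from linguistic resources.
LEMMAS = {
    "running": "run", "ran": "run", "runs": "run",
    "leaking": "leak", "leaked": "leak", "leaks": "leak",
    "pumps": "pump",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lemmatize(tokens):
    """Map each token to its canonical form; unknown tokens are kept as-is."""
    return [LEMMAS.get(token, token) for token in tokens]

def vectorize(documents):
    """Return (vocabulary, vectors): one word-count vector per document."""
    processed = [lemmatize(tokenize(doc)) for doc in documents]
    vocabulary = sorted({token for doc in processed for token in doc})
    vectors = []
    for doc in processed:
        counts = Counter(doc)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

# Made-up incident descriptions (not taken from any real database).
incidents = ["Pump ran dry while running", "Valve leaked; pump leaking"]
vocabulary, vectors = vectorize(incidents)
print(vocabulary)  # ['dry', 'leak', 'pump', 'run', 'valve', 'while']
print(vectors)     # [[1, 0, 1, 2, 0, 1], [0, 2, 1, 0, 1, 0]]
```

Because "ran" and "running" are both mapped to "run", the first document has a count of 2 in the "run" column rather than two separate columns with a count of 1 each. In practice the lookup table would be replaced by a library lemmatizer.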

[1] Malhotra, A., 2018. Introduction to Libraries of NLP in Python - NLTK vs. spaCy. Retrieved from https://medium.com/@akankshamalhotra24/introduction-to-libraries-of-nlp-in-python-nltk-vs-spacy-42d7b2f128f2.
[2] Kurian, D., Sattari, F., Lefsrud, L., Ma, Y., 2020. Using machine learning and keyword analysis to analyze incidents and reduce risk in oil sands operations. Safety Science, 130, 104873.