Rake_NLTK - Python for Integrated Circuits - An Online Book
Python for Integrated Circuits http://www.globalsino.com/ICs/
=================================================================================
Rapid Automatic Keyword Extraction (RAKE) is a well-known, domain-independent keyword extraction algorithm in Natural Language Processing (page4315) that finds the most relevant words or phrases in a piece of text using a set of stop words and phrase delimiters. RAKE is an individual-document-oriented, dynamic information retrieval method. Rake NLTK is an extended Python implementation of RAKE that is built on NLTK.

The steps for RAKE are:

One of the critical points made by the creators of RAKE is that keywords frequently contain multiple words but rarely contain punctuation (e.g. period, comma, apostrophe, quotation mark, question mark, exclamation mark, brackets, braces, parentheses, dash, hyphen, ellipsis, colon, semicolon), stop words (e.g. the, is, and, not, that, there, are, many, can, you, with, one, of, those), or other words with minimal lexical meaning.
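The observation above, that keywords rarely contain punctuation or stop words, can be sketched in plain Python as a phrase splitter. This is only a minimal illustration: the function name `candidate_phrases` is hypothetical, the stop word set is just the short example list from the text (a real run would use a much larger list), and every non-word character is assumed to act as a phrase delimiter.

```python
import re

# Illustrative stop word list taken from the examples in the text;
# this is an assumption, not the list used by any particular library.
STOP_WORDS = {"the", "is", "and", "not", "that", "there", "are", "many",
              "can", "you", "with", "one", "of", "those"}

def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stop words."""
    phrases = []
    # Assumption: any run of non-word, non-space characters is a delimiter.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOP_WORDS:
                # A stop word closes the phrase collected so far.
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases

sample = "There are many tools, and keyword extraction is one of those."
print(candidate_phrases(sample))
```

Each phrase is returned as a list of words, so the words that remain after stop words and punctuation are removed stay grouped with their neighbours.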
Therefore, the "content words" are obtained by the equation below:

The concept of RAKE is built on three metrics: Word Frequency, Degree of Word, and Degree Score.

============================================
Rapid Automatic Keyword Extraction (RAKE).

In the program, Word Frequency is the number of occurrences of a word, Degree of Word is the sum of the co-occurrences of a word with other words in the same phrases, and Degree Score is equal to (Degree of Word)/(Word Frequency).

Code:
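A minimal sketch of the three quantities in plain Python, not the original program from this page. The function name `rake_scores` and the sample phrases are hypothetical. Two assumptions are made that the text does not state: a word counts toward its own degree, and a phrase is scored by summing the Degree Scores of its words.

```python
from collections import defaultdict

def rake_scores(phrases):
    """Return (word_scores, phrase_scores) for phrases given as lists of words."""
    frequency = defaultdict(int)  # Word Frequency
    degree = defaultdict(int)     # Degree of Word
    for phrase in phrases:
        for word in phrase:
            frequency[word] += 1
            # Assumption: the word itself is counted in its degree,
            # so each occurrence adds the full phrase length.
            degree[word] += len(phrase)
    # Degree Score = (Degree of Word) / (Word Frequency)
    word_scores = {w: degree[w] / frequency[w] for w in frequency}
    # Assumption: a phrase score is the sum of its word scores.
    phrase_scores = {" ".join(p): sum(word_scores[w] for w in p)
                     for p in phrases}
    return word_scores, phrase_scores

# Hypothetical candidate phrases, already split at stop words and punctuation.
phrases = [["keyword", "extraction"], ["keyword"], ["integrated", "circuits"]]
word_scores, phrase_scores = rake_scores(phrases)
for phrase, score in sorted(phrase_scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.1f}  {phrase}")
```

In this sample, "keyword" appears in two phrases with a total degree of 3, so its Degree Score is 3/2 = 1.5, while words that appear only once in a two-word phrase score 2.0. The rake-nltk package offers comparable ranking through its `Rake` class (`extract_keywords_from_text`, `get_ranked_phrases`), which needs the NLTK stop word data to be installed.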
=================================================================================