Search

Natural Language Processing

to be continued...

# regex for removing punctuation!
import re
# nltk preprocessing magic
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer
# grabbing a part of speech function:
from part_of_speech import get_part_of_speech

text = "So many squids are jumping out of suitcases these days that you can barely go anywhere without seeing one burst forth from a tightly packed valise. I went to the dentist the other day, and sure enough I saw an angry one jump out of my dentist's bag within minutes of arriving. She hardly even noticed."

cleaned = re.sub('\W+', ' ', text)
tokenized = word_tokenize(cleaned)

stemmer = PorterStemmer()
stemmed = [stemmer.stem(token) for token in tokenized]

## -- CHANGE these -- ##
lemmatizer = None
lemmatized = []

print("Stemmed text:")
print(stemmed)
print("\nLemmatized text:")
print(lemmatized)

Recent Posts

See All

Dijkstra shortest path algorithm

Word ladder game (change only one letter to go from Fool to Sage): Fool, Pool, Poll, Pole, Pale, Sale, Sage. How? Dijkstra shortest path algorithm

Deep Learning for Algorithmic Trading

Finance is highly nonlinear and sometimes stock price data can even seem completely random. Machine learning and Deep Learning have found their place in the financial institutions for their power in p

Statistical Arbitrage Trading Pairs

What are z score values? A Z score is the value of a supposedly normal random variable when we subtract the mean and divide by the standard deviation, thus scaling it to the standard normal distributi

©2020 by Arturo Devesa.