
How to do it...
- Create a new Python file and import the three stemmers:
from nltk.stem.porter import PorterStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem.snowball import SnowballStemmer
- Define some words to stem, as follows:
words = ['ability', 'baby', 'college', 'playing', 'is', 'dream', 'election', 'beaches', 'image', 'group', 'happy']
- List the names of the stemmers to compare; they serve as column headers:
stemmers = ['PORTER', 'LANCASTER', 'SNOWBALL']
- Create an instance of each stemmer:
stem_porter = PorterStemmer()
stem_lancaster = LancasterStemmer()
stem_snowball = SnowballStemmer('english')
- Build a row template for the results table and print the header:
formatted_row = '{:>16}' * (len(stemmers) + 1)
print('\n', formatted_row.format('WORD', *stemmers), '\n')
- Iterate over the list of words and stem each one with the chosen stemmers:
for word in words:
    stem_words = [stem_porter.stem(word), stem_lancaster.stem(word), stem_snowball.stem(word)]
    print(formatted_row.format(word, *stem_words))
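The row template in the table-formatting step can be tried on its own, without NLTK. The following is a minimal sketch of how `'{:>16}'` repeated once per column yields right-aligned, 16-character fields. The values in `sample` are hypothetical placeholders chosen only to show the alignment; they are not real stemmer output.

```python
# Column headers taken from the recipe.
stemmers = ['PORTER', 'LANCASTER', 'SNOWBALL']

# One right-aligned, 16-character field per column:
# the word itself plus one field per stemmer.
formatted_row = '{:>16}' * (len(stemmers) + 1)

header = formatted_row.format('WORD', *stemmers)

# Hypothetical placeholder values (assumption, not real stems),
# used only to demonstrate the column alignment.
sample = formatted_row.format('word', 'stem_a', 'stem_b', 'stem_c')

print(header)
print(sample)
```

Because every field has the same width, each row comes out 64 characters long and the columns line up without any extra table library.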
The result obtained from the stemming process is shown in the following screenshot:
