How can I get all 3-grams from a line read from a text file in Python?

I read a line from a text file and generate its 3-grams, but the last gram printed for each line has only two characters. For example, for the input line cswisceduwwt the output is:

csw
swi
wis
isc
sce
ced
edu
dup
upa
par
ara
rad
ady
dyn
yn

At the end of each line it generates a 2-gram (two characters). The last gram is "yn", and I think a space is being added to it. I don't need the "yn". How can I remove the trailing two-character gram from each line? The code is given below:

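from nltk.util import ngrams  # assumed source of ngrams(); the import is not shown in the question

def extract_n_grams(line):
    # ngrams() yields one tuple of 3 consecutive characters per position in the line
    for item in ngrams(line, 3):
        result = item[0] + item[1] + item[2]
        print(result)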

with open('C:/Users/Dania/Desktop/MS 2nd sem/preprocessed.txt') as corpus:
    for line in corpus:
        extract_n_grams(line)

The last gram showed only two characters because its third character was the newline at the end of the line (not a space). I fixed it by stripping that newline before extracting the n-grams:

for line in corpus:
    rem_line = line.rstrip('\n')  # remove the trailing newline character
    extract_n_grams(rem_line)
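
For comparison, the same idea can be sketched without any library, using plain string slicing. This is only an illustrative sketch: the helper name char_ngrams is hypothetical and does not appear in the original code, and it assumes the grams are wanted as strings rather than tuples.

```python
def char_ngrams(line, n=3):
    """Return all character n-grams of `line`, ignoring the trailing line break."""
    # Lines read from a file keep their "\n" (or "\r\n"); strip it first.
    text = line.rstrip("\r\n")
    # The range stops at the last full window, so no short gram is produced.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# The trailing "\n" is what made the last gram look two characters long.
print(char_ngrams("dyn\n"))     # ['dyn']
print(char_ngrams("cswisc\n"))  # ['csw', 'swi', 'wis', 'isc']
```

A line shorter than n characters simply yields an empty list, so no extra length check is needed in the loop over the file.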
