
Finding the frequency of tags in a text file using Python

I have a tag file containing words whose frequency I need to find in the MobyDick file. Basically, I have to take each word from the tags, search for it in MobyDick, and print the word and its frequency. I wrote the program below, but I am getting an error: I can extract the words from the tags, but I cannot check for them in MobyDick. I have attached the code and the error. It would be a great help if someone could assist. Thank you.

import pandas as pd
import numpy as np
import nltk, re, pprint
import string

from collections import Counter
from nltk.tokenize import sent_tokenize,word_tokenize
from urllib import request

with open('tags.txt','r') as f:

    for line in f:
        for word in line.split():
            if word in open('MobyDick.txt').read():
                c=Counter(word)
            print(c)

The error is:

UnicodeDecodeError                        Traceback (most recent call last)
in ()
      9     for line in f:
     10         for word in line.split():
---> 11             if word in open('MobyDick.txt').read():
     12                 c=Counter(word)
     13

C:\Users\Pratik\Anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7237: character maps to <undefined>

It seems the open function failed to decode your file. Specify the encoding when you open the file; otherwise it is opened with your system's default encoding, which is OS-dependent (cp1252 in your traceback). For example, if the file is UTF-8:

if word in open('MobyDick.txt', encoding='utf8').read():
    ...
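Beyond the encoding fix, two details in the question's loop would still keep it from printing word frequencies: `Counter(word)` counts the characters of the tag string rather than its occurrences in the book, and `word in text` is a substring test. Here is a minimal sketch of one way to get per-tag counts, assuming both files are UTF-8 and that a simple lowercase regex tokenizer is acceptable. The helper names `read_text` and `tag_frequencies` are made up for illustration.

```python
import re
from collections import Counter


def read_text(path, encoding="utf-8"):
    # Pass the encoding explicitly so the platform default
    # (cp1252 in the traceback) is not used. UTF-8 is an assumption.
    with open(path, encoding=encoding) as f:
        return f.read()


def tag_frequencies(tags_text, corpus_text):
    # Tokenize the corpus once and count every lowercase word.
    # The regex is a simplification, not a full tokenizer.
    corpus_counts = Counter(re.findall(r"[a-z']+", corpus_text.lower()))
    # Look up each tag; a Counter returns 0 for words it never saw.
    return {tag: corpus_counts[tag.lower()] for tag in tags_text.split()}


# Usage with the file names from the question:
# freqs = tag_frequencies(read_text("tags.txt"), read_text("MobyDick.txt"))
# for tag, n in freqs.items():
#     print(tag, n)
```

Reading the book once and counting all words up front also avoids re-opening MobyDick.txt for every tag, which the original loop does.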
