Finding the frequency of tags in a text file using Python

Question

I have a tags file containing words whose frequency I need to find in the Moby Dick text file. Basically, I have to take each word from the tags file, search for it in MobyDick.txt, and print the word along with its frequency. I wrote the program below, but I am getting an error: I can extract the words from the tags file, but I cannot check them against MobyDick.txt. I have attached the code and the error. It would be a great help if someone could assist. Thank you.

import pandas as pd
import numpy as np
import nltk, re, pprint
import string

from collections import Counter
from nltk.tokenize import sent_tokenize,word_tokenize
from urllib import request

with open('tags.txt','r') as f:

    for line in f:
        for word in line.split():
            if word in open('MobyDick.txt').read():
                c=Counter(word)
            print(c)

The error is:

UnicodeDecodeError                        Traceback (most recent call last)
in ()
      9     for line in f:
     10         for word in line.split():
---> 11             if word in open('MobyDick.txt').read():
     12                 c=Counter(word)
     13

C:\Users\Pratik\Anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7237: character maps to <undefined>

Answer 1

It seems the open function failed to decode your file. Specify the encoding when you open the file; otherwise it is opened with your system's default encoding, which depends on the OS (cp1252 here, according to the traceback). For example:

if word in open('MobyDick.txt', encoding='utf8').read():
    ...
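
Once the file decodes, note that the loop in the question still would not give word frequencies: `word in text` is a substring test, and `Counter(word)` counts the characters of the tag itself. Below is a minimal sketch of one way to get the counts, assuming both files are UTF-8 and that a simple regex tokenizer is good enough. The helper names `read_text` and `tag_frequencies` are made up for this illustration and are not part of the original post.

```python
import re
from collections import Counter


def read_text(path, encoding="utf-8"):
    # Pass the encoding explicitly so Python does not fall back to the
    # platform default (cp1252 in the traceback above).
    # Assumption: the file really is UTF-8.
    with open(path, encoding=encoding) as f:
        return f.read()


def tag_frequencies(tags_text, book_text):
    # Count every word of the book once, case-insensitively.
    # The tokenizer is deliberately simple: runs of letters and apostrophes.
    words = Counter(re.findall(r"[a-z']+", book_text.lower()))
    # A Counter returns 0 for missing keys, so tags that never occur
    # need no special handling.
    return {tag: words[tag.lower()] for tag in tags_text.split()}
```

With the file names from the question, the call would be `tag_frequencies(read_text('tags.txt'), read_text('MobyDick.txt'))`. Reading and counting the book once, rather than reopening it for every tag, also avoids rereading the whole file inside the inner loop.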

solution1
0 2017-10-20 14:36:49