Finding the frequency of tags in a text file using Python

Question

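I have a tags file containing the words whose frequencies I need to find in the mobydick file. Basically, I have to take each word from tags, search for it in mobydick, and print the word along with its frequency. I wrote the program below, but it raises an error: I can extract the words from tags, but I cannot check them against mobydick. I have attached the code and the error. Any help would be much appreciated. Thanks.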

import pandas as pd
import numpy as np
import nltk, re, pprint
import string

from collections import Counter
from nltk.tokenize import sent_tokenize,word_tokenize
from urllib import request

with open('tags.txt','r') as f:

    for line in f:
        for word in line.split():
            if word in open('MobyDick.txt').read():
                c=Counter(word)
            print(c)

The error is:

UnicodeDecodeError                        Traceback (most recent call last)
 in ()
      9     for line in f:
     10         for word in line.split():
---> 11             if word in open('MobyDick.txt').read():
     12                 c=Counter(word)
     13

C:\Users\Pratik\Anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7237: character maps to <undefined>

Answer 1

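It seems the open function cannot decode your file. Try specifying the encoding when opening the file; otherwise the file is opened with the system default encoding, which depends on the operating system (cp1252 here, according to the traceback). For example: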

if word in open('MobyDick.txt', encoding='utf8').read():
    ...
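
Beyond the decoding error, note that Counter(word) counts the characters of a single word, and `word in text` is a substring test, so the loop in the question would not produce word frequencies even once the file opens. Below is a minimal sketch of one way to get the frequencies. It assumes both files are UTF-8 and that a "word" is a run of letters or apostrophes matched case-insensitively; the function names count_tag_frequencies and count_tags_in_files are made up for illustration.

```python
import re
from collections import Counter


def count_tag_frequencies(tags_text, corpus_text):
    """Return {tag: number of whole-word, case-insensitive matches in corpus_text}."""
    # Tokenize the corpus once into lowercase words (letters and apostrophes only).
    word_counts = Counter(re.findall(r"[a-z']+", corpus_text.lower()))
    # Look up each tag; a Counter returns 0 for words that never occur.
    return {tag: word_counts[tag.lower()] for tag in tags_text.split()}


def count_tags_in_files(tags_path, corpus_path, encoding="utf-8"):
    """Read both files with an explicit encoding (UTF-8 is an assumption) and count the tags."""
    with open(tags_path, encoding=encoding) as f:
        tags_text = f.read()
    with open(corpus_path, encoding=encoding) as f:
        corpus_text = f.read()
    return count_tag_frequencies(tags_text, corpus_text)


# Example usage with the file names from the question:
# for tag, freq in count_tags_in_files("tags.txt", "MobyDick.txt").items():
#     print(tag, freq)
```

Reading MobyDick.txt once and building a single Counter avoids reopening the file for every tag, which the original loop did.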

0 2017-10-20 14:36:49