Python - Only reading last line of file in specific circumstance
I'm trying to process some tweets using Python, and I'm trying to do a word count of the most popular words contained in 7 different tweets. I have my file set up so that each tweet is a JSON object on its own line, and when I try to print out each tweet using the following, it works perfectly:
with open(fname, 'r') as f:
    for line in f:
        tweet = json.loads(line)  # load it as Python dict
        print(json.dumps(tweet, indent=4))
However, when I try to do something similar for my word count, it either reads the last line of the file 7 times, or reads just the last line of the file once. I am using the following code, which removes stopwords from the results:
with open(fname, 'r', encoding='utf8') as f:
    count_all = Counter()
    # Create a list with all the terms
    terms_stop = [term for term in tokens if term not in stop]
    for line in f:
        # Update the counter
        count_all.update(terms_stop)

# Print the first 5 most frequent words
print(count_all.most_common(5))
The above produces 5 random words from the last tweet, and the count of each one is 7 - meaning that it essentially read the last tweet 7 times instead of reading each of the 7 tweets once.
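As a minimal sketch of why every count lands at 7 (the token list here is made up): the same pre-built list is fed to the Counter once per line of the file, so each word's count is multiplied by the number of lines.

from collections import Counter

count_all = Counter()
terms_stop = ['hello', 'world']   # built once, never changes
for _ in range(7):                # plays the role of `for line in f:`
    count_all.update(terms_stop)
print(count_all.most_common(5))   # [('hello', 7), ('world', 7)]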
The following code is meant to see which words are most commonly grouped together. It produces 5 randomly grouped words from the last tweet, with the count at just 1, which suggests that it only read the last tweet (once) and none of the other tweets.
with open(fname, 'r', encoding='utf8') as f:
    count_all = Counter()
    # Create a list with all the terms
    terms_stop = [term for term in tokens if term not in stop]
    # Import Bigrams to group words together
    terms_bigram = bigrams(terms_stop)
    for line in f:
        # Update the counter
        count_all.update(terms_bigram)

# Print the first 5 most frequent words
print(count_all.most_common(5))
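A likely reason the bigram counts stay at 1 is that bigrams() (for example nltk's) returns a generator, which is exhausted by the first count_all.update() call; the remaining six passes through the loop add nothing. A minimal sketch of that behaviour, using a hand-written generator in place of bigrams():

from collections import Counter

count_all = Counter()
pairs = (p for p in [('good', 'morning'), ('good', 'night')])
for _ in range(7):               # plays the role of `for line in f:`
    count_all.update(pairs)      # only the first pass consumes items
print(count_all.most_common(5))  # each bigram counted exactly once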
The format of my json file is as follows:
{"created_at":"Tue Oct 25 11:24:54 +0000 2016","id":4444444444,.....}
{"created_at":..... }
{etc}
Help would be most appreciated! Thanks very much in advance.
UPDATE: Don't know how I missed it, but thanks for the help everyone! I forgot to include 'line' in my for loop. Here is the working code:
with open(fname, 'r', encoding='utf8') as f:
    count_all = Counter()
    for line in f:
        tweet = json.loads(line)
        tokens = preprocess(tweet['text'])
        # Create a list with all the terms
        terms_stop = [term for term in tokens if term not in stop]
        # Update the counter
        count_all.update(terms_stop)

# Print the first 5 most frequent words
print(count_all.most_common(5))
I just had to combine the tokenizer with the word count.
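For anyone copying this: preprocess() and stop are defined elsewhere in my script and are not shown above. A minimal stand-in pair, assuming a plain whitespace tokenizer and a small stopword set (the real versions may differ):

import string

# Assumed stand-ins; the actual tokenizer and stopword list are not shown in the post.
stop = {'the', 'a', 'an', 'and', 'to', 'of', 'rt'} | set(string.punctuation)

def preprocess(text):
    # Naive tokenizer: lowercase the tweet and split on whitespace.
    return text.lower().split()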
Perhaps I am missing something but you never use line in the for-loop:
for line in f:
    # Update the counter
    count_all.update(terms_bigram)
so you are just looping over the lines doing the same thing for each line. Try this to read the file:
with open(fname) as d:
    tweet = json.load(d)
If this doesn't work, post more info about the file data format.
New update:
with open(fname) as d:
    data = d.readlines()

tweet = [json.loads(x) for x in data]
This will give you a list of dictionaries (json format).
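For example, assuming the 'created_at' field from the sample data in the question:

# Each element of tweet is one parsed dict; iterate to use them.
for t in tweet:
    print(t['created_at'])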