Python: only the last line of the file is read in certain cases

Question

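I'm trying to process some tweets with Python and count the most popular words across 7 different tweets. My file is set up so that each tweet is a JSON object on its own line, and printing each tweet with the following code works fine: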

import json

with open(fname, 'r') as f:
    for line in f:
        tweet = json.loads(line)  # load it as a Python dict
        print(json.dumps(tweet, indent=4))

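However, when I try to do something similar for the word count, it either reads the last line of the file 7 times or reads only the last line once. I'm using the following code, which removes stop words from the result: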

with open(fname, 'r', encoding='utf8') as f:
    count_all = Counter()
    # Create a list with all the terms
    terms_stop = [term for term in tokens if term not in stop]
    for line in f:
        # Update the counter
        count_all.update(terms_stop)
    # Print the first 5 most frequent words
    print(count_all.most_common(5))

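The code above yields 5 random words from the last tweet, each with a count of 7, which means it effectively read the last tweet 7 times instead of reading each of the 7 tweets once.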

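The following code is meant to find which words most commonly occur together. It yields 5 random word pairs from the last tweet, each with a count of only 1, which suggests it read only the last tweet (once) and none of the others.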

with open(fname, 'r', encoding='utf8') as f:
    count_all = Counter()
    # Create a list with all the terms
    terms_stop = [term for term in tokens if term not in stop]
    # Import Bigrams to group words together
    terms_bigram = bigrams(terms_stop)
    for line in f:
        # Update the counter
        count_all.update(terms_bigram)
    # Print the first 5 most frequent words
    print(count_all.most_common(5))

My JSON file is formatted as follows:

{"created_at":"Tue Oct 25 11:24:54 +0000 2016","id":4444444444,.....}
{"created_at":..... }
{etc}

Any help would be greatly appreciated! Thanks in advance.

Update: I don't know how I missed it, but thanks for the help, everyone! I forgot to use `line` inside my for loop. Here is the working code:

with open(fname, 'r', encoding='utf8') as f:
    count_all = Counter()
    for line in f:
        tweet = json.loads(line)
        tokens = preprocess(tweet['text'])
        # Create a list with all the terms
        terms_stop = [term for term in tokens if term not in stop]
        # Update the counter
        count_all.update(terms_stop)
    # Print the first 5 most frequent words
    print(count_all.most_common(5))

I just needed to combine the tokenizer with the word count.
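
The same fix should carry over to the bigram count from the question: the pairs need to be built from each line's terms inside the loop. This is only a minimal sketch under several assumptions. `preprocess` is a simplified, hypothetical tokenizer, `stop` is a tiny made-up stop-word set, the sample data is held in memory instead of a file, and `zip` stands in for `bigrams`.

```python
import io
import json
from collections import Counter

stop = {"the", "a", "is"}  # hypothetical stop-word set


def preprocess(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


# Made-up sample data, one JSON object per line, standing in for the file
sample = io.StringIO(
    '{"text": "Python is fun"}\n'
    '{"text": "Python is fun today"}\n'
)

count_all = Counter()
for line in sample:
    tweet = json.loads(line)
    tokens = preprocess(tweet["text"])
    terms_stop = [term for term in tokens if term not in stop]
    # Build the adjacent-word pairs from this line's terms, then count them
    count_all.update(zip(terms_stop, terms_stop[1:]))

print(count_all.most_common(2))
```

With the real tokenizer and `bigrams`, only the last line of the loop body would change to `count_all.update(bigrams(terms_stop))`.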

Answer 1

Maybe I'm missing something, but you never use `line` inside the for loop:

for line in f:
    # Update the counter
    count_all.update(terms_bigram)

So you are just iterating over all the lines and doing the same thing for each one.
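
The two different symptoms (counts of 7 versus counts of 1) can be reproduced without any file at all. The sketch below assumes 7 lines and made-up tokens. `pair_up` is a hypothetical stand-in for `bigrams`, which appears to return a one-shot iterator, judging by the count of 1 reported in the question.

```python
from collections import Counter


def pair_up(words):
    """Hypothetical stand-in for bigrams: lazily yields adjacent word pairs."""
    return zip(words, words[1:])


last_tweet_terms = ["python", "counts", "words"]  # made-up tokens
lines = range(7)  # stands in for the 7 lines of the file

# Case 1: the same list is added once per line, so every count ends up at 7.
count_list = Counter()
for line in lines:
    count_list.update(last_tweet_terms)

# Case 2: the iterator is consumed by the first update and is empty on every
# later pass, so every count stays at 1.
count_pairs = Counter()
pairs = pair_up(last_tweet_terms)
for line in lines:
    count_pairs.update(pairs)

print(count_list.most_common(3))
print(count_pairs.most_common(2))
```

In both cases the loop variable `line` is never read, so the contents of the file have no effect on the result.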

Answer 2

Try this to read the file:

with open(fname) as d:
    tweet = json.load(d)

If that doesn't work, please post more information about the file's data format.

New update:

with open(fname) as d:
    data = d.readlines()

tweet = [json.loads(x) for x in data]

This will give you a list of dictionaries (parsed from the JSON).
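
Because the file holds one JSON object per line rather than a single JSON document, parsing the whole content in one call is expected to fail with `json.JSONDecodeError`, while parsing line by line works. This is a minimal sketch using made-up in-memory data in place of the file. It also skips blank lines, which is an addition not present in the original answer.

```python
import io
import json

# Made-up data in the one-object-per-line layout described in the question
raw = '{"id": 1, "text": "first"}\n\n{"id": 2, "text": "second"}\n'

# Parsing the whole text as a single JSON document fails ("Extra data")
try:
    json.loads(raw)
    whole_file_ok = True
except json.JSONDecodeError:
    whole_file_ok = False

# Parsing each non-blank line separately gives a list of dicts
tweets = [json.loads(line) for line in io.StringIO(raw) if line.strip()]

print(whole_file_ok)
print([t["id"] for t in tweets])
```

Iterating over the file object directly, as the question's first snippet does, avoids holding all raw lines in memory the way `readlines()` does.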


2 solutions

Solution 1
1, accepted, 2016-10-28 10:48:51

Solution 2
0, 2016-10-28 10:36:55
