How do I delete rows from a CSV file by comparing it against a list in a txt file, using Python?

Question

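I have a list of 12,000 dictionary entries (the words only, without their definitions) stored in a .txt file.

I have a complete dictionary with 62,000 entries (the words with their definitions) stored in a .csv file.

I need to compare the small list in the .txt file with the larger list in the .csv file and delete the rows whose entries do not appear in the smaller list. In other words, I want to purge this dictionary down to only 12,000 entries.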

The .txt file is laid out on separate lines, like this:

WORD1

WORD2

WORD3

The .csv file is laid out like this:

ID (column 1) WORD (column 2) MEANING (column 3)

How do I accomplish this using Python?

Answer 1

The following will not scale well, but it should work for the number of records indicated.

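import csv

# path_to_file: input CSV, path_to_file2: output CSV, path_to_file3: word list
with open(path_to_file3) as words_file:
    # strip the trailing newline from each word; a set gives fast lookups
    lookup = set(word.strip() for word in words_file)

with open(path_to_file, newline='') as f_in, open(path_to_file2, 'w', newline='') as f_out:
    csv_out = csv.writer(f_out)
    for line in csv.reader(f_in):
        # WORD is the second column (index 1); skip rows too short to have one
        if len(line) > 1 and line[1] in lookup:
            csv_out.writerow(line)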

Answer 2

Good answers so far. If you want to get minimalistic...

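import csv

with open(path_to_file3) as words_file:
    lookup = set(l.strip().lower() for l in words_file)
with open(path_to_file, newline='') as f_in, open(path_to_file2, 'w', newline='') as f_out:
    # writerows consumes the generator; a bare map() is lazy in Python 3 and writes nothing
    csv.writer(f_out).writerows(
        row for row in csv.reader(f_in) if row[1].lower() in lookup)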
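
To try the filtering logic without touching any real files, here is a minimal sketch that runs the same case-insensitive comparison on in-memory text. The function name `filter_rows` and the sample rows are made up for illustration; they are not from the question's data.

```python
import csv
import io

def filter_rows(csv_text, words_text):
    """Return CSV text holding only rows whose second column is in words_text."""
    # Lower-case and strip every word so the comparison ignores case and whitespace
    lookup = {w.strip().lower() for w in words_text.splitlines() if w.strip()}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        # Column 2 (index 1) holds the word; skip rows too short to have one
        if len(row) > 1 and row[1].strip().lower() in lookup:
            writer.writerow(row)
    return out.getvalue()

# Hypothetical sample data, not taken from the real dictionary
sample_csv = "1,apple,a fruit\n2,Banana,another fruit\n3,cherry,a small fruit\n"
sample_words = "apple\nbanana\n"
print(filter_rows(sample_csv, sample_words))
```

With this sample, the rows for "apple" and "Banana" are kept and the "cherry" row is dropped.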

Answer 3

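One of the lesser-known facts about current computers is that when you delete a line from a text file and save the file, most of the time the editor does this:

load the file into memory
write a temporary file with the lines you want
close the files and move the temp file over the original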
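
The steps above can be sketched as a small general-purpose helper. This is only an illustration under stated assumptions: the name `rewrite_lines` and its `keep` callback are hypothetical, and it streams the file line by line instead of loading it all into memory first.

```python
import os
import tempfile

def rewrite_lines(path, keep):
    """Rewrite the file at path, keeping only lines for which keep(line) is true."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final move stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp, open(path) as src:
            for line in src:
                if keep(line):
                    tmp.write(line)
        # Both files are closed here; move the temp file over the original
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temp file behind if anything went wrong
        os.remove(tmp_path)
        raise
```

For example, `rewrite_lines('notes.txt', lambda line: line.strip() != '')` would drop the blank lines from a hypothetical `notes.txt`.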

So you have to load your word list:

with open('wordlist.txt') as i:
    wordlist = set(word.strip() for word in i)  #  you said the file was small

Then open the input file:

import csv
import os

with open('input.csv', newline='') as i:
    with open('output.csv', 'w', newline='') as o:
        output = csv.writer(o)
        for line in csv.reader(i):  # iterate over the CSV line by line
            if line[1] in wordlist:  # keep the row if the word (column 2) is in the list
                output.writerow(line)

os.replace('output.csv', 'input.csv')  # move the temp file over the original

This is untested; now go do your homework and comment here if you find any bugs... :-)

Answer 4

I would use pandas for this. The data set isn't large, so you can do it in memory without a problem.

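import pandas as pd

# words.txt has no header row, so name its single column explicitly
words = pd.read_csv('words.txt', header=None, names=['WORD'])
# assumes defs.csv has a header row naming the ID, WORD and MEANING columns
defs = pd.read_csv('defs.csv')
words.set_index('WORD', inplace=True)
defs.set_index('WORD', inplace=True)
# an inner join keeps only the words that appear in both files
new_defs = words.join(defs, how='inner')
new_defs.to_csv('new_defs.csv')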

You might need to manipulate new_defs to get it to look the way you want, but that's the gist of it.

4 solutions

Solution 1
1 2015-01-09 19:22:01

Solution 2
1 accepted 2015-01-09 19:37:47

Solution 3
0 2015-01-09 19:31:36

Solution 4
0 2015-01-09 20:32:36
