How to remove rows from a csv file when compared to a list in a txt file using Python?

I have a list of 12,000 dictionary entries (the words only, without their definitions) stored in a .txt file.

I have a complete dictionary with 62,000 entries (the words with their definitions) stored in a .csv file.

I need to compare the small list in the .txt file with the larger list in the .csv file and delete the rows containing entries that don't appear in the smaller list. In other words, I want to trim this dictionary down to only 12,000 entries.

The .txt file has one word per line, like this:

word1
word2
word3

The .csv file is laid out like this:

ID (column 1), WORD (column 2), MEANING (column 3)

How do I accomplish this using Python?

The following will not scale well, but should work for the number of records indicated.

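import csv

# path_to_file: input CSV, path_to_file2: output CSV, path_to_file3: word list
with open(path_to_file3) as f:
    # strip the trailing newline from each word so lookups match
    lookup = set(word.strip() for word in f)

with open(path_to_file, newline='') as src, open(path_to_file2, 'w', newline='') as dst:
    csv_out = csv.writer(dst)
    for line in csv.reader(src):
        # the word is in column 2 (index 1); column 1 is the ID
        if line[1] in lookup:
            csv_out.writerow(line)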
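For trying the idea without touching real files, here is a minimal sketch of the same filter written as a function over file-like objects. The name `filter_csv_by_words` and the sample data are made up for illustration, and it assumes the word is in the second column and the CSV has no header row. It also returns the listed words that never showed up in the CSV, which helps explain an output with fewer rows than expected.

```python
import csv
import io


def filter_csv_by_words(csv_file, words_file, out_file, word_column=1):
    """Copy rows from csv_file to out_file when their word is in words_file.

    Returns the set of listed words that never appeared in the CSV.
    """
    wanted = {line.strip() for line in words_file if line.strip()}
    seen = set()
    writer = csv.writer(out_file, lineterminator="\n")
    for row in csv.reader(csv_file):
        # Skip short or empty rows instead of raising IndexError.
        if len(row) > word_column and row[word_column] in wanted:
            writer.writerow(row)
            seen.add(row[word_column])
    return wanted - seen


# Hypothetical sample data standing in for the real files.
sample_csv = io.StringIO("1,apple,a fruit\n2,bolt,a fastener\n3,cedar,a tree\n")
sample_words = io.StringIO("apple\ncedar\nzebra\n")
result = io.StringIO()

missing = filter_csv_by_words(sample_csv, sample_words, result)
print(result.getvalue())
print("not found:", missing)
```

With real files, the three arguments would be objects returned by `open(...)`, with `newline=''` on the CSV ones.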

Good answers so far. If you want to get minimalistic...

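import csv

lookup = set(l.strip().lower() for l in open(path_to_file3))
# writerows consumes the generator right away; a bare map() is lazy in
# Python 3 and would write nothing
csv.writer(open(path_to_file2, 'w', newline='')).writerows(
    row for row in csv.reader(open(path_to_file, newline=''))
    if row[1].lower() in lookup)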
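The `.lower()` calls above make the comparison case-insensitive. Here is a small sketch of that normalisation step on its own, assuming whitespace and letter case are the only differences between the two files. `normalise` is a hypothetical helper name and the data is made up. It uses `str.casefold()`, a more aggressive standard-library variant of `lower()`.

```python
def normalise(word):
    """Strip surrounding whitespace and fold case for comparison."""
    return word.strip().casefold()


# Made-up lines from the .txt file and rows from the .csv file.
wordlist_lines = ["Apple\n", "  cedar \n"]
rows = [
    ["1", "apple", "a fruit"],
    ["2", "Bolt", "a fastener"],
    ["3", "CEDAR", "a tree"],
]

lookup = {normalise(w) for w in wordlist_lines}
# Rows are kept unchanged; only the comparison key is normalised.
kept = [row for row in rows if normalise(row[1]) in lookup]
print(kept)
```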

One of the least known facts about current computers is that when you delete a line from a text file and save the file, most of the time the editor does this:

  1. load the file into memory
  2. write a temporary file with the rows you want
  3. close the files and move the temp file over the original
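The three steps above can be sketched as a small helper, though this version streams the rows rather than loading the whole file into memory. It is only an illustration: `rewrite_csv_in_place` and the demo data are hypothetical names, and `os.replace` is used for the final move because it overwrites an existing destination on both POSIX and Windows.

```python
import csv
import os
import tempfile


def rewrite_csv_in_place(csv_path, keep_row):
    """Rewrite csv_path, keeping only rows for which keep_row(row) is true."""
    directory = os.path.dirname(os.path.abspath(csv_path))
    # Put the temp file next to the original so the final move stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as tmp, open(csv_path, newline="") as src:
            writer = csv.writer(tmp)
            for row in csv.reader(src):
                if keep_row(row):
                    writer.writerow(row)
        os.replace(tmp_path, csv_path)  # move the temp file over the original
    except BaseException:
        os.remove(tmp_path)  # leave the original untouched on failure
        raise


# Demo on a throwaway file with made-up contents.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "input.csv")
    with open(demo_path, "w", newline="") as f:
        csv.writer(f).writerows([["1", "apple", "a fruit"], ["2", "bolt", "a fastener"]])
    wordlist = {"apple"}
    rewrite_csv_in_place(demo_path, lambda row: row[1] in wordlist)
    with open(demo_path, newline="") as f:
        remaining = list(csv.reader(f))
print(remaining)
```

Because the original is only replaced after the temporary file has been written completely, a crash halfway through leaves the input file as it was.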

So you have to load your wordlist:

with open('wordlist.txt') as i:
    wordlist = set(word.strip() for word in i)  #  you said the file was small

Then you open the input file:

import csv
import os

with open('input.csv', newline='') as i:
    with open('output.csv', 'w', newline='') as o:
        output = csv.writer(o)
        for line in csv.reader(i):  # iterate over the CSV line by line
            if line[1] in wordlist:  # keep the row if column 2, the word, is listed
                output.writerow(line)

# move the filtered file over the original (os.replace overwrites the target)
os.replace('output.csv', 'input.csv')

This is untested; now go do your homework and comment here if you find any bugs... :-)

I would use pandas for this. The data set's not large, so you can do it in memory with no problem.

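import pandas as pd

# the word list has no header row, so name its single column explicitly
words = pd.read_csv('words.txt', header=None, names=['WORD'])
# assumes defs.csv has a header row with a WORD column
defs = pd.read_csv('defs.csv')
words.set_index('WORD', inplace=True)
defs.set_index('WORD', inplace=True)
# an inner join keeps only the words present in both files
new_defs = words.join(defs, how='inner')
new_defs.to_csv('new_defs.csv')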

You might need to manipulate new_defs to make it look the way you want, but that's the gist of it.
