

Searching rows of a file in another file and printing appropriate rows in Python

I have a CSV file like this (no headers):

aaa,1,2,3,4,5  
bbb,2,3,4,5,6
ccc,3,5,7,8,5
ddd,4,6,5,8,9

I want to search another CSV file (no headers):

bbb,1,2,3,4,5,,6,4,7
kkk,2,3,4,5,6,5,4,5,6
ccc,3,4,5,6,8,9,6,9,6
aaa,1,2,3,4,6,6,4,6,4
sss,1,2,3,4,5,3,5,3,5

and print the rows of the second file whose first column also appears in the first column of the first file. So the results will be:

bbb,1,2,3,4,5,,6,4,7
ccc,3,4,5,6,8,9,6,9,6
aaa,1,2,3,4,6,6,4,6,4 

I have the following code, but it does not print anything:

labels = []
with open("csv1.csv", "r") as f:

    f.readline()
    for line in f:
        labels.append((line.strip("\n")))

with open("csv2.csv", "r") as f:

    f.readline()
    for line in f:
        if (line.split(",")[1]) in labels:
            print (line)

If possible, could you tell me how to do this, please? What is wrong with my code? Thanks in advance!

This is one solution, although you may also look into csv-specific tools and pandas, as suggested:

labels = []
with open("csv1.csv", "r") as f:
    lines = f.readlines()
    for line in lines:
        # keep only the first column of each row as its label
        labels.append(line.split(',')[0])

with open("csv2.csv", "r") as f:
    lines = f.readlines()

with open("csv_out.csv", "w") as out:
    for line in lines:
        temp = line.split(',')
        # write the row if its first column is one of the labels
        if temp[0] in labels:
            out.write(line)

The program first collects only the labels from csv1.csv. Note that you used readline, which reads a single line, whereas your program seems to expect all the lines of the file to be read at once. One way to do that is with readlines. The program also has to keep the result of readlines; here it stores it in a list named lines. To collect the labels, the program loops over each line, splits it on a comma, and appends the first element to the list labels.
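To make the difference between readline and readlines concrete, here is a minimal sketch. It assumes an in-memory string (sample) standing in for csv1.csv, and the helper names are hypothetical, chosen only for illustration:

```python
import io

# Hypothetical in-memory stand-in for csv1.csv (same rows as in the question).
sample = "aaa,1,2,3,4,5\nbbb,2,3,4,5,6\nccc,3,5,7,8,5\nddd,4,6,5,8,9\n"

def labels_after_readline(text):
    """Call readline() once, then iterate: the first row is consumed and lost."""
    with io.StringIO(text) as f:
        f.readline()  # reads and discards "aaa,1,2,3,4,5"
        return [line.split(",")[0] for line in f]

def labels_from_readlines(text):
    """readlines() returns every line of the file as a list."""
    with io.StringIO(text) as f:
        return [line.split(",")[0] for line in f.readlines()]

print(labels_after_readline(sample))   # ['bbb', 'ccc', 'ddd']
print(labels_from_readlines(sample))   # ['aaa', 'bbb', 'ccc', 'ddd']
```

Because the files in the question have no header row, the single readline call throws away a data row rather than a header.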

In the second part, the program reads all the lines from csv2.csv and then opens the output file, csv_out.csv, for writing. It processes the lines from csv2.csv one by one, writing the matching lines to the output file.

To do that, the program again splits each line on a comma and checks whether the label from csv2.csv is found in the labels list. If it is, that line is written to csv_out.csv.
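The same steps can also be sketched with the standard csv module, which is one of the csv-specific tools mentioned above. This is only an illustration, not the code from the answer: the function name filter_by_first_column is hypothetical, and it swaps the list for a set so that each membership test does not scan all labels.

```python
import csv

def filter_by_first_column(label_lines, search_lines):
    """Return the rows of search_lines whose first field also occurs
    as a first field in label_lines. Both arguments are iterables of
    text lines (open file objects work as well)."""
    # Collect the first field of every non-empty row into a set.
    labels = {row[0] for row in csv.reader(label_lines) if row}
    # Keep rows whose first field is one of those labels, in file order.
    return [row for row in csv.reader(search_lines) if row and row[0] in labels]

# Sample data copied from the question.
csv1 = ["aaa,1,2,3,4,5", "bbb,2,3,4,5,6", "ccc,3,5,7,8,5", "ddd,4,6,5,8,9"]
csv2 = [
    "bbb,1,2,3,4,5,,6,4,7",
    "kkk,2,3,4,5,6,5,4,5,6",
    "ccc,3,4,5,6,8,9,6,9,6",
    "aaa,1,2,3,4,6,6,4,6,4",
    "sss,1,2,3,4,5,3,5,3,5",
]

for row in filter_by_first_column(csv1, csv2):
    print(",".join(row))
# bbb,1,2,3,4,5,,6,4,7
# ccc,3,4,5,6,8,9,6,9,6
# aaa,1,2,3,4,6,6,4,6,4
```

With real files, the two arguments could be file objects opened with open("csv1.csv", newline="") and open("csv2.csv", newline="").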

  • Try using pandas; it's a very effective way to read CSV files into a data structure called a DataFrame.
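A minimal sketch of what the pandas approach might look like follows. It assumes pandas is installed; the function name filter_with_pandas is hypothetical, and the sample data is read from in-memory strings rather than from csv1.csv and csv2.csv.

```python
import io

import pandas as pd

def filter_with_pandas(labels_source, search_source):
    """Return the rows of search_source whose first column appears
    in the first column of labels_source."""
    # header=None: the files have no header row.
    # dtype=str and keep_default_na=False keep every field as text,
    # so empty fields are not turned into NaN.
    df1 = pd.read_csv(labels_source, header=None, dtype=str, keep_default_na=False)
    df2 = pd.read_csv(search_source, header=None, dtype=str, keep_default_na=False)
    # Column 0 is the first column when header=None.
    return df2[df2[0].isin(df1[0])]

# Sample data copied from the question.
csv1_text = "aaa,1,2,3,4,5\nbbb,2,3,4,5,6\nccc,3,5,7,8,5\nddd,4,6,5,8,9\n"
csv2_text = (
    "bbb,1,2,3,4,5,,6,4,7\n"
    "kkk,2,3,4,5,6,5,4,5,6\n"
    "ccc,3,4,5,6,8,9,6,9,6\n"
    "aaa,1,2,3,4,6,6,4,6,4\n"
    "sss,1,2,3,4,5,3,5,3,5\n"
)

matches = filter_with_pandas(io.StringIO(csv1_text), io.StringIO(csv2_text))
print(matches.to_csv(index=False, header=False))
```

read_csv also accepts file paths, so the two StringIO objects could be replaced by "csv1.csv" and "csv2.csv".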

EDIT

labels = []
with open("csv1.csv", "r") as f:
    # the files have no header row, so every line is read
    for line in f:
        labels.append(line.split(',')[0])

with open("csv2.csv", "r") as f:
    for line in f:
        if line.split(",")[0] in labels:
            print(line, end="")  # line already ends with a newline

I changed it so that labels contains only the first part of each line, e.g. ['aaa', 'bbb', ...].

Then you want to check whether line.split(",")[0] is in labels.

Since you want to match only on the first column, you should use split and then take the first item of the result, which is at index 0.
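A small sketch of why the comparison in the question never succeeds, and why comparing first fields does. The variable names are hypothetical, and the strings are rows copied from the question:

```python
# What the code in the question stored: whole lines from the first file.
whole_line_labels = ["bbb,2,3,4,5,6", "ccc,3,5,7,8,5", "ddd,4,6,5,8,9"]

# What is needed instead: only the first field of each line.
first_field_labels = [s.split(",")[0] for s in whole_line_labels]

line = "bbb,1,2,3,4,5,,6,4,7\n"  # a row from the second file

# Index 1 is the second field ('1'), and it is compared with whole lines.
original_check = line.split(",")[1] in whole_line_labels
# Index 0 is the first field ('bbb'), compared with first fields.
corrected_check = line.split(",")[0] in first_field_labels

print(original_check)   # False
print(corrected_check)  # True
```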
