Compare two headerless columns of the same CSV file and output the matching values using Python 3.8

I have a CSV file with two columns and no header. I want to compare the two columns: for each row, keep only the values in the column 2 list that also appear in column 1, write the result to a new CSV file (output.csv), and delete the whole row if its column 2 list has no values matching column 1. For example,

Input.csv:

1,"[0, 10, 12, 13, 16, 25, 32, 35, 60, 86, 98, 108, 168, 172, 222, 251, 275, 278, 325, 365]"
60,"[12014, 25665, 28278]"
86,"[0, 6, 7, 10, 12, 25, 76, 156, 174, 176, 181, 188, 365, 392, 438]"
108,"[1, 16, 21, 32, 35, 61, 81, 83, 95, 138, 153, 204, 222]"
438,"[30549]"
28278,"[60, 120, 140, 505, 3939, 4034, 7213, 7308, 8784, 14126, 14147, 15197, 16842, 20022, 28229]"

output.csv:

1,"[60, 108]"
60,"[28278]"
108,"[1]"
28278,"[60]"

I have tried this code:

    import csv

    with open('input.csv', 'r') as csvfile:
        csvreader = csv.reader(csvfile, delimiter='\t')

    nodes_in_1 = set()
    nodes_in_2 = set()

    for line in csvreader:
        nodes_in_1.add(line[0])
        nodes_in_2.add(line[1])

    nodes_in_both = nodes_in_1.intersection(nodes_in_2)

    with open('output.csv', 'w') as f_out:
        f_out.write(nodes_in_both + '\n')

I am a beginner. Thank you for the help.

This can indeed be done in pandas:

    import pandas as pd
    from ast import literal_eval

    # Load the CSV without a header; literal_eval parses column 1 into lists instead of strings
    df = pd.read_csv("test.csv", header=None, converters={1: literal_eval})
    # Build the lookup set once so each membership test is cheap
    first_col = set(df[0].tolist())
    # Keep only the list values in the second column that also appear in the first column
    df[1] = df[1].apply(lambda x: [i for i in x if i in first_col])
    # Drop rows whose list is now empty
    df = df[df[1].map(len) > 0]
    # Write the result without the index or a header row
    df.to_csv("output.csv", index=False, header=False)

Output df:

|    |     0 | 1             |
|---:|------:|:--------------|
|  0 |     1 | [60, 86, 108] |
|  1 |    60 | [28278]       |
|  2 |    86 | [438]         |
|  3 |   108 | [1]           |
|  5 | 28278 | [60]          |
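
For readers unfamiliar with `ast.literal_eval`, here is a small sketch of what the converter does to a single cell of the second column: it parses the quoted text into a real list of integers, which is what makes the membership test work. The cell string and the first-column values are copied from the question's input; the variable names are only illustrative.

```python
from ast import literal_eval

# One raw cell from the second column, as the CSV parser hands it over
cell = "[12014, 25665, 28278]"

# Safely evaluate the text as a Python literal (no arbitrary code is run)
parsed = literal_eval(cell)

# First-column values from the question's sample input
first_column = {1, 60, 86, 108, 438, 28278}

# Keep only the values that also occur in the first column
matches = [value for value in parsed if value in first_column]
print(type(parsed).__name__, matches)  # list [28278]
```

Without the converter the cell stays a plain string, so iterating over it would yield single characters rather than numbers.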

    import re

    def run():
        c1, c3 = set(), []
        with open('stack1.txt') as f:
            # c1 holds the first-column values
            for line in f:
                c1.add(line.split(',')[0].replace(' ', ''))
            f.seek(0)  # rewind for a second pass over the file
            for line in f:
                cx = line.split(',')[0].replace(' ', '')
                # get the list stored in column 2 (raw string avoids invalid-escape warnings)
                c2 = re.search(r'\[.*\]', line).group()[1:-1].replace(' ', '').split(',')
                # keep only the elements that also occur in the first column
                c2 = [i for i in c2 if i in c1]
                if c2:
                    c3.append(cx + ',"[{}]"'.format(','.join(c2)))
        with open('stack1.out', 'w') as f:
            f.write('\n'.join(c3))

    if __name__ == '__main__':
        run()

The code above does what you want. I hope I am not doing your class homework for you ;-)
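
For comparison, here is a minimal sketch that stays close to the asker's original `csv`-based attempt. That attempt fails for three reasons: it iterates over the reader after the `with` block has already closed the file, it uses a tab delimiter on comma-separated data, and it tries to concatenate a set with a string. The function name `filter_matches` and the in-memory sample (three rows from the question, the last one abridged) are assumptions made for illustration, not part of either answer.

```python
import csv
import io
from ast import literal_eval


def filter_matches(reader):
    """Return (key, matches) pairs for rows whose list shares values with column one."""
    # Materialise the rows first: column one must be fully known before filtering
    rows = [(int(key), literal_eval(cell)) for key, cell in reader]
    first_column = {key for key, _ in rows}
    kept = []
    for key, values in rows:
        common = [v for v in values if v in first_column]
        if common:  # a row with no match is dropped entirely
            kept.append((key, common))
    return kept


# In-memory stand-in for input.csv (hypothetical, abridged sample)
sample = io.StringIO(
    '60,"[12014, 25665, 28278]"\n'
    '438,"[30549]"\n'
    '28278,"[60, 120, 140]"\n'
)
result = filter_matches(csv.reader(sample))  # default delimiter is a comma

out = io.StringIO()  # stand-in for open("output.csv", "w", newline="")
csv.writer(out).writerows((key, str(common)) for key, common in result)
print(out.getvalue())
```

To run it on real files, replace the two `io.StringIO` objects with `open("input.csv", newline="")` and `open("output.csv", "w", newline="")`. Note that `csv.writer` only quotes a field when it needs to, so single-element lists are written without the surrounding quotes.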

