Compare two columns with no header in the same CSV file and output the matching values

Question

I have a CSV file with two columns and no header. I want to compare these two columns: for each row, find which values in the column 2 list also appear in column 1, write the matching values to a new CSV file (output.csv), and delete the whole row if its column 2 list has no values matching column 1. For example,

Input.csv:

1,"[0, 10, 12, 13, 16, 25, 32, 35, 60, 86, 98, 108, 168, 172, 222, 251, 275, 278, 325, 365]"
60,"[12014, 25665, 28278]"
86,"[0, 6, 7, 10, 12, 25, 76, 156, 174, 176, 181, 188, 365, 392, 438]"
108,"[1, 16, 21, 32, 35, 61, 81, 83, 95, 138, 153, 204, 222]"
438,"[30549]"
28278,"[60, 120, 140, 505, 3939, 4034, 7213, 7308, 8784, 14126, 14147, 15197, 16842, 20022, 28229]"

output.csv:

1,"[60, 108]"
60,"[28278]"
108,"[1]"
28278,"[60]"

I have tried this code:

    import csv

    with open('input.csv', 'r') as csvfile:
        csvreader = csv.reader(csvfile, delimiter='\t')

    nodes_in_1 = set()
    nodes_in_2 = set()

    for line in csvreader:
        nodes_in_1.add(line[0])
        nodes_in_2.add(line[1])

    nodes_in_both = nodes_in_1.intersection(nodes_in_2)

    with open('output.csv', 'w') as f_out:
        f_out.write(nodes_in_both + '\n')

I am a beginner. Thank you for the help.

Answer 1

This can indeed be done in pandas:

    import pandas as pd
    from ast import literal_eval

    # Load the CSV; literal_eval parses the second column into lists instead of strings
    df = pd.read_csv("test.csv", header=None, converters={1: literal_eval})
    # Keep only the list values in the second column that also appear in the first column
    first_col = set(df[0])
    df[1] = df[1].apply(lambda x: [i for i in x if i in first_col])
    # Drop rows whose list is now empty
    df = df[df[1].map(len) > 0]
    # Write the result to CSV without index or header
    df.to_csv('output.csv', index=False, header=False)

Output df:

|    |     0 | 1             |
|---:|------:|:--------------|
|  0 |     1 | [60, 86, 108] |
|  1 |    60 | [28278]       |
|  2 |    86 | [438]         |
|  3 |   108 | [1]           |
|  5 | 28278 | [60]          |
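
For comparison, here is a minimal sketch of the same filter using only the standard library, which may be closer to the `csv`-based attempt in the question. The function name `filter_rows` and the in-memory `SAMPLE` string are assumptions made for illustration; in practice you would pass an open file instead. Note that the reader uses the default comma delimiter (not `'\t'`), so the quoted list arrives as a single field.

```python
import csv
import io
from ast import literal_eval

# Sample data copied from the question, held in memory for the sketch.
SAMPLE = '''1,"[0, 10, 12, 13, 16, 25, 32, 35, 60, 86, 98, 108, 168, 172, 222, 251, 275, 278, 325, 365]"
60,"[12014, 25665, 28278]"
86,"[0, 6, 7, 10, 12, 25, 76, 156, 174, 176, 181, 188, 365, 392, 438]"
108,"[1, 16, 21, 32, 35, 61, 81, 83, 95, 138, 153, 204, 222]"
438,"[30549]"
28278,"[60, 120, 140, 505, 3939, 4034, 7213, 7308, 8784, 14126, 14147, 15197, 16842, 20022, 28229]"
'''


def filter_rows(lines):
    """Return (key, matches) pairs, keeping only list values that occur in column 1."""
    # csv.reader splits on commas by default and treats the quoted list as one field.
    rows = [(int(key), literal_eval(values)) for key, values in csv.reader(lines)]
    # A set makes the membership test fast.
    keys = {key for key, _ in rows}
    filtered = [(key, [v for v in values if v in keys]) for key, values in rows]
    # Drop rows whose list ended up empty.
    return [(key, matches) for key, matches in filtered if matches]


result = filter_rows(io.StringIO(SAMPLE))
for key, matches in result:
    print(key, matches)
```

On the sample input this yields the same five rows as the pandas table above.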

Answer 2

    import re

    def run():
        c1, c3 = [], []
        with open('stack1.txt') as f:
            # c1 holds the first-column values
            for line in f:
                c1.append(line.split(',')[0].replace(' ', ''))
            f.seek(0)
            for line in f:
                cx = line.split(',')[0].replace(' ', '')
                # get the list stored in column 2
                c2 = re.search(r'\[.*\]', line).group()[1:-1].replace(' ', '').split(',')
                # keep the elements that also appear in c1 (first column)
                c2 = [i for i in c2 if i in c1]
                if c2:
                    c3.append(cx + ',"[{}]"'.format(','.join(c2)))
        with open('stack1.out', 'w') as f:
            f.write('\n'.join(c3))

    if __name__ == '__main__':
        run()

The above code does what you want. I hope I am not doing your class homework ;-)
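
The code above builds each output line by hand with string formatting. As an alternative, here is a small sketch that lets `csv.writer` handle the quoting. The helper name `rows_to_csv` is hypothetical, and it writes to an in-memory buffer rather than a file so it can be run as is. It uses `csv.QUOTE_NONNUMERIC` because the default minimal quoting would leave a single-element list such as `[28278]` unquoted (it contains no comma), which differs from the format in the question.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize (key, list) pairs to CSV text, one row per pair."""
    buffer = io.StringIO()
    # QUOTE_NONNUMERIC quotes every string field but leaves integers bare.
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for key, matches in rows:
        # str([60, 86]) gives "[60, 86]", matching the list format of the input.
        writer.writerow([key, str(matches)])
    return buffer.getvalue()


text = rows_to_csv([(1, [60, 86, 108]), (60, [28278])])
print(text)
```

To write a real file instead, open it with `newline=''` and pass the file object to `csv.writer` in place of the buffer.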

2 answers

Answer 1
1 (accepted), 2021-02-22 21:39:40

Answer 2
0, 2021-02-22 21:56:57
