Compare two columns with no header of same csv file and output matching values using Python 3.8
I have a CSV file with two columns and no header. I want to compare these two columns: for each row, find which values in the column 2 list also appear in column 1, write those matching values to a new CSV file (output.csv), and delete the whole row if its column 2 list has no values matching column 1. For example,
Input.csv:
1,"[0, 10, 12, 13, 16, 25, 32, 35, 60, 86, 98, 108, 168, 172, 222, 251, 275, 278, 325, 365]"
60,"[12014, 25665, 28278]"
86,"[0, 6, 7, 10, 12, 25, 76, 156, 174, 176, 181, 188, 365, 392, 438]"
108,"[1, 16, 21, 32, 35, 61, 81, 83, 95, 138, 153, 204, 222]"
438,"[30549]"
28278,"[60, 120, 140, 505, 3939, 4034, 7213, 7308, 8784, 14126, 14147, 15197, 16842, 20022, 28229]"
output.csv:
1,"[60, 108]"
60,"[28278]"
108,"[1]"
28278,"[60]"
I have tried this code:
    import csv
    with open('input.csv', 'r') as csvfile:
        csvreader = csv.reader(csvfile, delimiter='\t')
        nodes_in_1 = set()
        nodes_in_2 = set()
        for line in csvreader:
            nodes_in_1.add(line[0])
            nodes_in_2.add(line[1])
        nodes_in_both = nodes_in_1.intersection(nodes_in_2)
    with open('output.csv', 'w') as f_out:
        f_out.write(nodes_in_both + '\n')
I am a beginner. Thank you for the help.
This can indeed be done in pandas:
import pandas as pd
from ast import literal_eval
df = pd.read_csv("test.csv", header=None, converters={1: literal_eval}) # load csv, use literal_eval to load the lists as lists instead of strings
df[1] = df[1].apply(lambda x: [i for i in x if i in df[0].tolist()]) # keep only the values in the lists in the second column that match with a value in the first column
df = df[df[1].map(len) > 0] # drop rows with empty lists
df.to_csv('output.csv', index=False, header=None) # write df to csv
Output df:
| | 0 | 1 |
|---:|------:|:--------------|
| 0 | 1 | [60, 86, 108] |
| 1 | 60 | [28278] |
| 2 | 86 | [438] |
| 3 | 108 | [1] |
| 5 | 28278 | [60] |
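The `converters={1: literal_eval}` step matters because a CSV reader returns the second column as text, not as a list. The following is only a minimal standard-library sketch of that conversion; the names `raw`, `values`, `first_column` and `matches` are made up for illustration, and the numbers are copied from the question's input.

```python
from ast import literal_eval

# Second column of one row, as a CSV reader would return it: a plain string.
raw = "[12014, 25665, 28278]"

# literal_eval safely turns the text into a real Python list of ints.
values = literal_eval(raw)

# Column-1 values from the question's input, stored as ints in a set.
first_column = {1, 60, 86, 108, 438, 28278}

# Ints are compared with ints here; the string '28278' would never match.
matches = [v for v in values if v in first_column]
print(matches)  # → [28278]
```

Keeping both columns as integers avoids the silent mismatch you get when a string such as '60' is compared against the number 60.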
    import re
    def run():
        c1, c3 = [], []
        with open('stack1.txt') as f:
            # c1 holds 1st column values
            for line in f:
                c1.append(line.split(',')[0].replace(' ', ''))
            # rewind the file so it can be read a second time
            f.seek(0)
            for line in f:
                cx = line.split(',')[0].replace(' ', '')
                # get list stored in column 2 (raw string avoids an invalid-escape warning)
                c2 = re.search(r'\[.*\]', line).group()[1:-1].replace(' ', '').split(',')
                # find elements common with c1 (first column)
                c2 = [i for i in c2 if i in c1]
                if c2:
                    c3.append(cx + ',"[{}]"'.format(','.join(c2)))
        with open('stack1.out', 'w') as f:
            f.write('\n'.join(c3))
    if __name__ == '__main__':
        run()
The above code does what you want. I hope I am not spoiling your class homework ;-)
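For comparison, the same filtering could probably also be written with the standard library's `csv` module and `ast.literal_eval` in place of a regular expression. This is only a sketch under assumptions: the helper name `filter_matching` is hypothetical, and the in-memory sample is a shortened, illustrative version of the question's input rather than the real file.

```python
import csv
import io
from ast import literal_eval


def filter_matching(rows):
    """Keep, per row, only the list values that also occur in column 1."""
    # Parse each row once: column 1 as int, column 2 as a real list.
    parsed = [(int(key), literal_eval(values)) for key, values in rows]
    # A set gives fast membership tests against column 1.
    first_column = {key for key, _ in parsed}
    result = []
    for key, values in parsed:
        matches = [v for v in values if v in first_column]
        if matches:  # rows without any match are dropped
            result.append((key, matches))
    return result


# Shortened, illustrative sample based on the question's input.
sample = io.StringIO(
    '1,"[0, 10, 60, 108]"\n'
    '60,"[12014, 25665, 28278]"\n'
    '108,"[1, 16, 21]"\n'
    '438,"[30549]"\n'
    '28278,"[60, 120]"\n'
)
filtered = filter_matching(csv.reader(sample))

# Write the result back out as CSV text; csv.writer quotes fields as needed.
out = io.StringIO()
csv.writer(out).writerows((key, str(values)) for key, values in filtered)
print(out.getvalue())
```

To work on real files, the two `io.StringIO` objects would be replaced by `open('input.csv', newline='')` and `open('output.csv', 'w', newline='')`.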