Python 3.5: combine two CSV files by matching multiple columns
I have two data sets. The first is like this:

data file:
```
Column 1, Column 2, Column 3, Column 4, Column 5, Column 6
1111111, 2222222, 3333333, 44444444, 55555555, 666666666
0000000, 77777777, 8888888, 99999999, 10101010, 121212121
3333333, 55555555, 9999999, 88888888, 22222222, 111111111
```
The second file is like this:

descriptors file:
```
Column 1, Column 2, Column 3
11111111,, this is a descriptor
,777777777, this is a descriptor again
99999999, , last descriptor
```
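The descriptors sample has blank cells and a space after every comma, which matters when values are used as lookup keys. A quick check with the standard `csv` module (the sample is inlined with `io.StringIO` purely for illustration; the variable names are hypothetical) shows what `csv.reader` actually returns: blank cells come back as empty strings, the stray space in the last row's second cell comes back as `' '` (a non-empty, truthy string), and the leading spaces stay attached to each value unless `skipinitialspace=True` is passed.

```python
import csv
import io

# The descriptors sample from the question, inlined as a file-like object.
descriptors_text = (
    "Column 1, Column 2, Column 3\n"
    "11111111,, this is a descriptor\n"
    ",777777777, this is a descriptor again\n"
    "99999999, , last descriptor\n"
)

# Default dialect: the space after each comma stays part of the field,
# so the last row's second cell is ' ' rather than ''.
raw_rows = list(csv.reader(io.StringIO(descriptors_text)))

# skipinitialspace=True ignores whitespace right after each delimiter,
# so that cell becomes '' and the descriptors lose their leading space.
clean_rows = list(csv.reader(io.StringIO(descriptors_text), skipinitialspace=True))

for raw, clean in zip(raw_rows, clean_rows):
    print(raw, "->", clean)
```

This is why any matching code needs to `.strip()` keys (or use `skipinitialspace=True`) and should treat whitespace-only cells as blank.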
What I want is as follows:
```
Column 1, Column 2, Column 3, Column 4, Column 5, Column 6, Column 7
1111111, 2222222, 3333333, 44444444, 55555555, 666666666, this is a descriptor
0000000, 77777777, 8888888, 99999999, 10101010, 121212121, this is a descriptor again
3333333, 55555555, 9999999, 88888888, 22222222, 111111111
```
I have the following code, adapted from forum posts for my use:
```python
import csv

with open('descriptors file.CSV', 'r') as first_file:
    reader = csv.reader(first_file)
    first_header = next(reader, None)
    file_information = {row[0]: row for row in reader}

with open('data file.CSV', 'r') as second_file:
    with open('final results.csv', 'w', newline='') as outfile:
        reader = csv.reader(second_file)
        second_header = next(reader, None)
        writer = csv.writer(outfile)
        writer.writerow(second_header[:6] + first_header[2:])
        for row in reader:
            if row[0] not in file_information:
                continue
            newrow = row[0:] + file_information[row[0]]
            writer.writerow(newrow)
```
My problems are as follows:

1. I want to match on both Column 1 and Column 2 (indexes 0 and 1); my code only matches on one column.
2. The results drop every row that has no match. If nothing in the descriptors file matches a row in the data file, I would rather keep that row than throw it away. The data file should be augmented by the descriptors file, not reduced.
3. I cannot figure out how to write only the descriptor column instead of all three columns of the descriptors file.
First, your sample files are a bit inconsistent, so these keys can never match:

1111111 (data file) != 11111111 (descriptors file)
77777777 (data file) != 777777777 (descriptors file)
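Mismatches like these are easy to miss by eye. A small diagnostic sketch can list every descriptor key that never appears in the corresponding data column; `unmatched_descriptor_keys` is a hypothetical helper, and it assumes the layout used here (descriptors Column 1 matched against data Column 1, descriptors Column 2 against data Column 2).

```python
import csv
import io


def unmatched_descriptor_keys(data_file, descriptors_file, key_columns=(0, 1)):
    """Hypothetical helper: return (column index, key) pairs from the
    descriptors file whose key never appears in the same data column."""
    data_reader = csv.reader(data_file, skipinitialspace=True)
    next(data_reader, None)  # skip the header row
    data_rows = list(data_reader)
    # Every value that occurs in each key column of the data file.
    seen = {col: {row[col].strip() for row in data_rows} for col in key_columns}

    desc_reader = csv.reader(descriptors_file, skipinitialspace=True)
    next(desc_reader, None)  # skip the header row
    missing = []
    for row in desc_reader:
        for col in key_columns:
            key = row[col].strip()
            if key and key not in seen[col]:  # blank cells are not keys
                missing.append((col, key))
    return missing


# The question's original samples, before any values were corrected.
data_sample = io.StringIO(
    "Column 1, Column 2, Column 3, Column 4, Column 5, Column 6\n"
    "1111111, 2222222, 3333333, 44444444, 55555555, 666666666\n"
    "0000000, 77777777, 8888888, 99999999, 10101010, 121212121\n"
    "3333333, 55555555, 9999999, 88888888, 22222222, 111111111\n"
)
descriptors_sample = io.StringIO(
    "Column 1, Column 2, Column 3\n"
    "11111111,, this is a descriptor\n"
    ",777777777, this is a descriptor again\n"
    "99999999, , last descriptor\n"
)
result = unmatched_descriptor_keys(data_sample, descriptors_sample)
print(result)  # [(0, '11111111'), (1, '777777777'), (0, '99999999')]
```

Besides the two typos, the third descriptor is reported as well: 99999999 appears only in the data file's Column 4, which is not one of the columns being matched, so that descriptor is never used.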
After correcting those two values, the code below works well for me. Sorry for the hardcoding; if you need a more general solution, please describe exactly what you want.
```python
import csv

# Build two lookups from the descriptors file:
# Column 1 value -> descriptor, and Column 2 value -> descriptor.
with open('d_file.csv', 'r', newline='') as first_file:
    reader = csv.reader(first_file)
    first_header = next(reader, None)
    column0 = {}
    column1 = {}
    for row in reader:
        key0, key1 = row[0].strip(), row[1].strip()
        # Skip blank cells, including whitespace-only ones such as ' '.
        if key0:
            column0[key0] = row[2]
        if key1:
            column1[key1] = row[2]

with open('data_file.csv', 'r', newline='') as second_file:
    with open('final_results.csv', 'w', newline='') as outfile:
        reader = csv.reader(second_file)
        second_header = next(reader, None)
        writer = csv.writer(outfile)
        # Using first_header[2:] here would be wrong: it writes 'Column 3',
        # while you want 'Column 7'.
        writer.writerow(second_header[:6] + ['Column 7'])
        for row in reader:
            if row[0].strip() in column0:
                # Matched on Column 1: append only the descriptor text.
                writer.writerow(row + [column0[row[0].strip()]])
            elif row[1].strip() in column1:
                # Matched on Column 2.
                writer.writerow(row + [column1[row[1].strip()]])
            else:
                # No match: keep the data row unchanged.
                writer.writerow(row)
```
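The code above hardcodes the key columns, the descriptor column and the new header. A more general sketch, under these assumptions, could wrap the same idea in a function: `augment` and its parameter names are hypothetical; key columns are checked in the given order and the first match wins; and unmatched rows are padded with an empty descriptor cell so every output row has the same number of columns (the code above instead writes them with one column fewer). It takes file-like objects, so it can be tested on in-memory samples.

```python
import csv
import io


def augment(data_file, descriptors_file, out_file,
            key_columns=(0, 1), descriptor_column=2, new_header="Column 7"):
    """Hypothetical helper: copy every data row to out_file with one extra
    descriptor column; rows without a match get an empty descriptor."""
    desc_reader = csv.reader(descriptors_file, skipinitialspace=True)
    next(desc_reader, None)  # skip the descriptors header
    # One lookup table per key column: {column index: {key: descriptor}}
    lookups = {col: {} for col in key_columns}
    for row in desc_reader:
        for col in key_columns:
            key = row[col].strip()
            if key:  # blank cells are not keys
                lookups[col][key] = row[descriptor_column].strip()

    data_reader = csv.reader(data_file, skipinitialspace=True)
    writer = csv.writer(out_file)
    header = next(data_reader, None)
    if header is None:  # empty data file: nothing to write
        return
    writer.writerow(header + [new_header])
    for row in data_reader:
        descriptor = ""
        # Check key columns in order; the first match wins.
        for col in key_columns:
            match = lookups[col].get(row[col].strip())
            if match is not None:
                descriptor = match
                break
        writer.writerow(row + [descriptor])  # unmatched rows are kept


# The question's data, with the descriptor keys corrected as described above.
data_sample = io.StringIO(
    "Column 1, Column 2, Column 3, Column 4, Column 5, Column 6\n"
    "1111111, 2222222, 3333333, 44444444, 55555555, 666666666\n"
    "0000000, 77777777, 8888888, 99999999, 10101010, 121212121\n"
    "3333333, 55555555, 9999999, 88888888, 22222222, 111111111\n"
)
descriptors_sample = io.StringIO(
    "Column 1, Column 2, Column 3\n"
    "1111111,, this is a descriptor\n"
    ",77777777, this is a descriptor again\n"
    "99999999, , last descriptor\n"
)
out = io.StringIO()
augment(data_sample, descriptors_sample, out)
print(out.getvalue())
```

With real files, pass file objects opened with `newline=''` (as the `csv` documentation recommends), for example `open('data_file.csv', newline='')`, `open('d_file.csv', newline='')` and `open('final_results.csv', 'w', newline='')`.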