Compare two columns from a CSV file using Python
I have a CSV file like:
item1,item2
A,B
B,C
C,D
E,F
I want to compare these two columns and find the values that appear in both item1 and item2. The output should look like this:
item
B
C
I have tried this code:
import csv

with open('output/id.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    for line in csvreader:
        if line[0] == line[1]:
            print(line)
        else:
            print("not match")
I am new to programming. I don't know what the logic should be or how to solve this problem. Please help.
I would recommend using the pandas library, which loads your CSV file into a DataFrame, a convenient tabular data structure.
import pandas as pd
df = pd.read_csv(filename)
Then you can get the values common to both columns by doing:
set(df['col1']) & set(df['col2'])
To get the output shaped the way you describe, you can then build a new DataFrame from this intersection:
df2 = pd.DataFrame(data = {'item': list(set(df['col1']) & set(df['col2']))})
For example,
import pandas as pd
d = {'col1': [1, 2, 6, 4, 3], 'col2': [3, 2, 5, 6, 8]}
df = pd.DataFrame(data=d)
set(df['col1']) & set(df['col2'])
{2, 3, 6}
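Applied to the data from the question, the same idea might look like the sketch below. It assumes the column names are item1 and item2 (as in the sample file) rather than col1 and col2, and it inlines the CSV text through io.StringIO so it runs without a file on disk. Because a Python set has no guaranteed order, this variant uses isin instead of a set intersection to keep the values in file order; that is a swapped-in technique, not the code from the answer above.

```python
import io
import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
csv_text = "item1,item2\nA,B\nB,C\nC,D\nE,F\n"
df = pd.read_csv(io.StringIO(csv_text))

# Keep the item1 values that also occur anywhere in item2, in file order.
common = df.loc[df["item1"].isin(df["item2"]), "item1"].drop_duplicates()

# Shape the result as a single "item" column, as the question asks.
result = pd.DataFrame({"item": common.tolist()})
print(result.to_csv(index=False))
```

With the sample data, result holds the two rows B and C under the header item.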
You need to use '\t' as your delimiter, as your file is delimited by tabs, not commas.
Here's my implementation:
import csv

with open('output/id.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile, delimiter='\t')
    items_in_1 = set()
    items_in_2 = set()
    for line in csvreader:
        items_in_1.add(line[0])
        items_in_2.add(line[1])

items_in_both = items_in_1.intersection(items_in_2)
print("item")
for item in items_in_both:
    print(item)
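The sample in the question is shown with commas, while this answer assumes tabs, so it may be safer to detect the delimiter than to hard-code it. The sketch below uses csv.Sniffer from the standard library, restricted to comma and tab as candidates. The helper name common_items and the inlined sample text are assumptions made for illustration, and unlike the implementation above it skips the header row.

```python
import csv
import io

def common_items(text, candidates=",\t"):
    """Return the sorted values found in both of the first two columns."""
    # Guess the delimiter from the text, restricted to the given candidates.
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    reader = csv.reader(io.StringIO(text), dialect)
    next(reader)  # skip the header row
    col1, col2 = set(), set()
    for row in reader:
        col1.add(row[0])
        col2.add(row[1])
    return sorted(col1 & col2)

# The same sample data, once comma-separated and once tab-separated.
comma_text = "item1,item2\nA,B\nB,C\nC,D\nE,F\n"
tab_text = comma_text.replace(",", "\t")

print(common_items(comma_text))
print(common_items(tab_text))
```

Both calls should give the same list, since only the delimiter differs between the two inputs.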
You cannot succeed by comparing row by row. You have to work on the columns.
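To see why, here is a small sketch using the sample rows from the question, written as an in-memory list rather than read from a file (an assumption made to keep it short). The row-wise check only finds rows whose two cells are equal, whereas the column-wise check finds values shared between the columns regardless of which row they sit in.

```python
# Sample rows from the question, without the header line.
rows = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]

# Row-wise comparison: matches only when both cells of one row are equal.
row_matches = [a for a, b in rows if a == b]

# Column-wise comparison: collect each column, then intersect the two sets.
col1 = {a for a, _ in rows}
col2 = {b for _, b in rows}
column_matches = sorted(col1 & col2)

print(row_matches)     # []
print(column_matches)  # ['B', 'C']
```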
Read both columns of your CSV file (without the title) into 2 Python sets.
Perform a sorted intersection and write the result to another CSV file:
import csv

with open("test.csv") as f:
    cr = csv.reader(f)
    next(cr)  # skip title
    col1 = set()
    col2 = set()
    for a, b in cr:
        col1.add(a)
        col2.add(b)

with open("output.csv", "w", newline="") as f:
    cw = csv.writer(f)
    cw.writerow(["item"])
    # wrap each value in a list so multi-character values stay in one cell
    cw.writerows([item] for item in sorted(col1 & col2))
With test.csv as:
item1,item2
A,B
B,C
C,D
E,F
you get:
item
B
C
Note: if your CSV file has more than 2 columns, unpacking each row into two names raises an error; adapt the loop like this:
for row in cr:
    col1.add(row[0])
    col2.add(row[1])
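If the file has extra columns, selecting the two columns by header name rather than by position may be more robust. The sketch below uses csv.DictReader from the standard library; the helper name common_by_name, the extra note column, and the inlined sample are hypothetical and only serve the illustration.

```python
import csv
import io

def common_by_name(f, first="item1", second="item2"):
    """Return the sorted values present in both named columns."""
    reader = csv.DictReader(f)  # uses the first line as the column names
    col1, col2 = set(), set()
    for row in reader:
        col1.add(row[first])
        col2.add(row[second])
    return sorted(col1 & col2)

# Sample data with a hypothetical third column that is simply ignored.
sample = io.StringIO("item1,item2,note\nA,B,x\nB,C,y\nC,D,z\nE,F,w\n")
print(common_by_name(sample))
```

For a real file, an open file object (for example from open("test.csv", newline="")) can be passed in place of the io.StringIO sample.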