
Compare two columns from a CSV file using Python

I have a CSV file like this:

item1,item2 
A,B
B,C
C,D
E,F

I want to compare these two columns and find the values that appear in both item1 and item2. The output should look like this:

 item 
  B
  C

I have tried this code:

import csv

with open('output/id.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile)

    for line in csvreader:
        # compares the two cells of the same row only
        if line[0] == line[1]:
            print(line)
        else:
            print("not match")

I am new to programming. I don't know what the logic should be or how to solve this problem. Please help.

I would recommend the pandas library, which loads your CSV file into a convenient DataFrame data structure.

import pandas as pd

df = pd.read_csv(filename)

Then you can get the values common to both columns (col1 and col2 stand for your column names, here item1 and item2) with:

set(df['col1']) & set(df['col2'])

To get the output shaped the way you describe, you can then build a new DataFrame from this intersection:

df2 = pd.DataFrame(data = {'item': list(set(df['col1']) & set(df['col2']))})

For example,

import pandas as pd
d = {'col1': [1, 2, 6, 4, 3], 'col2': [3, 2, 5, 6, 8]}
df = pd.DataFrame(data=d)
set(df['col1']) & set(df['col2'])

{2, 3, 6}
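
Applied to the data from the question, the same idea can be sketched as follows. The CSV text is inlined through io.StringIO so the snippet runs on its own, which is an assumption made for illustration; with a real file you would pass its path to pd.read_csv instead. The names csv_text, common and df2 are likewise illustrative.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the snippet runs without a file.
csv_text = "item1,item2\nA,B\nB,C\nC,D\nE,F\n"
df = pd.read_csv(io.StringIO(csv_text))

# Values that occur in both columns, sorted for a stable order
# (a plain set has no guaranteed iteration order).
common = sorted(set(df["item1"]) & set(df["item2"]))

# One-column DataFrame shaped like the requested output.
df2 = pd.DataFrame({"item": common})
print(df2.to_string(index=False))
```

Sorting is optional; it only makes the row order of df2 reproducible between runs.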

You need to:

  1. Use '\t' as your delimiter, as your file is delimited by tabs, not commas
  2. Collect the items of each column into a set, then take the intersection of the two sets
  3. Print them

Here's my implementation:

import csv
with open('output/id.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile, delimiter='\t')

    items_in_1 = set()
    items_in_2 = set()

    for line in csvreader:
        items_in_1.add(line[0])
        items_in_2.add(line[1])

    items_in_both = items_in_1.intersection(items_in_2)

    print("item")
    for item in items_in_both:
        print(item)

You cannot succeed by comparing row by row. You have to work on the columns.

Read both columns of your CSV file (without the title row) into two Python sets.

Compute the sorted intersection and write it to another CSV file:

import csv

with open("test.csv") as f:
    cr = csv.reader(f)
    next(cr) # skip title
    col1 = set()
    col2 = set()
    for a,b in cr:
        col1.add(a)
        col2.add(b)

with open("output.csv","w",newline="") as f:
    cw = csv.writer(f)
    cw.writerow(["item"])
    # wrap each value in a list so multi-character values stay in one cell
    cw.writerows([item] for item in sorted(col1 & col2))

With test.csv as:

item1,item2
A,B
B,C
C,D
E,F

you get:

item
B
C

Note: if your CSV file has more than 2 columns, unpacking each row into a,b raises a ValueError. Adapt the loop like this:

for row in cr:
    col1.add(row[0])
    col2.add(row[1])
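
If the columns are easier to identify by header name than by position, csv.DictReader is one alternative. The sketch below is only an illustration under stated assumptions: the helper name common_values and the third column "extra" are hypothetical, and the data is inlined via io.StringIO instead of being read from test.csv.

```python
import csv
import io


def common_values(f, first, second):
    """Return the sorted values that appear in both named columns of f."""
    reader = csv.DictReader(f)  # uses the header row as field names
    col1 = set()
    col2 = set()
    for row in reader:
        col1.add(row[first])
        col2.add(row[second])
    return sorted(col1 & col2)


# Hypothetical three-column sample; the "extra" column is simply ignored.
sample = "item1,item2,extra\nA,B,x\nB,C,y\nC,D,z\nE,F,w\n"
result = common_values(io.StringIO(sample), "item1", "item2")
print(result)
```

Because rows are read as dictionaries, additional columns do not affect the loop, and no explicit header skip is needed.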
