How to find the rows that don't exist in another CSV file using Python 3.7.5
I have a file ua.csv with 2 data rows and another file pr.csv with 4 data rows. I would like to know which rows are present in pr.csv but not in ua.csv. The output also needs to include the count of extra rows in pr.csv.
ua.csv
Name|Address|City|Country|Pincode
Jim Smith|123 Any Street|Boston|US|02134
Jane Lee|248 Another St.|Boston|US|02130
pr.csv
Name|Address|City|Country|Pincode
Jim Smith|123 Any Street|Boston|US|02134
Smoet|coffee shop|finland|Europe|3453335
Jane Lee|248 Another St.|Boston|US|02130
Jack|long street|malasiya|Asia|585858
Below is the expected output:
pr.csv has 2 rows extra
Name|Address|City|Country|Pincode
Smoet|coffee shop|finland|Europe|3453335
Jack|long street|malasiya|Asia|585858
I guess you could use the set data structure:
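import csv

ua_set = set()
pr_set = set()
# Populate the sets by reading the csv files (use the `add` method of sets).
# Rows are stored as tuples because lists are not hashable; the files use "|" as delimiter.
with open("ua.csv", newline="") as f:
    reader = csv.reader(f, delimiter="|")
    header = next(reader)  # keep the header for the output
    for row in reader:
        ua_set.add(tuple(row))
with open("pr.csv", newline="") as f:
    reader = csv.reader(f, delimiter="|")
    next(reader)  # skip the header
    for row in reader:
        pr_set.add(tuple(row))
# Find the difference: rows in pr.csv that are not in ua.csv (sets are unordered)
diff = pr_set.difference(ua_set)
print(f"pr.csv has {len(diff)} rows extra")
# Print the header read from the file instead of hardcoding the column names
print("|".join(header))
for row in diff:
    print("|".join(row))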
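Because a set has no order, the loop above may print the extra rows in any order. If the output must follow the order of pr.csv, as in the expected output, a minimal sketch is to use the set only for lookups and walk pr.csv's rows in sequence. The helper name `extra_rows` and the use of `io.StringIO` in place of the real files are assumptions made for illustration.

```python
import csv
import io


def extra_rows(pr_file, ua_file, delimiter="|"):
    """Hypothetical helper: return (header, rows) for rows in pr_file but not in ua_file.

    The returned rows keep the order in which they appear in pr_file.
    """
    ua_reader = csv.reader(ua_file, delimiter=delimiter)
    next(ua_reader, None)  # skip ua.csv's header
    ua_set = {tuple(row) for row in ua_reader}  # set used only for fast lookups

    pr_reader = csv.reader(pr_file, delimiter=delimiter)
    header = next(pr_reader)
    rows = [row for row in pr_reader if tuple(row) not in ua_set]
    return header, rows


# Sample data from the question; io.StringIO stands in for the real files
ua_text = """Name|Address|City|Country|Pincode
Jim Smith|123 Any Street|Boston|US|02134
Jane Lee|248 Another St.|Boston|US|02130
"""
pr_text = """Name|Address|City|Country|Pincode
Jim Smith|123 Any Street|Boston|US|02134
Smoet|coffee shop|finland|Europe|3453335
Jane Lee|248 Another St.|Boston|US|02130
Jack|long street|malasiya|Asia|585858
"""

header, rows = extra_rows(io.StringIO(pr_text), io.StringIO(ua_text))
print(f"pr.csv has {len(rows)} rows extra")
print("|".join(header))
for row in rows:
    print("|".join(row))
```

With real files, pass `open("pr.csv", newline="")` and `open("ua.csv", newline="")` instead of the `StringIO` objects.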
A better solution using the pandas module:
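import pandas as pd
# The files are pipe-delimited; read every column as str so Pincode keeps its leading zeros
df_ua = pd.read_csv("ua.csv", sep="|", dtype=str)  # Must modify path to ua.csv
df_pr = pd.read_csv("pr.csv", sep="|", dtype=str)  # Must modify path to pr.csv
# Merge on all shared columns; "left_only" marks rows found only in pr.csv.
# A left merge also keeps the rows in pr.csv's order.
df_diff = (df_pr.merge(df_ua, how="left", indicator=True)
           .loc[lambda x: x["_merge"] == "left_only"]
           .drop("_merge", axis=1))
print(f"pr.csv has {len(df_diff)} rows extra")
print(df_diff.to_csv(sep="|", index=False), end="")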
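Another approach using only the csv module:
import csv

# Store every row of ua.csv as a tuple in a set for fast membership tests.
# The files are pipe-delimited, so use delimiter='|' rather than ','.
ua_rows = set()
with open('ua.csv', newline='') as ua:
    for i in csv.reader(ua, delimiter='|'):
        ua_rows.add(tuple(i))

# Keep the rows of pr.csv that are not in ua.csv, in file order.
# The header line is identical in both files, so it is filtered out too.
output = []
with open('pr.csv', newline='') as pr:
    for j in csv.reader(pr, delimiter='|'):
        if tuple(j) not in ua_rows:
            output.append(j)
print(f"pr.csv has {len(output)} rows extra")
for row in output:
    print('|'.join(row))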
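The approaches above treat a row as present in ua.csv if it appears there at least once, so a row listed twice in pr.csv but once in ua.csv is not reported as extra. If duplicate rows should count (an assumption about the desired behavior; the question does not say), one option is a multiset difference with `collections.Counter`. The function name `count_extra_rows` and the sample data below are hypothetical.

```python
import csv
import io
from collections import Counter


def count_extra_rows(pr_file, ua_file, delimiter="|"):
    """Hypothetical helper: return a Counter of rows (as tuples) that occur
    more times in pr_file than in ua_file, mapped to the surplus count."""
    pr_counts = Counter(tuple(row) for row in csv.reader(pr_file, delimiter=delimiter))
    ua_counts = Counter(tuple(row) for row in csv.reader(ua_file, delimiter=delimiter))
    # Counter subtraction keeps only positive counts, i.e. the surplus copies in pr_file;
    # a header shared by both files cancels out.
    return pr_counts - ua_counts


# Hypothetical data: Jim appears twice in pr but once in ua; Jack appears twice in pr only
ua_text = "Name|City\nJim|Boston\n"
pr_text = "Name|City\nJim|Boston\nJim|Boston\nJack|malasiya\nJack|malasiya\n"

extra = count_extra_rows(io.StringIO(pr_text), io.StringIO(ua_text))
print(f"pr.csv has {sum(extra.values())} rows extra")
for row, n in extra.items():
    print("|".join(row), "x", n)
```

The trade-off is that the result no longer lists each extra row separately; it reports each distinct row once together with how many surplus copies pr.csv has.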