How to find those rows which don't exist in another CSV file using Python 3.7.5

I have a file ua.csv with 2 rows and another file pr.csv with 4 rows. I would like to know which rows are present in pr.csv but not in ua.csv. The output should also include the count of extra rows present in pr.csv.

ua.csv

Name|Address|City|Country|Pincode
Jim Smith|123 Any Street|Boston|US|02134
Jane Lee|248 Another St.|Boston|US|02130

pr.csv

Name|Address|City|Country|Pincode
Jim Smith|123 Any Street|Boston|US|02134
Smoet|coffee shop|finland|Europe|3453335
Jane Lee|248 Another St.|Boston|US|02130
Jack|long street|malasiya|Asia|585858

Below is the expected output:

pr.csv has 2 rows extra

Name|Address|City|Country|Pincode
Smoet|coffee shop|finland|Europe|3453335
Jack|long street|malasiya|Asia|585858

I guess you could use the set data structure:

ua_set = set()
pr_set = set()

# Code to populate the sets reading the csv files (use the `add` method of sets)
...

# Find the difference
diff = pr_set.difference(ua_set)

print(f"pr.csv has {len(diff)} rows extra")

# It would be better to not hardcode the name of the columns in the output 
# but getting the info depends on the package you use to read csv files
print("Name|Address|City|Country|Pincode")  

for row in diff:
    print(row)
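
The population step left open above could be filled in roughly as follows. This is only a sketch under a few assumptions: the files are pipe-delimited as in the samples, the standard-library csv module is used as the reader (the answer does not name one), and surrounding whitespace in each field is stripped. The helper names `read_rows` and `extra_rows` are hypothetical, and inline strings stand in for the two files. Because a set has no ordering, the sketch walks the pr.csv rows in file order and only uses the set for membership checks.

```python
import csv
import io


def read_rows(handle, delimiter="|"):
    """Return (header, rows); each row is a tuple of whitespace-stripped fields."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    rows = [tuple(field.strip() for field in row) for row in reader if row]
    return header, rows


def extra_rows(pr_rows, ua_rows):
    """Return the rows of pr_rows that are absent from ua_rows, in pr_rows order."""
    ua_set = set(ua_rows)
    return [row for row in pr_rows if row not in ua_set]


# Inline sample data standing in for ua.csv and pr.csv (assumption for the demo)
UA_TEXT = """Name|Address|City|Country|Pincode
Jim Smith|123 Any Street|Boston|US|02134
Jane Lee|248 Another St.|Boston|US|02130
"""

PR_TEXT = """Name|Address|City|Country|Pincode
Jim Smith|123 Any Street|Boston|US|02134
Smoet|coffee shop|finland|Europe|3453335
Jane Lee|248 Another St.|Boston|US|02130
Jack|long street|malasiya|Asia|585858
"""

header, ua_rows = read_rows(io.StringIO(UA_TEXT))
_, pr_rows = read_rows(io.StringIO(PR_TEXT))
diff = extra_rows(pr_rows, ua_rows)

print(f"pr.csv has {len(diff)} rows extra")
print("|".join(header))  # header comes from the file, not a hardcoded string
for row in diff:
    print("|".join(row))
```

With real files, `io.StringIO(...)` would be replaced by `open("ua.csv", newline="")` and `open("pr.csv", newline="")`.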

A better solution using the pandas module:

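import pandas as pd

# The sample files are pipe-delimited; dtype=str keeps leading zeros in Pincode
df_ua = pd.read_csv("ua.csv", sep="|", dtype=str)  # adjust the path to ua.csv
df_pr = pd.read_csv("pr.csv", sep="|", dtype=str)  # adjust the path to pr.csv

# Merge on all shared columns; indicator=True adds a "_merge" column
# whose value is "left_only" for rows found only in pr.csv
df_diff = (
    df_pr.merge(df_ua, how="outer", indicator=True)
    .loc[lambda x: x["_merge"] == "left_only"]
    .drop("_merge", axis=1)
)

print(f"pr.csv has {len(df_diff)} rows extra")

print(df_diff)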
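
# Alternative using only the standard-library csv module
import csv

# Collect every ua.csv row (header included) as a hashable tuple
ua_rows = set()
with open('ua.csv', newline='') as ua:
    for row in csv.reader(ua, delimiter='|'):  # the sample files are pipe-delimited
        ua_rows.add(tuple(row))

# Keep the pr.csv rows that never appear in ua.csv, in file order
output = []
with open('pr.csv', newline='') as pr:
    for row in csv.reader(pr, delimiter='|'):
        if tuple(row) not in ua_rows:
            output.append(row)

print(f"pr.csv has {len(output)} rows extra")
print(output)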
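
The snippet above prints a Python list of lists rather than the pipe-delimited report shown in the question. One possible way to render such a list in the expected layout is sketched below. The function name `format_report` is hypothetical, the header and rows are hardcoded here purely for illustration, and `csv.writer` is assumed to be acceptable for producing the output.

```python
import csv
import io


def format_report(name, header, rows, delimiter="|"):
    """Build the report text: a count line, a blank line, the header, then the rows."""
    buffer = io.StringIO()
    buffer.write(f"{name} has {len(rows)} rows extra\n\n")
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative inputs: in practice these would come from reading pr.csv
header = ["Name", "Address", "City", "Country", "Pincode"]
output = [
    ["Smoet", "coffee shop", "finland", "Europe", "3453335"],
    ["Jack", "long street", "malasiya", "Asia", "585858"],
]

report = format_report("pr.csv", header, output)
print(report, end="")
```

Using `csv.writer` instead of `"|".join(...)` means a field that itself contains the delimiter would be quoted rather than silently corrupting the row.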
