
Pandas: compare and remove data from a CSV and an XLS file

I have 2 files (a .csv and a .xls). The .csv has only one column (e-mail). The .xls has many columns. I'm trying to compare the e-mail columns in these two files and remove from the .xls every e-mail address that is not in the .csv. The e-mail addresses are not sorted.

I have written some code, but it doesn't achieve my goal:

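import pandas as pd

# `skiprows` (not `skiprow`); `usecols` replaces the older `parse_cols` keyword
excel = pd.read_excel(file, skiprows=10, usecols='AL')
csv = pd.read_csv(namelist_file)
# isin() with a DataFrame argument compares cell by cell on matching index and column labels
excel_keep = excel[excel.isin(csv)]
# a DataFrame has no .tolist() (a Series does), so this line raises AttributeError
mask = excel.isin(csv.tolist())
excel[~mask]  # the filtered result is never assigned
print(excel_keep)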
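Why the attempt above keeps nothing can be reproduced on a small in-memory example. This is a sketch with made-up sample frames (the addresses and the `name` column are hypothetical). It contrasts `DataFrame.isin` with a DataFrame argument, which only matches a cell when the same index label and column label hold the same value in both frames, with `Series.isin`, which tests each value for membership regardless of position.

```python
import pandas as pd

# Hypothetical sample data standing in for the two files
excel = pd.DataFrame({"email": ["a@x.com", "b@x.com", "c@x.com"],
                      "name": ["Ann", "Bob", "Cid"]})
csv = pd.DataFrame({"email": ["c@x.com", "a@x.com"]})

# DataFrame.isin(DataFrame): True only where the SAME index label and
# column label hold the same value in both frames (unsorted data never lines up)
cellwise = excel.isin(csv)

# Indexing with a boolean DataFrame masks cells (non-matches become NaN)
# instead of dropping rows, which is what excel[excel.isin(csv)] does
masked = excel[cellwise]

# Series.isin(Series): plain membership test, position is ignored
rowwise = excel["email"].isin(csv["email"])

# Indexing with a boolean Series drops whole rows
kept = excel[rowwise]

print(cellwise)
print(kept)
```

Because the e-mail addresses are not sorted, the cell-by-cell comparison finds no matches at all, while the Series version keeps the rows for `a@x.com` and `c@x.com`.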

Does anyone have an idea, please? Regards.

import pandas as pd

df_csv = pd.read_csv(path_to_csv)
df_xlsx = pd.read_excel(path_to_excel)

## assuming the e-mail column header in both files is 'email';
## if not, rename it first: df = df.rename(columns={'oldName': 'email'})

## keep only the .xls rows whose e-mail also appears in the .csv
df_xlsx = df_xlsx[df_xlsx['email'].isin(df_csv['email'])]
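The filter can be tried end to end without real files. The sketch below is an assumption-laden illustration: the sample addresses, the `city` column and the `normalize` helper are hypothetical, the .csv is read from a string with `io.StringIO`, and the .xls side is built as a DataFrame so no Excel engine is needed. The optional normalization (trim and lower-case) is an extra step not in the answer, useful only if the two files may differ in case or stray spaces.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two files: the .csv is read from a string,
# the .xls is built directly so no Excel reader engine is required
csv_text = "email\nbob@example.com\n ANN@example.com\n"
df_csv = pd.read_csv(io.StringIO(csv_text))
df_xlsx = pd.DataFrame({
    "email": ["ann@example.com", "bob@example.com", "cid@example.com"],
    "city": ["Paris", "Lyon", "Nice"],
})


def normalize(emails: pd.Series) -> pd.Series:
    """Hypothetical helper: trim whitespace and lower-case the addresses."""
    return emails.astype(str).str.strip().str.lower()


# Set of allowed addresses taken from the .csv
allowed = set(normalize(df_csv["email"]))
# Boolean Series: True where the .xls address is in the .csv
mask = normalize(df_xlsx["email"]).isin(allowed)

kept = df_xlsx[mask]      # rows whose e-mail is in the .csv
removed = df_xlsx[~mask]  # rows that are dropped

print(kept)
print(removed)
```

If the .csv has no header row, `pd.read_csv(path, header=None, names=["email"])` gives the column a name. To save the result, `kept.to_excel("filtered.xlsx", index=False)` writes it out, which requires an Excel writer engine such as openpyxl to be installed.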

Hope that helps.

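Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.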

 