Dataframe - Merge columns from csv and excel file

Hi there Stack Overflow community,

I have the following dataframe in an Excel file:

sparte  sparten       status  stati         gesellschaft  gesellschaften
10      Krankenvoll   B       beantragt     0             - Allgemein -
11      Reisekranken  A       aktiv         10000         nordinvest
12      Krankenkasse  N       beitragsfrei  M552D         SV SparkassenVersicherung

and the following columns to merge on in a CSV file:

   sparten    status    gesellschaft
    10           B          0
    11           A        10000
    12           N        M552D

To merge some columns from the Excel file and the CSV file, I'm using the following code:

import pandas as pd

df1 = pd.read_csv(r'path', sep=',').drop(columns=['risiko'])
df2 = pd.read_excel(r'path')

# Replace each code column with its descriptive label from the Excel sheet
df3 = pd.merge(df1, df2[['status', 'stati']], on='status', how='left').drop(columns=['status'])
df4 = df3.merge(df2[['sparte', 'sparten']], on='sparte', how='left').drop(columns=['sparte'])

It works fine for me, but now I want to merge the following column:

    df4 = df3.merge(df2[['gesellschaft','gesellschaften']],on='gesellschaft', how='left')
    print(df4)

...and it does not work. It merges only the cells with this format, M552D, but leaves the cells with numbers untouched. I don't understand what I'm doing wrong. If I set how='right' the merge works, but the other columns disappear.

Maybe someone has an idea what is happening here! Thanks for any hint!

The problem is that the gesellschaft column contains only strings in df1, which is loaded with read_csv, because the column is not fully numeric. In df2, which is loaded with read_excel, it contains a mix of int and string values. At the pandas level, an int and a string never compare equal, so the merge matches 'M552D' but not 0 or 10000.
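The mismatch can be reproduced without any files. The snippet below is a minimal sketch: `df_csv` and `df_xlsx` are hypothetical stand-ins for the question's CSV and Excel data, and the Excel side is built by hand on the assumption that numeric cells arrive as int and text cells as str.

```python
import io
import pandas as pd

# Stand-in for the CSV: a single non-numeric value ('M552D') makes
# read_csv parse every value in the column as a string.
csv_text = "gesellschaft\n0\n10000\nM552D\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Stand-in for the Excel sheet (assumption: numeric cells are ints,
# text cells are strings, as described in the answer).
df_xlsx = pd.DataFrame({
    "gesellschaft": [0, 10000, "M552D"],
    "gesellschaften": ["- Allgemein -", "nordinvest", "SV SparkassenVersicherung"],
})

# Inspect the Python type of each key value on both sides.
csv_types = [type(v).__name__ for v in df_csv["gesellschaft"]]
xlsx_types = [type(v).__name__ for v in df_xlsx["gesellschaft"]]
print(csv_types)
print(xlsx_types)

# Only the key that is a string on both sides finds a partner.
merged = df_csv.merge(df_xlsx, on="gesellschaft", how="left")
print(merged)
```

Checking the element types this way is a quick diagnostic whenever a merge silently leaves some rows unmatched.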

A possible workaround is to force a string conversion at merge time:

df4 = df3.merge(df2[['gesellschaft','gesellschaften']], left_on='gesellschaft',
  right_on = df2['gesellschaft'].astype('str'), how='left')
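A variant of the same idea, sketched here under assumptions, is to convert the key column to strings on a copy of the lookup columns before merging, so that both frames share one `gesellschaft` column and a plain `on=` can be used. The frames below are hypothetical stand-ins for `df3` and `df2`, and `lookup` is a name introduced only for this illustration.

```python
import pandas as pd

# Hypothetical stand-in for df3 (CSV side): all keys are strings.
df3 = pd.DataFrame({"gesellschaft": ["0", "10000", "M552D"]})

# Hypothetical stand-in for df2 (Excel side): ints mixed with strings.
df2 = pd.DataFrame({
    "gesellschaft": [0, 10000, "M552D"],
    "gesellschaften": ["- Allgemein -", "nordinvest", "SV SparkassenVersicherung"],
})

# Work on a copy so df2 itself is left unchanged, then make every key a string.
lookup = df2[["gesellschaft", "gesellschaften"]].copy()
lookup["gesellschaft"] = lookup["gesellschaft"].astype(str)

# Both key columns now hold strings, so every row finds its match.
df4 = df3.merge(lookup, on="gesellschaft", how="left")
print(df4)
```

One caveat: if the Excel values come through as floats rather than ints, `astype(str)` would yield `'10000.0'`, which still would not match `'10000'`, so it is worth inspecting the converted keys first.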
