
Split pandas dataframe column to new 4 columns

I have this Pandas df and I would like to split the Address column (the last one) into 4 new columns: street name + number, zipcode, city and country.

test

 ID           Address
1.10065e+08  Bachgasse 39 \n69502 Hemsbach \nDeutschland
2.34115e+08  Am Friedensplatz 3\n68165 Mannheim\nDeutschland
2.36743e+08  Am Friedensplatz 3\n68165 Mannheim\nDeutschland
2.24763e+08  Am Friedensplatz 3\n68165 Mannheim\nDeutschland
2.26209e+08  Am Friedensplatz 3\n68165 Mannheim
2.2621e+08   Am Friedensplatz 3\n68165 Mannheim
2.35501e+08  Herman-BurcharStrasse 1\n7265 Davos Wolfgang\n...
2.31895e+08  Via Nova 37\n7017 Flims Dorf\nSchweiz
2.3611e+08   Neu-Isenburg\nDeutschland
2.40194e+08  Herman-BurcharStrasse 1\n7265 Davos Wolfgang\n. 

I would like to get this output

ID           Street             zipcode  city      country
1.10065e+08  Bachgasse39        69502    Hemsbach  Deutschland
2.34115e+08  Am Friedensplatz3  68165    Mannheim  Deutschland
2.36743e+08  Am Friedensplatz3  68165    Mannheim  Deutschland
2.24763e+08  Am Friedensplatz3  68165    Mannheim  Deutschland
2.26209e+08  Am Friedensplatz3  68165    Mannheim  NaN
2.2621e+08   Am Friedensplatz3  68165    Mannheim  NaN
....         ....               ....     ....      ....
....         ....               ....     ....      ....

I've tried this approach to solve it, but it doesn't work for me:

(A,B,C,D) are column names for (Street name + num, Zipcode...)

pd.DataFrame(test['Firmen Adresse Geschäftlich'].str.split(r"\n",1).tolist(),columns = ['A','B','C'])

but I got this error:

TypeError: object of type 'float' has no len()

Here are also images: [image]

I would like to get something like this: [image] [image]

I have these address patterns in my dataframe: [image] [image] [image]

Assuming every value in your column Firmen Adresse Geschäftlich is a string, you can try the following:

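import pandas as pd

# Split each address on newlines into street, "zip city", and country (Land)
df1 = pd.DataFrame(test['Firmen Adresse Geschäftlich'].str.split(r"\n").tolist(),
                   columns=['street no.', 'zip', 'Land'], index=test['ID'])

# Split "zip city" on the first space only, so multi-word cities stay in one piece
df1[['zip', 'Stadt']] = pd.DataFrame(df1['zip'].str.strip().str.split(' ', n=1).tolist(),
                                     index=df1.index)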

The output with a smaller dataset looks like:

           street no.    zip         Land     Stadt
ID                                                  
1        Bachgasse 39   69502  Deutschland  Hemsbach
2   Am Friedensplatz 3  68165  Deutschland  Mannheim
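
The TypeError in the question is consistent with missing values in the address column: .str.split leaves NaN (a float) untouched, so .tolist() hands the DataFrame constructor a float among the lists. Below is a minimal sketch that tolerates missing addresses and addresses without a country line. The sample data, the helper split_address, the output column names and the rule "a line starting with 4 or 5 digits is the zip/city line" are all assumptions made for illustration, not something taken from the original data.

```python
import re
import pandas as pd

# Hypothetical sample modelled on the question; float("nan") stands for a missing address.
test = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Firmen Adresse Geschäftlich": [
        "Bachgasse 39 \n69502 Hemsbach \nDeutschland",
        "Am Friedensplatz 3\n68165 Mannheim",
        "Via Nova 37\n7017 Flims Dorf\nSchweiz",
        "Neu-Isenburg\nDeutschland",
        float("nan"),
    ],
})

# Assumption: the zip/city line starts with a 4- or 5-digit postal code.
ZIP_LINE = re.compile(r"^(\d{4,5})\s+(.+)$")


def split_address(address):
    """Return street, zipcode, city and country for one address (None where unknown)."""
    result = {"street": None, "zipcode": None, "city": None, "country": None}
    if not isinstance(address, str):  # NaN or other non-string values
        return result
    lines = [line.strip() for line in address.split("\n") if line.strip()]
    for i, line in enumerate(lines):
        match = ZIP_LINE.match(line)
        if match:
            result["zipcode"], result["city"] = match.group(1), match.group(2)
            # Everything before the zip line is the street, everything after is the country.
            result["street"] = " ".join(lines[:i]) or None
            result["country"] = " ".join(lines[i + 1:]) or None
            break
    return result


parsed = pd.DataFrame(
    test["Firmen Adresse Geschäftlich"].apply(split_address).tolist(),
    index=test.index,
)
result = pd.concat([test[["ID"]], parsed], axis=1)
print(result)
```

Rows without a recognisable zip line, such as "Neu-Isenburg\nDeutschland", are deliberately left empty here rather than guessed; they would need a separate rule depending on what those addresses mean in your data.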
