
Pandas: How to split on multiple delimiters?

I have a dataframe that contains latitude, longitude and altitude in a single column (Coordinates), and I want to split that column into three columns (latitude, longitude and altitude).

df:

ID                                                         Coordinates                                                      Region  
1     latitude_degrees: 52.00755721100514\nlongitude_degrees: 12.565129548994266\naltitude_meters: 185.23616827199143\n     Europe   
2     latitude_degrees: 52.00755721100514\nlongitude_degrees: 12.565129548994266\naltitude_meters: 185.23616827199143\n     Europe   
3     latitude_degrees: 52.00755721100514\nlongitude_degrees: 12.565129548994266\naltitude_meters: 185.23616827199143\n     Europe   
4     latitude_degrees: 52.00755721100514\nlongitude_degrees: 12.565129548994266\naltitude_meters: 185.23616827199143\n     Europe   
5     latitude_degrees: 52.00755721100514\nlongitude_degrees: 12.565129548994266\naltitude_meters: 185.23616827199143\n     Europe  

Expected Output:

ID           lat                lon                     alt             Region  
1      52.00755721100514  12.565129548994266     185.23616827199143     Europe   
2      52.00755721100514  12.565129548994266     185.23616827199143     Europe   
3      52.00755721100514  12.565129548994266     185.23616827199143     Europe   
4      52.00755721100514  12.565129548994266     185.23616827199143     Europe   
5      52.00755721100514  12.565129548994266     185.23616827199143     Europe 

What I tried:

I first tried to split the column on ":", but it doesn't work:

df.loc[df['Coordinates'].isin(["latitude_degrees", "longitude_degrees"])]= ""

I also tried to replace the text, but that doesn't work either:

df.Coordinates.replace(to_replace=['latitude_degrees','longitude_degrees'],value='')

Let's use str.extractall to pull the lat, long and alt values out of the Coordinates column, then unstack to reshape the matches into one column per value, and finally join the result with the ID and Region columns:

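# Extract every run of digits and dots; unstack turns the three matches per row into columns 0, 1, 2
c = df['Coordinates'].str.extractall(r'([\d.]+)')[0].unstack()
# Rename the columns (axis passed by keyword, since newer pandas versions reject it positionally) and join on the index
d = df[['ID', 'Region']].join(c.set_axis(['lat', 'long', 'alt'], axis=1))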

   ID  Region                lat                long                 alt
0   1  Europe  52.00755721100514  12.565129548994266  185.23616827199143
1   2  Europe  52.00755721100514  12.565129548994266  185.23616827199143
2   3  Europe  52.00755721100514  12.565129548994266  185.23616827199143
3   4  Europe  52.00755721100514  12.565129548994266  185.23616827199143
4   5  Europe  52.00755721100514  12.565129548994266  185.23616827199143
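
Note that the values above are still strings, and the pattern [\d.]+ would drop a leading minus sign. As a hedged alternative, here is a minimal self-contained sketch that extracts each value by its label and converts it to float. The sample frame is rebuilt from the question, and the "-?" in the pattern and the `keys` mapping are my own assumptions, not part of the original answer:

```python
import pandas as pd

# Sample data rebuilt from the question (two rows are enough to show the idea)
coords = (
    "latitude_degrees: 52.00755721100514\n"
    "longitude_degrees: 12.565129548994266\n"
    "altitude_meters: 185.23616827199143\n"
)
df = pd.DataFrame(
    {"ID": [1, 2], "Coordinates": [coords, coords], "Region": ["Europe", "Europe"]}
)

# Hypothetical mapping: new column name -> label used inside the Coordinates text
keys = {"lat": "latitude_degrees", "lon": "longitude_degrees", "alt": "altitude_meters"}

out = df[["ID", "Region"]].copy()
for col, key in keys.items():
    # Capture the number that follows "<label>:"; "-?" (an assumption) allows negative values
    out[col] = (
        df["Coordinates"]
        .str.extract(rf"{key}:\s*(-?[\d.]+)", expand=False)
        .astype(float)
    )

# Reorder to match the layout asked for in the question
out = out[["ID", "lat", "lon", "alt", "Region"]]
print(out)
```

Matching on the label rather than on position means the result does not depend on the order of the three fields in the string, at the cost of one extract call per field.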
