I have a Python dataframe like this with one column:
index Train_station
0 Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O
1 Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O
2 Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O
And I want to split it into 3 columns: Train station, Latitude, Longitude. The dataframe should look like this:
index Train_station Latitude Longitude
0 Adenauerplatz 52° 29′ 59″ N 13° 18′ 26″ O
1 Afrikanische Straße 52° 33′ 38″ N 13° 20′ 3″ O
2 Alexanderplatz 52° 31′ 17″ N 13° 24′ 48″ O
I've tried using df[['Latitude', 'Longitude']] = df.Train_station.str.split(',', expand=True) but it only splits between the latitude and longitude coordinates. How can I split a column on more than one condition that I define?
I've thought about scanning each string from the left and splitting where it first meets a digit or a defined string, but I haven't found a way to do this so far.
df = df.Train_station.str.split(r'(.*?)(\d+°[^,]+),(.*)', expand=True)
print(df.loc[:, 1:3].rename(columns={1:'Train_station', 2:'Latitude', 3:'Longitude'}) )
Prints:
Train_station Latitude Longitude
0 Adenauerplatz 52° 29′ 59″ N 13° 18′ 26″ O
1 Afrikanische Straße 52° 33′ 38″ N 13° 20′ 3″ O
2 Alexanderplatz 52° 31′ 17″ N 13° 24′ 48″ O
EDIT: Thanks @ALollz, you can use str.extract():
df = df.Train_station.str.extract(r'(?P<Train_station>.*?)(?P<Latitude>\d+°[^,]+),(?P<Longitude>.*)', expand=True)
print(df)
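One detail worth checking with the pattern above: the lazy first group keeps the space before the latitude, and the last group keeps the space after the comma. The following is a minimal sketch, not part of the original answer, that adds \s* between the groups so that whitespace is dropped. The variable names pattern and parts are my own, and the sample rows are the three from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Train_station": [
    "Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
    "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
    "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O",
]})

# Same named groups as above, with \s* placed outside the groups
# so the separating whitespace is not captured.
pattern = (
    r"(?P<Train_station>.*?)\s*"    # station name, as short as possible
    r"(?P<Latitude>\d+°[^,]+),\s*"  # first degree value up to the comma
    r"(?P<Longitude>.*)"            # everything after the comma
)

# Named groups become the column names of the result.
parts = df["Train_station"].str.extract(pattern)
print(parts)
```

This assumes that station names contain no digit directly followed by a degree sign and that every row has exactly one comma between the two coordinates.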
You can use the .split() method to separate the values in the strings.
Use .apply() to create a new data-frame column for each desired column name.
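import re
import pandas as pd

data = ["Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
        "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
        "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O"]
df = pd.DataFrame(data, columns=['Train_station'])

def split_name(x):
    # Split once at the whitespace before the first digit, so that
    # multi-word names such as "Afrikanische Straße" stay intact.
    # (Splitting on the first space would cut that name in two.)
    return re.split(r'\s+(?=\d)', x, maxsplit=1)

def train_station(x):
    # Everything before the coordinates.
    return split_name(x)[0]

def latitude(x):
    # Coordinate part before the comma.
    return split_name(x)[1].split(', ', 1)[0]

def longitude(x):
    # Coordinate part after the comma.
    return split_name(x)[1].split(', ', 1)[1]

# Fill the new columns first, then overwrite the original column last.
df['Latitude'] = df['Train_station'].apply(latitude)
df['Longitude'] = df['Train_station'].apply(longitude)
df['Train_station'] = df['Train_station'].apply(train_station)
print(df)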
The code above recreates your original data-frame and then splits each string and fills the new columns with .apply().
Output:
Train_station Latitude Longitude
0 Adenauerplatz 52° 29′ 59″ N 13° 18′ 26″ O
1 Afrikanische Straße 52° 33′ 38″ N 13° 20′ 3″ O
2 Alexanderplatz 52° 31′ 17″ N 13° 24′ 48″ O
You can try something like this:
df['Latitude']=df['Train_station'].apply(lambda x: ' '.join([i for i in x.split(' ') if any((lett.replace(',','') in '°′″') for lett in i)]).split(',')[0])
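# .strip() removes the space that follows the comma.
df['Longitude']=df['Train_station'].apply(lambda x: ' '.join([i for i in x.split(' ') if any((lett.replace(',','') in '°′″O') for lett in i)]).split(',')[1].strip())
# Drop the trailing "O" token with [:-1] and join with a space so that multi-word names keep their spacing.
df['Train_station']=df['Train_station'].apply(lambda x: ' '.join([i for i in x.split(' ')[:-1] if not any((lett.replace(',','') in '°′″') for lett in i) ]))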
Output:
Train_station Latitude Longitude
0 Adenauerplatz 52° 29′ 59″ N 13° 18′ 26″ O
1 Afrikanische Straße 52° 33′ 38″ N 13° 20′ 3″ O
2 Alexanderplatz 52° 31′ 17″ N 13° 24′ 48″ O
This is similar to what @Andrej Kesely does.
import numpy as np
import pandas as pd
df2=df.Train_station.str.split(r'(?<=[a-z])(\s)(?![A-Z])|(?<=[A-Z]\,)(\s)|(?<=[A-Z])(\s)', expand=True).replace(' ', np.nan).dropna(axis='columns')
df2.columns=['Train_station', 'Latitude', 'Longitude']
print(df2)
Train_station Latitude Longitude
0 Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O
1 Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O
2 Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O
Explanation:
(?<=[a-z])(\s)(?![A-Z])
- Split on a space that follows a lowercase letter and is not followed by an uppercase letter.
OR
(?<=[A-Z]\,)(\s)
- Split on a space that follows an uppercase letter and a comma.
OR
(?<=[A-Z])(\s)
- Split on a space that follows an uppercase letter.
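To see the lookbehind and lookahead conditions at work on a single string, here is a small sketch using only the standard re module. It is an illustration rather than the code from the answer: it drops the capture groups (so no separator columns have to be removed afterwards) and the third alternative, which the sample data does not need. The names pattern, text, name, latitude and longitude are my own.

```python
import re

# Two of the three alternatives explained above, without capture groups:
#   (?<=[a-z])\s(?![A-Z])  space after a lowercase letter, not before an uppercase one
#   (?<=[A-Z],)\s          space after an uppercase letter followed by a comma
pattern = re.compile(r"(?<=[a-z])\s(?![A-Z])|(?<=[A-Z],)\s")

text = "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O"

# The space inside "Afrikanische Straße" is followed by an uppercase "S",
# so the negative lookahead keeps the name in one piece.
name, latitude, longitude = pattern.split(text)

# The comma sits before the split point, so it stays on the latitude.
latitude = latitude.rstrip(",")

print(name)
print(latitude)
print(longitude)
```

The rstrip(",") step also shows why the Latitude column in the output above still ends with a comma: the lookbehind only checks for the comma and does not consume it.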