
Split one column into multiple columns in Python

I have a pandas DataFrame with a single column, like this:

index  Train_station

0      Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O
1      Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O
2      Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O

And I want to split it into 3 columns: Train station, Latitude, Longitude. The dataframe should look like this:

index  Train_station         Latitude       Longitude

0      Adenauerplatz         52° 29′ 59″ N  13° 18′ 26″ O
1      Afrikanische Straße   52° 33′ 38″ N  13° 20′ 3″ O
2      Alexanderplatz        52° 31′ 17″ N  13° 24′ 48″ O

I've tried df[['Latitude', 'Longitude']] = df.Train_station.str.split(',', expand=True), but it only splits at the comma between the latitude and longitude. How can I split a column on more than one condition that I define?

I've thought about scanning each string from the left and splitting where it first meets a digit or a defined string, but I haven't found a way to do this so far.

df = df.Train_station.str.split(r'(.*?)(\d+°[^,]+),(.*)', expand=True)
print(df.loc[:, 1:3].rename(columns={1:'Train_station', 2:'Latitude', 3:'Longitude'}) )

Prints:

          Train_station       Latitude       Longitude
0        Adenauerplatz   52° 29′ 59″ N   13° 18′ 26″ O
1  Afrikanische Straße   52° 33′ 38″ N    13° 20′ 3″ O
2       Alexanderplatz   52° 31′ 17″ N   13° 24′ 48″ O
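To see why columns 1 to 3 are the ones selected, it may help to run the same pattern through plain re.split on a single string. This is only a sketch using the first sample row; the variable names s and parts are illustrative and not part of the answer. Because the pattern matches the whole string and contains three capture groups, the result is an empty string, the three captured groups, and another empty string.

```python
import re

# Same pattern as in the answer above
pattern = r'(.*?)(\d+°[^,]+),(.*)'
s = "Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O"

# re.split keeps the text of every capture group in the result list
parts = re.split(pattern, s)

for position, part in enumerate(parts):
    print(position, repr(part))
```

Positions 0 and 4 are the empty strings before and after the match, which is why they are dropped. Note that the station name keeps its trailing space and the longitude keeps its leading space.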

EDIT: Thanks to @ALollz, you can also use str.extract():

df = df.Train_station.str.extract(r'(?P<Train_station>.*?)(?P<Latitude>\d+°[^,]+),(?P<Longitude>.*)', expand=True)
print(df)
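If the surrounding whitespace in the extracted values is unwanted, one possible variation is to let the pattern consume it with \s* between the groups. The sketch below rebuilds the sample data so it runs on its own; the \s* additions and the names pattern and result are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Train_station": [
    "Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
    "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
    "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O",
]})

# Named groups become the column names; \s* swallows the spaces between fields
pattern = (
    r'(?P<Train_station>.*?)\s*'
    r'(?P<Latitude>\d+°[^,]+?)\s*,\s*'
    r'(?P<Longitude>.*)'
)

result = df["Train_station"].str.extract(pattern, expand=True)
print(result)
```

As in the original pattern, this assumes the station name contains no digit followed by a degree sign and that a comma separates latitude from longitude.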

You can utilize the .split() method for separating the values in the strings.

Use .apply() to create new data-frame columns for each desired column name.

import pandas as pd

data = ["Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
        "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
        "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O"]

df = pd.DataFrame(data, columns=['Train_station'])


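def train_station(x):
    # Text before the comma, minus its last four tokens (deg, min, sec, hemisphere)
    x = x.split(', ', 1)
    return x[0].rsplit(' ', 4)[0]


def latitude(x):
    # The last four space-separated tokens before the comma
    x = x.split(', ', 1)
    return ' '.join(x[0].rsplit(' ', 4)[1:])


def longitude(x):
    # Everything after the first ', '
    x = x.split(', ', 1)
    return x[1]


df['Latitude'] = df['Train_station'].apply(latitude)
df['Longitude'] = df['Train_station'].apply(longitude)
df['Train_station'] = df['Train_station'].apply(train_station)

print(df)

The code above recreates your original DataFrame and then modifies it with .split(), .rsplit() and .apply(). Splitting from the right keeps station names that contain spaces, such as "Afrikanische Straße", in one piece. It assumes the latitude always consists of four space-separated tokens.

Output:

         Train_station       Latitude      Longitude
0        Adenauerplatz  52° 29′ 59″ N  13° 18′ 26″ O
1  Afrikanische Straße  52° 33′ 38″ N   13° 20′ 3″ O
2       Alexanderplatz  52° 31′ 17″ N  13° 24′ 48″ O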

You can try something like this:

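# Keep tokens containing °, ′ or ″ (and "N,", whose comma is replaced by ''), then take the part before the comma
df['Latitude']=df['Train_station'].apply(lambda x: ' '.join([i for i in x.split(' ') if any((lett.replace(',','') in '°′″') for lett in i)]).split(',')[0])
# Same filter plus the bare 'O' token; take the part after the comma
df['Longitude']=df['Train_station'].apply(lambda x: ' '.join([i for i in x.split(' ') if any((lett.replace(',','') in '°′″O') for lett in i)]).split(',')[1].strip())
# Join the remaining tokens with spaces; [:-1] drops the trailing hemisphere letter 'O'
df['Train_station']=df['Train_station'].apply(lambda x: ' '.join([i for i in x.split(' ')[:-1] if not any((lett.replace(',','') in '°′″') for lett in i)]))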

Output:

         Train_station       Latitude      Longitude
0        Adenauerplatz  52° 29′ 59″ N  13° 18′ 26″ O
1  Afrikanische Straße  52° 33′ 38″ N   13° 20′ 3″ O
2       Alexanderplatz  52° 31′ 17″ N  13° 24′ 48″ O

This is similar to @Andrej Kesely's approach.

import numpy as np
import pandas as pd

df2=df.Train_station.str.split(r'(?<=[a-z])(\s)(?![A-Z])|(?<=[A-Z]\,)(\s)|(?<=[A-Z])(\s)', expand=True).replace(' ', np.nan).dropna(axis='columns')
df2.columns=['Train_station', 'Latitude', 'Longitude']
print(df2)

     Train_station          Latitude      Longitude
0        Adenauerplatz    52° 29′ 59″ N,  13° 18′ 26″ O
1  Afrikanische Straße    52° 33′ 38″ N,   13° 20′ 3″ O
2       Alexanderplatz    52° 31′ 17″ N,  13° 24′ 48″ O

Explanation:

(?<=[a-z])(\s)(?![A-Z]) - Split on a space that follows a lowercase letter and is not followed by an uppercase letter.

OR

(?<=[A-Z]\,)(\s) - Split on a space that follows an uppercase letter and a comma.

OR

(?<=[A-Z])(\s) - Split on a space that follows an uppercase letter.
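To check where these lookarounds actually split, the same three alternatives can be tried on a single string with re.split. The sketch below drops the capture groups around \s, which is a simplification and not what the answer does; without them there are no extra space or None entries to filter out afterwards. The names pattern, s and parts are illustrative.

```python
import re

# The three alternatives from the explanation, without capture groups
pattern = r'(?<=[a-z])\s(?![A-Z])|(?<=[A-Z],)\s|(?<=[A-Z])\s'
s = "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O"

parts = re.split(pattern, s)
print(parts)
# ['Afrikanische Straße', '52° 33′ 38″ N,', '13° 20′ 3″ O']
```

The space inside "Afrikanische Straße" is kept because it is followed by an uppercase letter, and the latitude retains its trailing comma, as in the output above.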
