Split one column into multiple columns in Python

Question

I have a Python dataframe with a single column, like this:

index  Train_station

0      Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O
1      Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O
2      Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O

I want to split it into 3 columns: train station, latitude, and longitude. The dataframe should look like this:

index  Train_station         Latitude       Longitude

0      Adenauerplatz         52° 29′ 59″ N  13° 18′ 26″ O
1      Afrikanische Straße   52° 33′ 38″ N  13° 20′ 3″ O
2      Alexanderplatz        52° 31′ 17″ N  13° 24′ 48″ O

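I tried df[['Latitude', 'Longitude']] = df.Train_station.str.split(',', expand=True), but it only splits between the latitude and longitude coordinates. How can I split a column on several conditions that I define myself?

I have considered scanning the string from the left and splitting it when an integer or a defined string is encountered, but so far I have not found an answer that describes this approach.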

Answer 1

df = df.Train_station.str.split(r'(.*?)(\d+°[^,]+),(.*)', expand=True)
print(df.loc[:, 1:3].rename(columns={1:'Train_station', 2:'Latitude', 3:'Longitude'}) )

Prints:

          Train_station       Latitude       Longitude
0        Adenauerplatz   52° 29′ 59″ N   13° 18′ 26″ O
1  Afrikanische Straße   52° 33′ 38″ N    13° 20′ 3″ O
2       Alexanderplatz   52° 31′ 17″ N   13° 24′ 48″ O

Edit: Thanks to @ALollz, you can use str.extract():

df = df.Train_station.str.extract(r'(?P<Train_station>.*?)(?P<Latitude>\d+°[^,]+),(?P<Longitude>.*)', expand=True)
print(df)
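To make the pattern easier to follow, here is a minimal sketch that applies the same three named groups to a single string with the standard re module. The verbose layout, the sample variable, and the final strip() are my additions for illustration; they are not part of the answer above.

```python
import re

# Same pattern as in the answer, written in verbose mode so each group can be commented.
PATTERN = re.compile(r"""
    (?P<Train_station>.*?)      # lazy: as little text as possible before the first digits + degree sign
    (?P<Latitude>\d+°[^,]+)     # digits, a degree sign, then everything up to the comma
    ,                           # the comma separating the two coordinates
    (?P<Longitude>.*)           # the rest of the string
""", re.VERBOSE)

sample = "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O"
match = PATTERN.match(sample)

# The groups keep the spaces next to the boundaries, so strip them here.
parts = {name: value.strip() for name, value in match.groupdict().items()}
print(parts)
```

Because the first group is lazy, it stops at the first run of digits followed by a degree sign, so a station name made of several words stays in one piece.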

Answer 2

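You can use the .split() method to separate the values in the string.

Use .apply() to create a new dataframe column for each of the required column names.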

import pandas as pd

data = ["Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
        "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
        "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O"]

df = pd.DataFrame(data, columns=['Train_station'])


def train_station(x):
    x = x.split(' ', 1)
    return x[0]


def latitude(x):
    x = x.split(' ', 1)
    x = x[1].split(', ', 1)
    return x[0]


def longitude(x):
    x = x.split(' ', 1)
    x = x[1].split(', ', 1)
    return x[1]


df['Latitude'] = df['Train_station'].apply(latitude)
df['Longitude'] = df['Train_station'].apply(longitude)
df['Train_station'] = df['Train_station'].apply(train_station)

print(df)

The code above recreates the original dataframe and then modifies it using .split() and .apply().

Output:

    Train_station              Latitude      Longitude
0   Adenauerplatz         52° 29′ 59″ N  13° 18′ 26″ O
1    Afrikanische  Straße 52° 33′ 38″ N   13° 20′ 3″ O
2  Alexanderplatz         52° 31′ 17″ N  13° 24′ 48″ O
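Note that row 1 in the output above ends up with "Straße" in the Latitude column, because the helpers split on the first space. A possible variant, sketched here under the assumption that station names never contain digits, splits at the first digit instead. The function name split_station is hypothetical and not part of the answer.

```python
import re


def split_station(text):
    """Return (station, latitude, longitude) for one row of the question's data."""
    # Assumption: the first digit in the string is where the latitude begins.
    first_digit = re.search(r"\d", text)
    station = text[:first_digit.start()].strip()
    # The remainder holds both coordinates, separated by a single comma.
    latitude, longitude = text[first_digit.start():].split(",", 1)
    return station, latitude.strip(), longitude.strip()


rows = ["Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
        "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
        "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O"]

result = [split_station(row) for row in rows]
for item in result:
    print(item)
```

With a dataframe, the same function could be passed to .apply() on the Train_station column in the same way as the three helpers above.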

Answer 3

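You could try something like this:

# Keep the tokens that contain a coordinate symbol or a comma; the part before the comma is the latitude.
df['Latitude']=df['Train_station'].apply(lambda x: ' '.join([i for i in x.split(' ') if any((lett.replace(',','') in '°′″') for lett in i)]).split(',')[0])
# Same idea, but also keep the trailing 'O'; the part after the comma is the longitude.
df['Longitude']=df['Train_station'].apply(lambda x: ' '.join([i for i in x.split(' ') if any((lett.replace(',','') in '°′″O') for lett in i)]).split(',')[1])
# The station name is what remains: drop the coordinate tokens and the lone 'O', then rejoin with spaces.
df['Train_station']=df['Train_station'].apply(lambda x: ' '.join([i for i in x.split(' ') if i != 'O' and not any((lett.replace(',','') in '°′″') for lett in i) ]))

Output: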

               Train_station       Latitude       Longitude
0          Adenauerplatz          52° 29′ 59″ N   13° 18′ 26″ O
1    Afrikanische Straße          52° 33′ 38″ N    13° 20′ 3″ O
2         Alexanderplatz          52° 31′ 17″ N   13° 24′ 48″ O

Answer 4

Similar to what @Andrej Kesely did.

import numpy as np
import pandas as pd

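# Raw string avoids invalid-escape warnings for \s; np.nan replaces the np.NaN alias removed in NumPy 2.0.
df2=df.Train_station.str.split(r'(?<=[a-z])(\s)(?![A-Z])|(?<=[A-Z]\,)(\s)|(?<=[A-Z])(\s)', expand=True).replace(' ', np.nan).dropna(axis='columns')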
df2.columns=['Train_station', 'Latitude', 'Longitude']
print(df2)

     Train_station          Latitude      Longitude
0        Adenauerplatz    52° 29′ 59″ N,  13° 18′ 26″ O
1  Afrikanische Straße    52° 33′ 38″ N,   13° 20′ 3″ O
2       Alexanderplatz    52° 31′ 17″ N,  13° 24′ 48″ O

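Explanation:

(?<=[a-z])(\s)(?![A-Z]) - split on whitespace that follows a lowercase letter and is not followed by an uppercase letter.

OR

(?<=[A-Z]\,)(\s) - whitespace that follows an uppercase letter and a comma

OR

(?<=[A-Z])(\s) - whitespace that follows an uppercase letter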
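The effect of the three alternatives can be checked on a single string with plain re.split. This is only a sketch; I am assuming that pandas' str.split handles capture groups the same way, which would explain why the answer follows it with .replace(' ', ...) and .dropna(axis='columns').

```python
import re

pattern = r"(?<=[a-z])(\s)(?![A-Z])|(?<=[A-Z]\,)(\s)|(?<=[A-Z])(\s)"
pieces = re.split(pattern, "Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O")

# Each split point contributes one entry per capture group:
# the matched space for the alternative that fired, None for the other two.
print(pieces)

# Dropping the separator entries leaves the three wanted fields.
kept = [p for p in pieces if p is not None and p != " "]
print(kept)
```

The latitude keeps its trailing comma because the second alternative splits after the comma rather than on it.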
