Split a column into multiple columns in Python

Question

I have a pandas DataFrame with a single column, like this:

index  Train_station

0      Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O
1      Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O
2      Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O

I want to split it into 3 columns: train station, latitude, and longitude. The DataFrame should look like this:

index  Train_station         Latitude       Longitude

0      Adenauerplatz         52° 29′ 59″ N  13° 18′ 26″ O
1      Afrikanische Straße   52° 33′ 38″ N  13° 20′ 3″ O
2      Alexanderplatz        52° 31′ 17″ N  13° 24′ 48″ O

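I tried df[['Latitude', 'Longitude']] = df.Train_station.str.split(',', expand=True), but it only splits between the latitude and longitude coordinates. How can I split a column on several conditions that I define?

I have considered scanning the string from the left and splitting it when an integer or a defined string is encountered, but so far I have not found an answer for that approach.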

Answer 1

# str.split keeps the regex capture groups, so columns 1 to 3 hold the three parts
df = df.Train_station.str.split(r'(.*?)(\d+°[^,]+),(.*)', expand=True)
print(df.loc[:, 1:3].rename(columns={1: 'Train_station', 2: 'Latitude', 3: 'Longitude'}))

Prints:

          Train_station       Latitude       Longitude
0        Adenauerplatz   52° 29′ 59″ N   13° 18′ 26″ O
1  Afrikanische Straße   52° 33′ 38″ N    13° 20′ 3″ O
2       Alexanderplatz   52° 31′ 17″ N   13° 24′ 48″ O
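To see why the wanted values sit in columns 1 to 3, it may help to run the same pattern through the standard library's re.split on a single row. This is only a sketch with one sample string from the question; the variable names are made up for illustration.

```python
import re

# Same pattern as in the answer: three capture groups and a literal comma.
pattern = r"(.*?)(\d+°[^,]+),(.*)"
row = "Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O"

# The pattern matches the whole string, so the text "before" and "after"
# the match is empty; the captured groups are kept in between.
pieces = re.split(pattern, row)
print(pieces)
```

The first and last elements are empty strings, which is why the answer selects only columns 1 to 3. The station keeps its trailing space and the longitude its leading space.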

Edit: thanks to @ALollz, you can use str.extract():

df = df.Train_station.str.extract(r'(?P<Train_station>.*?)(?P<Latitude>\d+°[^,]+),(?P<Longitude>.*)', expand=True)
print(df)
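The pattern above leaves a trailing space on the station name and a leading space on the longitude. A possible variant, assuming that stray whitespace is unwanted, moves it out of the groups with \s*. The sample frame is rebuilt here so the snippet runs on its own; the names pattern and parts are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Train_station": [
    "Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
    "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
    "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O",
]})

# \s* outside the named groups swallows the whitespace around the coordinates.
pattern = (
    r"(?P<Train_station>.*?)\s*"
    r"(?P<Latitude>\d+°[^,]+),\s*"
    r"(?P<Longitude>.*)"
)

# Named groups become the column names of the result.
parts = df["Train_station"].str.extract(pattern, expand=True)
print(parts)
```

Calling .str.strip() on each resulting column afterwards would be another way to get the same cleanup.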

Answer 2

You can use the .split() method to separate the values in the string.

Use .apply() to create a new DataFrame column for each required column name.

import pandas as pd

data = ["Adenauerplatz 52° 29′ 59″ N, 13° 18′ 26″ O",
        "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O",
        "Alexanderplatz 52° 31′ 17″ N, 13° 24′ 48″ O"]

df = pd.DataFrame(data, columns=['Train_station'])


def train_station(x):
    x = x.split(' ', 1)
    return x[0]


def latitude(x):
    x = x.split(' ', 1)
    x = x[1].split(', ', 1)
    return x[0]


def longitude(x):
    x = x.split(' ', 1)
    x = x[1].split(', ', 1)
    return x[1]


df['Latitude'] = df['Train_station'].apply(latitude)
df['Longitude'] = df['Train_station'].apply(longitude)
df['Train_station'] = df['Train_station'].apply(train_station)

print(df)

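The code above recreates the original DataFrame and then modifies it with .split() and .apply(). Note that train_station() splits at the first space, so a multi-word name such as Afrikanische Straße is cut after its first word, as row 1 of the output shows.

Output: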

    Train_station              Latitude      Longitude
0   Adenauerplatz         52° 29′ 59″ N  13° 18′ 26″ O
1    Afrikanische  Straße 52° 33′ 38″ N   13° 20′ 3″ O
2  Alexanderplatz         52° 31′ 17″ N  13° 24′ 48″ O
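Row 1 ends up with "Straße" in the Latitude column because the functions split at the first space. One way to keep the .split() idea while handling multi-word names is to split from the right instead. This is a rough sketch: split_station is a hypothetical helper, and it assumes the latitude always consists of exactly four space-separated tokens (degrees, minutes, seconds, hemisphere).

```python
def split_station(text):
    """Split 'name latitude, longitude' into a (name, latitude, longitude) tuple."""
    # Everything after the first ', ' is the longitude.
    left, longitude = text.split(', ', 1)
    # Assumption: the latitude is the last four tokens of the left part,
    # so splitting at most four times from the right leaves the full name.
    name, *lat_tokens = left.rsplit(' ', 4)
    return name, ' '.join(lat_tokens), longitude


result = split_station("Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O")
print(result)
```

With the DataFrame from this answer, df['Train_station'].apply(split_station) would give a Series of tuples that can then be unpacked into the three columns.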

Answer 3

You can try something like this:

def is_coord(tok):
    # True for tokens with a degree/minute/second mark or a hemisphere letter (N, O)
    return any(ch in '°′″' for ch in tok) or tok.rstrip(',') in ('N', 'O')

coords = df['Train_station'].apply(lambda x: ' '.join(t for t in x.split(' ') if is_coord(t)))
df[['Latitude', 'Longitude']] = coords.str.split(', ', expand=True)
df['Train_station'] = df['Train_station'].apply(lambda x: ' '.join(t for t in x.split(' ') if not is_coord(t)))

Output:

               Train_station       Latitude       Longitude
0          Adenauerplatz          52° 29′ 59″ N   13° 18′ 26″ O
1    Afrikanische Straße          52° 33′ 38″ N    13° 20′ 3″ O
2         Alexanderplatz          52° 31′ 17″ N   13° 24′ 48″ O

Answer 4

Similar to what @Andrej Kesely did.

import numpy as np
import pandas as pd

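# Raw string for the regex; np.nan works on all NumPy versions (np.NaN was removed in NumPy 2.0)
df2 = df.Train_station.str.split(r'(?<=[a-z])(\s)(?![A-Z])|(?<=[A-Z]\,)(\s)|(?<=[A-Z])(\s)', expand=True).replace(' ', np.nan).dropna(axis='columns')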
df2.columns=['Train_station', 'Latitude', 'Longitude']
print(df2)

     Train_station          Latitude      Longitude
0        Adenauerplatz    52° 29′ 59″ N,  13° 18′ 26″ O
1  Afrikanische Straße    52° 33′ 38″ N,   13° 20′ 3″ O
2       Alexanderplatz    52° 31′ 17″ N,  13° 24′ 48″ O

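Explanation:

(?<=[a-z])(\s)(?![A-Z]) - split on whitespace that follows a lowercase letter and is not followed by an uppercase letter.

OR

(?<=[A-Z]\,)(\s) - whitespace that follows an uppercase letter and a comma.

OR

(?<=[A-Z])(\s) - whitespace that follows an uppercase letter.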
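The .replace(' ', np.nan).dropna(axis='columns') step is needed because each alternative carries its own capture group. A quick way to see this is to run the pattern through the standard library's re.split on one row. This is a sketch with illustrative variable names only.

```python
import re

pattern = r"(?<=[a-z])(\s)(?![A-Z])|(?<=[A-Z]\,)(\s)|(?<=[A-Z])(\s)"
row = "Afrikanische Straße 52° 33′ 38″ N, 13° 20′ 3″ O"

# Every split point contributes three group slots: the alternative that
# matched yields ' ', the two that did not yield None.
pieces = re.split(pattern, row)
print(pieces)

# Dropping the separator and None slots leaves the three wanted values.
kept = [p for p in pieces if p is not None and p != " "]
print(kept)
```

The space inside "Afrikanische Straße" is not a split point because it is followed by an uppercase letter. The latitude keeps its trailing comma, as in the output above. Dropping whole columns with dropna relies on every row splitting in the same positions, which holds for the sample data.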


4 solutions

Solution 1  5  2020-06-21 00:02:53
Solution 2  5  2020-06-21 00:26:54
Solution 3  2  2020-06-21 00:00:22
Solution 4  1  2020-06-21 00:53:59
