How do I split one column into multiple columns in Pandas using a regular expression?

Question

For example, if I have a home address like this:

71 Pilgrim Avenue, Chevy Chase, MD

in a column named 'address', I would like to split it into the columns 'street', 'city', and 'state', respectively.

What is the best way to achieve this using Pandas?

I have tried df[['street', 'city', 'state']] = df['address'].findall(r"myregex").

But the error I got is: Must have equal len keys and value when setting with an iterable.

Thank you for your help :)

Answer 1

You can use str.split with the regex ,\s+ (a comma followed by one or more whitespace characters):

# sample DataFrame borrowed from `Allen`'s answer
# raw string, so that \s reaches the regex engine unchanged
df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
print(df)
                              address id             street          city  \
0  71 Pilgrim Avenue, Chevy Chase, MD  a  71 Pilgrim Avenue   Chevy Chase   
1         72 Main St, Chevy Chase, MD  b         72 Main St   Chevy Chase   

  state  
0    MD  
1    MD

If you also need to remove the address column, add drop:

df[['street', 'city', 'state']] = df['address'].str.split(r',\s+', expand=True)
# drop the original column once it has been split
df = df.drop('address', axis=1)
print(df)
  id             street         city state
0  a  71 Pilgrim Avenue  Chevy Chase    MD
1  b         72 Main St  Chevy Chase    MD
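
Since the question asks specifically about capturing the fields with a regex, here is a minimal sketch using str.extract instead of str.split. The pattern is my own assumption about the address format (street, city, two-letter state, separated by commas); it is not taken from the question, where the regex was left as "myregex". Unlike str.findall, which returns one list per row, str.extract returns a DataFrame with one column per capture group, so it can be joined back directly.

```python
import pandas as pd

df = pd.DataFrame({
    'address': ['71 Pilgrim Avenue, Chevy Chase, MD',
                '72 Main St, Chevy Chase, MD'],
    'id': ['a', 'b'],
})

# Assumed format: "<street>, <city>, <two-letter state>".
# Named groups become the column names of the DataFrame returned by str.extract.
pattern = r'^(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})$'
parts = df['address'].str.extract(pattern)

# Rows that do not match the pattern get NaN in all three columns.
df = df.join(parts)
print(df[['id', 'street', 'city', 'state']])
```

Rows that do not fit the pattern end up with missing values rather than raising, which makes malformed addresses easy to find with df['street'].isna().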

Answer 2

import pandas as pd

df = pd.DataFrame({'address': {0: '71 Pilgrim Avenue, Chevy Chase, MD',
                               1: '72 Main St, Chevy Chase, MD'},
                   'id': {0: 'a', 1: 'b'}})
# If your address format is consistent, you can simply split on the comma.
df2 = df.join(pd.DataFrame(df.address.str.split(',').tolist(), columns=['street', 'city', 'state']))
# Strip the whitespace left around each piece (every column here holds strings).
df2 = df2.apply(lambda col: col.str.strip())
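
A hedged variant of the same idea, in case it is useful: passing expand=True lets str.split return the DataFrame directly, and n=2 caps the result at three pieces. The name parts is just an illustrative choice, and the n=2 cap is my own addition rather than part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'address': ['71 Pilgrim Avenue, Chevy Chase, MD',
                '72 Main St, Chevy Chase, MD'],
    'id': ['a', 'b'],
})

# Split on the literal comma; n=2 allows at most two splits, i.e. three pieces.
parts = df['address'].str.split(',', n=2, expand=True)
parts.columns = ['street', 'city', 'state']

# Strip whitespace from the new columns only, leaving the other columns untouched.
parts = parts.apply(lambda col: col.str.strip())

df2 = df.join(parts)
print(df2)
```

With n=2, an address containing extra commas keeps the surplus text in the last column instead of producing a fourth column that breaks the column assignment.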

Answer 1: score 16, accepted, 2017-05-02 06:00:42
Answer 2: score 3, 2017-05-02 05:15:15
