Split one column into two columns by a specific character in Python

Question

I use Python 3 and need to split a price column in a DataFrame that mixes price_value and price_unit together. The example data looks like 20dollar/m2/month or 1.8dollar/m2/day. I want to split the values at the word dollar into this format:

price_value      price_unit
20             dollar/m2/month
1.8            dollar/m2/day

I have tried the following code:

Option 1:

df['price_value'] = df['price'].apply(lambda row: row.split('dollar')[0])
df['price_unit'] = df['price'].apply(lambda row: row.split('dollar')[-1])

Option 2:

df['price_value'], df['price_unit'] = df["price"].str.split('dollar', 1).str

But I get:

price_value      price_unit
20                /m2/month
1.8               /m2/day

How can I split them correctly? Thanks.

Answer 1

You may use str.extract with the regex r'(?P<price_value>.*?)(?P<price_unit>dollar.*)':

>>> import pandas as pd
>>> df = pd.DataFrame(data=['20dollar/m2/month', '1.8dollar/m2/day'], columns=['price'])
>>> df['price'].str.extract(r'(?P<price_value>.*?)(?P<price_unit>dollar.*)')
  price_value       price_unit
0          20  dollar/m2/month
1         1.8    dollar/m2/day

See the regex demo.

Details

(?P<price_value>.*?) - Group "price_value": zero or more characters other than line break characters, as few as possible
(?P<price_unit>dollar.*) - Group "price_unit": dollar followed by zero or more characters other than line break characters, as many as possible
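
To see how the two groups divide a string, the same pattern can be tried with the standard re module on a single sample value. This is only a small sketch; the variable names are illustrative and not part of the answer above.

```python
import re

# Same pattern as in the str.extract call
pattern = re.compile(r'(?P<price_value>.*?)(?P<price_unit>dollar.*)')

m = pattern.match('1.8dollar/m2/day')

# The lazy group stops as soon as "dollar" can match;
# the greedy group then takes the rest of the string.
value = m.group('price_value')
unit = m.group('price_unit')
print(value, unit)
```

Because the first group is lazy, it yields everything before the first occurrence of dollar, which is why the number ends up alone in price_value.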

I assume that you do not have any line breaks in the input, but if you happen to have any, prepend the pattern with the inline DOTALL modifier, (?s): r'(?s)(?P<price_value>.*?)(?P<price_unit>dollar.*)'
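
The effect of (?s) can be checked on a made-up value that contains a line break. The input string and the names df_nl, without_flag and with_flag are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical input in which the unit part contains a line break
df_nl = pd.DataFrame({'price': ['20dollar/m2\n/month']})

# Without DOTALL, "." stops at the line break, so the unit is cut short
without_flag = df_nl['price'].str.extract(r'(.*?)(dollar.*)')

# With the inline (?s) modifier, "." also matches the line break
with_flag = df_nl['price'].str.extract(r'(?s)(.*?)(dollar.*)')

print(repr(without_flag.iloc[0, 1]))
print(repr(with_flag.iloc[0, 1]))
```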

To add the newly extracted columns to the existing data frame, you may also use

df[['price_value', 'price_unit']] = df['price'].str.extract(r'(.*?)(dollar.*)')

Here, named capturing groups are not necessary since you define the column names beforehand.
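
One thing worth knowing when assigning like this: rows where the pattern does not match come back as NaN in both new columns. A minimal sketch, assuming a third row 'n/a' that is not in the question's data:

```python
import pandas as pd

# The 'n/a' row is a hypothetical value that does not contain "dollar"
df = pd.DataFrame({'price': ['20dollar/m2/month', '1.8dollar/m2/day', 'n/a']})

df[['price_value', 'price_unit']] = df['price'].str.extract(r'(.*?)(dollar.*)')

# True for rows where the regex found nothing
unmatched = df['price_value'].isna()
print(df)
```

Checking unmatched before further processing may help to spot values that do not follow the expected format.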

Answer 2

You could do:

import pandas as pd

df = pd.DataFrame(data=['20dollar/m2/month', '1.8dollar/m2/day'], columns=['price_unit'])

# split by capture group
result = df['price_unit'].str.split('(dollar.*$)', expand=True).drop(2, axis=1)

# rename columns
result.columns = ['price_value', 'price_unit']

print(result)

Output

  price_value       price_unit
0          20  dollar/m2/month
1         1.8    dollar/m2/day
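
The drop(2, axis=1) step makes more sense after looking at the intermediate result. When the pattern is wrapped in a capture group, the matched text is kept as its own piece, and since the match runs to the end of the string, an empty string is left over as a third piece. A short sketch of that intermediate frame (the names s and parts are illustrative):

```python
import pandas as pd

s = pd.Series(['20dollar/m2/month', '1.8dollar/m2/day'])

# Splitting on a capture group keeps the matched delimiter as a column
parts = s.str.split('(dollar.*$)', expand=True)

# Column 0: text before the match, column 1: the match itself,
# column 2: the empty remainder after the match
print(parts)
```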
