How to extract a substring from a variable-length column in a pandas DataFrame?

Question

Hi there, I am trying to do something similar to Excel's MID function on a column of a pandas DataFrame in Python. The column holds medication names plus strengths, etc., and the values vary in length. I just want to pull out the first "part" of each name and put the result into another column of the DataFrame.

Example:

DataFrame column

MEDICATION_NAME
acetaminophen 325 mg
a-hydrocort 100 mg/2 ml

Desired result

MEDICATION_NAME               GENERIC_NAME
acetaminophen 325 mg          acetaminophen     
a-hydrocort 100 mg/2 ml       a-hydrocort

What I have tried

df['GENERIC_NAME'] = df['MEDICATION_NAME'].str[:df['MEDICATION_NAME'].apply(lambda x: x.find(' '))]

Basically, I want to apply the row-specific result of

df['GENERIC_NAME'] = df['MEDICATION_NAME'].apply(lambda x: x.find(' '))

to the str[:] slice. How can I do that?

Thanks
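A minimal sketch of the row-wise idea from the question, assuming the goal is literally to cut each string at its own first space. Because str[:] applies one slice to the whole Series, the per-row position has to be computed and used inside the applied function. The helper name first_part and the fallback for names without a space are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "MEDICATION_NAME": ["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"]
})

def first_part(name: str) -> str:
    # Hypothetical helper. str.find returns -1 when there is no space, and
    # slicing with [:-1] would silently drop the last character, so return
    # the whole name in that case.
    idx = name.find(" ")
    return name if idx == -1 else name[:idx]

# Use each row's own slice position inside the function, row by row.
df["GENERIC_NAME"] = df["MEDICATION_NAME"].apply(first_part)
print(df)
```

This works, but the vectorized .str methods in the answers are usually shorter and clearer.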

Answer 1

You can use str.partition here (see the pandas documentation):

df['GENERIC_NAME'] = df['MEDICATION_NAME'].str.partition(' ')[0]

For the given column this gives:

>>> df['MEDICATION_NAME'].str.partition(' ')[0]
0    acetaminophen
1      a-hydrocort
Name: 0, dtype: object

partition itself turns the Series into a DataFrame with three columns: the text before the separator, the separator itself, and the text after it:

>>> df['MEDICATION_NAME'].str.partition(' ')
               0  1            2
0  acetaminophen          325 mg
1    a-hydrocort     100 mg/2 ml
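A small sketch of how partition behaves when a value has no space at all. The third row ("aspirin") is a hypothetical value added for illustration, not part of the question's data.

```python
import pandas as pd

meds = pd.Series(["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml", "aspirin"])

parts = meds.str.partition(" ")
# Column 0: text before the first space; column 1: the space itself;
# column 2: everything after it. A row without a space keeps the whole
# value in column 0 and gets empty strings in columns 1 and 2.
generic = parts[0]
print(parts)
print(generic.tolist())
```

So partition never drops a name just because it has no strength attached.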

Answer 2

Do it with str.split:

df['MEDICATION_NAME'].str.split(n=1).str[0]
Out[345]: 
0    acetaminophen
1      a-hydrocort
Name: MEDICATION_NAME, dtype: object
#df['GENERIC_NAME']=df['MEDICATION_NAME'].str.split(n=1).str[0]
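A sketch of what n=1 does, assuming default whitespace splitting (no pat given). The value with leading spaces ("  aspirin 81 mg") is hypothetical, added to show that whitespace splitting skips it, as Python's str.split() does.

```python
import pandas as pd

meds = pd.Series(["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml", "  aspirin 81 mg"])

# With no separator, split() splits on runs of whitespace and ignores
# leading whitespace; n=1 stops after the first split, so each row
# becomes [first word, rest of the string].
pieces = meds.str.split(n=1)
generic = pieces.str[0]
print(pieces.tolist())
print(generic.tolist())
```

Limiting the split with n=1 avoids building a list of every token when only the first one is needed.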

Answer 3

Use str.extract to get the full power of regular expressions:

df["GENERIC_NAME"] = df["MEDICATION_NAME"].str.extract(r'([^\s]+)', expand=False)

This captures the first run of non-whitespace characters, so it also handles values that begin with a space.
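A short sketch of the extract approach, using \S (equivalent to [^\s]). The second value, with a leading space, is a hypothetical example. With a single capture group, expand=False returns a Series, while the default expand=True returns a one-column DataFrame.

```python
import pandas as pd

meds = pd.Series(["acetaminophen 325 mg", " a-hydrocort 100 mg/2 ml"])

# extract takes the first match anywhere in each string, so a leading
# space is simply skipped. expand=False -> Series for one capture group.
generic = meds.str.extract(r"(\S+)", expand=False)
print(generic.tolist())

# With the default expand=True the result is a DataFrame (column named 0).
print(type(meds.str.extract(r"(\S+)")).__name__)
```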

Answer 4

Try this:

df['GENERIC_NAME'] = df['MEDICATION_NAME'].str.split(" ").str[0]
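The difference between indexing the split result with plain [0] and with .str[0] is easy to miss, so here is a small sketch on the question's two rows. Plain [0] looks up the Series label 0, which is the whole token list of the first row; assigning that list to a column either raises a length-mismatch error or fills the column with the first row's tokens. .str[0] indexes inside each row's list.

```python
import pandas as pd

df = pd.DataFrame({
    "MEDICATION_NAME": ["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"]
})

lists = df["MEDICATION_NAME"].str.split(" ")  # a Series of Python lists

# Plain [0]: the list stored in the row with label 0.
print(lists[0])  # ['acetaminophen', '325', 'mg']

# .str[0]: the first token of every row.
df["GENERIC_NAME"] = lists.str[0]
print(df["GENERIC_NAME"].tolist())  # ['acetaminophen', 'a-hydrocort']
```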

Answers: 4

Answer 1: 3 votes, 2018-11-09 20:55:04
Answer 2: 2 votes, accepted, 2018-11-09 20:54:26
Answer 3: 1 vote, 2018-11-09 20:54:29
Answer 4: 1 vote, 2018-11-09 20:54:30
