How to extract substring from variable length column in pandas dataframe?
Hi there, I am trying to accomplish something similar to the MID function in Excel with a column in a pandas dataframe in Python. I have a column of medication names plus strengths, etc., of variable length. I just want to pull out the first "part" of the name and place the result into another column in the dataframe.
Example:
Dataframe column
MEDICATION_NAME
acetaminophen 325 mg
a-hydrocort 100 mg/2 ml
Desired Result
MEDICATION_NAME            GENERIC_NAME
acetaminophen 325 mg       acetaminophen
a-hydrocort 100 mg/2 ml    a-hydrocort
What I have tried
df['GENERIC_NAME'] = df['MEDICATION_NAME'].str[:df['MEDICATION_NAME'].apply(lambda x: x.find(' '))]
Basically, I want to apply the row-specific result of
df['GENERIC_NAME'] = df['MEDICATION_NAME'].apply(lambda x: x.find(' '))
to the str[:] function?
Thanks
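For context, the `.str[:n]` slice takes one bound for the whole column rather than a Series of per-row positions, which is why the attempt above does not do what was intended. A minimal sketch of a row-wise version follows, assuming every value is a string; the sample frame is rebuilt from the example rows, and `first_part` is a hypothetical helper name, not something from the question.

```python
import pandas as pd

# Sample data rebuilt from the example rows in the question.
df = pd.DataFrame(
    {"MEDICATION_NAME": ["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"]}
)


def first_part(name: str) -> str:
    """Hypothetical helper: return the text before the first space."""
    pos = name.find(" ")
    # str.find returns -1 when there is no space; slicing with -1 would
    # drop the last character, so keep the whole string in that case.
    return name if pos == -1 else name[:pos]


# Apply the per-row slice; each row uses its own split position.
df["GENERIC_NAME"] = df["MEDICATION_NAME"].apply(first_part)
print(df)
```

This keeps the per-row `find` logic from the question while guarding the case where a name has no space at all.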
You can use str.partition [pandas-doc] here:
df['GENERIC_NAME'] = df['MEDICATION_NAME'].str.partition(' ')[0]
For the given column this gives:
>>> g.str.partition(' ')[0]
0 acetaminophen
1 a-hydrocort
Name: 0, dtype: object
partition itself turns a series into a dataframe with three columns: before, match, and after:
>>> df['MEDICATION_NAME'].str.partition(' ')
               0  1            2
0  acetaminophen          325 mg
1    a-hydrocort     100 mg/2 ml
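As a small illustration of the three-column result, the sketch below names the columns and includes a value without any space to show what partition returns in that case. The column labels `before`, `sep`, `after` and the `"aspirin"` row are assumptions added for the example, not part of the original data.

```python
import pandas as pd

# "aspirin" is a made-up value with no separator, added for illustration.
names = pd.Series(["acetaminophen 325 mg", "aspirin"], name="MEDICATION_NAME")

# partition splits at the first occurrence of the separator only.
parts = names.str.partition(" ")
parts.columns = ["before", "sep", "after"]  # hypothetical labels

# When the separator is missing, the whole string lands in "before"
# and the other two columns are empty strings.
print(parts)
```

Because a missing separator leaves the full string in the first column, taking column 0 is safe for single-word names.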
Do it with str.split:
df['MEDICATION_NAME'].str.split(n=1).str[0]
Out[345]:
0 acetaminophen
1 a-hydrocort
Name: MEDICATION_NAME, dtype: object
#df['GENERIC_NAME']=df['MEDICATION_NAME'].str.split(n=1).str[0]
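If the remainder of the string is also wanted, the same split can be expanded into columns. This is only a sketch built on the example rows; the `STRENGTH` column name is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"MEDICATION_NAME": ["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"]}
)

# n=1 splits once, on the first run of whitespace;
# expand=True returns a DataFrame with one column per piece.
pieces = df["MEDICATION_NAME"].str.split(n=1, expand=True)

df["GENERIC_NAME"] = pieces[0]
df["STRENGTH"] = pieces[1]  # hypothetical column name for the remainder
print(df)
```

Limiting the split with `n=1` avoids breaking the strength text into further pieces.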
Use str.extract to use full regex features:
df["GENERIC_NAME"] = df["MEDICATION_NAME"].str.extract(r'([^\s]+)')
This captures the first word bounded by whitespace, so it also protects against values that start with a space.
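To make the leading-space case concrete, here is a minimal sketch comparing the regex with partition. The first value has two leading spaces added purely for illustration, `\S+` is used as an equivalent spelling of `[^\s]+`, and `expand=False` is passed so that a Series comes back.

```python
import pandas as pd

# The leading spaces in the first value are an assumption for the example.
names = pd.Series(["  acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"])

# The regex skips leading whitespace and captures the first run of
# non-whitespace characters; expand=False yields a Series.
first_word = names.str.extract(r"(\S+)", expand=False)

# partition splits at the very first space, so a leading space
# leaves an empty string in the "before" column.
by_partition = names.str.partition(" ")[0]

print(first_word.tolist())
print(by_partition.tolist())
```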
Try this:
df['GENERIC_NAME'] = df['MEDICATION_NAME'].str.split(" ").str[0]
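One detail worth spelling out with this approach: indexing the result of str.split with `[0]` directly selects the row labelled 0, whereas `.str[0]` takes the first element of every row. A minimal sketch on the example rows, assuming the default integer index:

```python
import pandas as pd

names = pd.Series(["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"])

# Each row becomes a list of tokens.
tokens = names.str.split(" ")

# Plain [0] is a label lookup: it returns the token list of row 0 only.
first_row_tokens = tokens[0]

# .str[0] indexes into every row's list and returns a Series.
first_words = tokens.str[0]

print(first_row_tokens)
print(first_words.tolist())
```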