Python: How to extract multiple strings from pandas dataframe column
I have a dataset with a column containing strings in this format:
Building = Building_A and Floor = Floor_4
Building = Building_D and Floor = Floor_2
I would like to extract only the building and floor names, concatenated into a single string in a new column, e.g.:
Building_A/Floor_4
Building_D/Floor_2
I've spent about an hour looking through previous posts and could not find anything that matches what I am trying to do. Any help would be appreciated.
Assume we have a dataframe df:
import pandas as pd
df = pd.DataFrame({'txt': ["Building = Building_A and Floor = Floor_4",
                           "Building = Building_Z and Floor = Floor_9",
                           "Building = Martello and Floor = Ground"]})
First, define the pattern to extract:
pat = r"(Floor_\d+)|(Building_\w{1})"
Alternatively, if you want every word that follows "= ":
pat = r"(?<== )(\w+)"
Note the lookbehind (?<=...) in the pattern definition: it requires "= " immediately before the match without including it in the result.
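To see how the two patterns differ, here is a minimal sketch using only the standard re module. The names rows, prefix_pat, lookbehind_pat and matched_text are mine, chosen for illustration. The first pattern only matches names that literally start with "Floor_" or "Building_", so it finds nothing in the "Martello" row, while the lookbehind pattern picks up whatever word follows "= ".

```python
import re

rows = [
    "Building = Building_A and Floor = Floor_4",
    "Building = Martello and Floor = Ground",
]

prefix_pat = r"(Floor_\d+)|(Building_\w{1})"  # first pattern from above
lookbehind_pat = r"(?<== )(\w+)"              # second pattern from above


def matched_text(pattern, text):
    """Return the full text of every match, in order of appearance."""
    return [m.group(0) for m in re.finditer(pattern, text)]


for row in rows:
    print(matched_text(prefix_pat, row), matched_text(lookbehind_pat, row))
# ['Building_A', 'Floor_4'] ['Building_A', 'Floor_4']
# [] ['Martello', 'Ground']
```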
Then apply a lambda function to the column txt:
df['txt_extract'] = \
df[['txt']].apply(lambda r: "/".join(r.str.extractall(pat).stack()), axis=1)
Result (with the second pattern):
0 Building_A/Floor_4
1 Building_Z/Floor_9
2 Martello/Ground
Instead of str.extract, use str.extractall, which looks for all occurrences of the pattern. The matches are stacked and joined with the "/" separator. Note that the order in which the matches are found is preserved, which may be important in your case.
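If the row-wise apply turns out to be slow on a large frame, the same idea can probably be expressed column-wise. The sketch below is not part of the approach above, just two variants under the assumption that the lookbehind pattern is the one you want; the names matches, via_extractall and via_findall are mine. The first variant calls str.extractall once on the whole column and joins the matches per original row; the second uses str.findall followed by str.join.

```python
import pandas as pd

df = pd.DataFrame({"txt": [
    "Building = Building_A and Floor = Floor_4",
    "Building = Building_Z and Floor = Floor_9",
    "Building = Martello and Floor = Ground",
]})

pat = r"(?<== )(\w+)"

# Variant 1: extractall on the whole column. The result has a MultiIndex
# (original row label, match number) and one column, 0, for the single group.
matches = df["txt"].str.extractall(pat)
via_extractall = matches[0].groupby(level=0).agg("/".join)

# Variant 2: findall gives one list of matches per row; str.join glues each list.
via_findall = df["txt"].str.findall(r"(?<== )\w+").str.join("/")

df["txt_extract"] = via_findall
print(df["txt_extract"].tolist())
# ['Building_A/Floor_4', 'Building_Z/Floor_9', 'Martello/Ground']
```

One difference to be aware of: a row with no match at all is simply absent from the extractall result, whereas findall returns an empty list for it, which joins to an empty string.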