Python: Pandas Dataframe Using Wildcard to Find String in Column and Keep Row
I have a pandas data frame. Below is a sample table.
Event Text
A something/AWAIT hello
B la de la
C AWAITING SHIP
D yes NO AWAIT
I want to keep only the rows that contain some form of the word AWAIT in the Text column. Below is my desired table:
Event Text
A something/AWAIT hello
C AWAITING SHIP
D yes NO AWAIT
Below is the code I tried, intended to capture strings that contain AWAIT in all possible circumstances.
df_STH001_2 = df_STH001[df_STH001['Text'].str.contains("?AWAIT?") == True]
The error I get is as follows:
error: nothing to repeat at position 0
By default, Series.str.contains(pat, case=True, flags=0, na=nan, regex=True) treats pat as a regular expression.
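The question mark (?) makes the preceding token in a regular expression optional. Your pattern starts with ?, so there is no preceding token for it to apply to, hence the error message. No wildcard is needed here: contains already matches the pattern anywhere in the string.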
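To see where the error comes from, here is a small sketch using Python's re module directly, on the assumption that pandas passes the pattern to it unchanged when regex=True. The variable names are only for illustration.

```python
import re

# A leading "?" has no preceding token to make optional, so compiling fails.
try:
    re.compile("?AWAIT?")
    error_message = None
except re.error as exc:
    error_message = str(exc)

print(error_message)

# "AWAIT?" compiles fine: the trailing "?" makes only the final "T" optional.
optional_t = re.compile("AWAIT?")
print(bool(optional_t.search("AWAI")))
```

If a literal question mark ever needs to be matched, it can be escaped with re.escape or a backslash.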
In [178]: d[d['Text'].str.contains('AWAIT')]
Out[178]:
Event Text
0 A something/AWAIT hello
2 C AWAITING SHIP
3 D yes NO AWAIT
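For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the sample table from the question. The names df, mask and kept are made up for the example. Passing regex=False treats the pattern as a plain substring, and na=False is an extra precaution (not in the snippet above) so that missing values count as non-matches instead of producing NaN in the mask.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Event": ["A", "B", "C", "D"],
    "Text": ["something/AWAIT hello", "la de la", "AWAITING SHIP", "yes NO AWAIT"],
})

# Plain substring test: no regex metacharacters are interpreted.
mask = df["Text"].str.contains("AWAIT", regex=False, na=False)

# Keep only the rows whose Text contains "AWAIT".
kept = df[mask]
print(kept)
```

The boolean mask can be used directly for indexing, so the == True comparison in the question is not needed.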
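You can also try the match method. Note that match only matches from the beginning of each string, so to find AWAIT in the middle of the text the pattern needs a leading .* (for example '.*AWAIT'):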
df[df.column.str.match('some_string')]
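A quick sketch of how match behaves on the question's sample text, since it is anchored at the start of each string. The series name text and the two result names are just for illustration.

```python
import pandas as pd

text = pd.Series(
    ["something/AWAIT hello", "la de la", "AWAITING SHIP", "yes NO AWAIT"]
)

# match is anchored at the start, so only "AWAITING SHIP" qualifies here.
starts_with = text.str.match("AWAIT")

# A leading ".*" lets the pattern find AWAIT anywhere in the string.
anywhere = text.str.match(".*AWAIT")

print(starts_with.tolist())
print(anywhere.tolist())
```

For this particular task contains is probably the simpler choice, since it searches the whole string without any extra pattern.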