Get index of regex match in pandas dataframe not working
I have an Excel worksheet that I am reading into pandas for parsing and later analysis. It has the following format. All values are strings. They will be converted to floats/ints later, but having them as strings helps with parsing.
column1 | column2 | column3 |
-----------------------------
12345 |10 |20 |
txt |25 |65 |
35615 |15 |20 |
txt |35 |20 |
I need to get the index of all 5-digit numerical values in column1. The value will always be 5 digits. I am using the following regex:
\b\d{5}\b
I am having problems getting pandas to properly match the 5 digits when using any of the built-in string methods. I have tried the following:
df.column1.str.contains('\b\d{5}\b', regex=True).index.list()
df.column1.str.match('\b\d{5}\b').index.list()
I am expecting it to return:
[0,2]
Both of these return an empty list. What am I doing wrong?
Add r before the string to make it a raw string, filter by boolean indexing, and convert the index values to a list:
i = df[df.column1.str.contains(r'\b\d{5}\b')].index.tolist()
print (i)
[0, 2]
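To see why the r prefix matters, here is a small sketch using only the standard re module (the variable names are mine, not from the question). In a normal Python string literal, '\b' is the backspace character, so the regex engine never receives a word-boundary token. The non-raw pattern is assembled from pieces below only to reproduce what the unprefixed literal would contain.

```python
import re

# In a normal string literal, '\b' is a single backspace character (0x08).
assert '\b' == '\x08' and len('\b') == 1
# In a raw string, r'\b' stays as backslash + 'b', which the regex engine
# interprets as a word boundary.
assert len(r'\b') == 2

# Reproduce what the non-raw literal from the question contains:
# backspace, then \d{5}, then backspace.
non_raw_pattern = '\b' + r'\d{5}' + '\b'
raw_pattern = r'\b\d{5}\b'

non_raw_match = re.search(non_raw_pattern, '12345')
raw_match = re.search(raw_pattern, '12345')

print(non_raw_match)      # None: the value contains no backspace characters
print(raw_match.group())  # 12345
```

Note also that .index on the boolean Series returned by str.contains gives every row label regardless of the True/False values, which is why the mask is used to filter df before .index is taken.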
Or, if you want to match only numeric values of length 5, change the regex to use ^ and $ for the start and end of the string:
i = df[df.column1.str.contains(r'^\d{5}$')].index.tolist()
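For comparison, a self-contained sketch of both patterns follows. It rebuilds the sample table from the question and adds one hypothetical row ('ref 99999', not in the original data) to show where the two patterns differ. The str.fullmatch line is an alternative I am adding, assuming a pandas version that provides it (1.1 or later).

```python
import pandas as pd

# Sample data from the question; the last row is a hypothetical addition
# used only to show the difference between the two patterns.
df = pd.DataFrame({
    'column1': ['12345', 'txt', '35615', 'txt', 'ref 99999'],
    'column2': ['10', '25', '15', '35', '40'],
    'column3': ['20', '65', '20', '20', '50'],
})

# \b...\b matches a 5-digit run anywhere in the cell, as long as it is
# delimited by word boundaries.
word_bounded = df[df.column1.str.contains(r'\b\d{5}\b')].index.tolist()

# ^...$ requires the cell to consist of the 5 digits only.
anchored = df[df.column1.str.contains(r'^\d{5}$')].index.tolist()

# Alternative (assumes pandas >= 1.1): fullmatch anchors both ends implicitly.
full = df[df.column1.str.fullmatch(r'\d{5}')].index.tolist()

print(word_bounded)  # [0, 2, 4]
print(anchored)      # [0, 2]
print(full)          # [0, 2]
```

On the original four rows all three variants give [0, 2]; the choice only matters if column1 can contain extra text around the number.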