What is the best way to use regex in a for loop while testing for data type?

For context, I'm looping over large, unclean data sets with multiple data types and need to find extensions of strings, if they exist. Small changes to my code, like converting values to strings, cost me minutes.

I read through the question "Python: How to use RegEx in an if statement?" but couldn't find a way of testing for a match without first converting to a string.

Values:

import re

vals = [444444, '555555-Z01']
pattern = re.compile(r'[-]*[A-Z]{1}[0-9]{2}$')
# new_vals = [444444, 555555]

Slow method: (2.4 µs ± 93.6 ns per loop)

new_vals = []
for v in vals:
    if type(v)==str:
        if pattern.search(v) is not None:
            new_v = pattern.findall(v)[0].replace('-','')
            new_vals.append(new_v)
    else:
        new_vals.append(v)

Fast method: (1.84 µs ± 34.7 ns per loop)

f = lambda x: x if type(x)!=str else pattern.findall(x)[0].replace('-','')

new_vals = []
for v in vals:
    new_vals.append(f(v))

Unsuccessful method:

new_vals = []
for v in vals:
    if ((type(v)==str) & (pattern.search(v) is not None)):
        new_vals.append(v)

Error:

TypeError: expected string or bytes-like object

I tried to beat your attempts using try/except blocks, but the exception handling seems to take too much time. So much for "better ask forgiveness than permission"...
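The try/except code itself is not shown here, so the following is only a guess at what such an attempt might look like. It calls pattern.search on every value and treats the TypeError raised for non-strings as "no match"; an exception is then raised and caught for every non-string value, which is plausibly where the extra time goes.

```python
import re

pattern = re.compile(r'[-]*[A-Z]{1}[0-9]{2}$')
vals = [444444, '555555-Z01']

matched = []
for v in vals:
    try:
        # Works for strings; raises TypeError for ints and other non-strings
        if pattern.search(v):
            matched.append(v)
    except TypeError:
        # Non-string value: nothing to match, skip it
        pass

print(matched)
```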

Your last attempt is the most promising if you just replace & with and: & is the bitwise and, which doesn't short-circuit, so pattern.search(v) is still called on non-string values.
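A minimal sketch of the difference, using the pattern and the integer value from the question (the names with_and and raised are only for illustration):

```python
import re

pattern = re.compile(r'[-]*[A-Z]{1}[0-9]{2}$')
v = 444444  # a non-string value, as in the question's data

# `and` short-circuits: the left operand is False, so pattern.search(v)
# is never called and no error occurs
with_and = isinstance(v, str) and pattern.search(v)

# `&` is an ordinary binary operator: both operands are evaluated before
# it is applied, so pattern.search(v) runs on the int and raises TypeError
try:
    isinstance(v, str) & (pattern.search(v) is not None)
    raised = False
except TypeError:
    raised = True

print(with_and, raised)
```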

I'd go for this in a list comprehension, which speeds it up slightly more, and drop the is not None test, which is redundant: when search succeeds it returns a match object, which is truthy, and when it fails it returns None, which is falsy:

new_vals = [v for v in vals if type(v)==str and pattern.search(v)]

or with isinstance (same speed, tests subclasses of str too):

new_vals = [v for v in vals if isinstance(v,str) and pattern.search(v)]
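Both comprehensions above only keep the strings that match. If the goal is instead the list hinted at in the question's comment (non-strings kept, the extension stripped from strings), one possible sketch combines the same isinstance check with pattern.sub. This is an assumption about the intent, not something the question states outright, and the cleaned value stays a string rather than becoming an int.

```python
import re

pattern = re.compile(r'[-]*[A-Z]{1}[0-9]{2}$')
vals = [444444, '555555-Z01']

# Strip the trailing extension from strings; pass other types through untouched.
# Strings without an extension are returned unchanged by sub().
cleaned = [pattern.sub('', v) if isinstance(v, str) else v for v in vals]

print(cleaned)  # [444444, '555555']
```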
