
regex or split(' ')?

I am completely at a loss as to how to construct a regular expression that does what I want with this file.

https://www.dropbox.com/s/9zadqzbvcg6ogtf/000218.txt?dl=0

AppearanceDate 29.08.2015
AppearanceTime 00:02:18
FrameCount 17
#   time  bright   x      y      alpha     delta   c_x    c_y    c_alpha  c_delta  use
01  18.175 ----  0.052  0.838   19.3755   21.947  -----  -----  --------  -------  no
02  18.215 ----  0.053  0.834   19.3682   21.985  -----  -----  --------  -------  no
03  18.255 ----  0.055  0.830   19.3608   22.024  -----  -----  --------  -------  no
04  18.295  5.1  0.057  0.826   19.3535   22.063  -----  -----   19.3541   22.070  yes
05  18.335  0.4  0.058  0.821   19.3462   22.101  -----  -----   19.3452   22.105  yes
06  18.375  0.3  0.060  0.815   19.3354   22.137  -----  -----   19.3365   22.140  yes
07  18.415  0.3  0.061  0.811   19.3281   22.172  -----  -----   19.3278   22.174  yes
08  18.455  0.2  0.063  0.806   19.3193   22.210  -----  -----   19.3192   22.208  yes
09  18.495  0.2  0.064  0.801   19.3110   22.236  -----  -----   19.3107   22.241  yes
10  18.535  0.2  0.066  0.795   19.3018   22.286  -----  -----   19.3023   22.274  yes
11  18.575  0.1  0.068  0.791   19.2935   22.312  -----  -----   19.2939   22.306  yes
12  18.615 ----  0.069  0.786   19.2861   22.335  -----  -----  --------  -------  no
13  18.655 -0.1  0.070  0.782   19.2788   22.359  -----  -----   19.2776   22.369  yes
14  18.695 -0.1  0.071  0.776   19.2686   22.391  -----  -----   19.2695   22.400  yes
15  18.735 ----  0.073  0.770   19.2583   22.424  -----  -----  --------  -------  no
16  18.775 ----  0.074  0.764   19.2480   22.456  -----  -----  --------  -------  no
17  18.815 ----  0.076  0.758   19.2383   22.488  -----  -----  --------  -------  no

I would like to match both the HH:MM:SS from AppearanceTime and the SS.sss values under the "time" column.

Currently I can almost do it in two steps. First, for AppearanceTime, I can use:

r"(\d{2}:\d{2}:\d{2})"

The furthest I've got with the SS.sss values is:

r"(\d{2}[.]\d{3})"

but this also matches part of the values in AppearanceDate, alpha, delta, c_alpha and c_delta.
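To see the over-matching concretely, here is a minimal sketch that runs the unanchored SS.sss pattern over two lines copied from the sample file. The variable names are illustrative only and are not part of the original post.

```python
import re

# Unanchored SS.sss pattern from the question (single backslashes in a raw string).
pattern = r"(\d{2}[.]\d{3})"

# Two lines copied from the sample file.
date_line = "AppearanceDate 29.08.2015"
data_line = "04  18.295  5.1  0.057  0.826   19.3535   22.063  -----  -----   19.3541   22.070  yes"

date_hits = re.findall(pattern, date_line)
data_hits = re.findall(pattern, data_line)

print(date_hits)  # ['08.201'] - a fragment of the date
print(data_hits)  # ['18.295', '19.353', '22.063', '19.354', '22.070']
```

Only the first hit on the data line is the wanted time value; the rest are fragments of alpha, delta, c_alpha and c_delta, because nothing ties the pattern to a position on the line.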

Finally, just in case it matters - I've been testing here: https://regex101.com/ with the global and multiline flags on.

If anyone could help me out with this it would be most appreciated. There seem to be plenty of good resources to help with regex creation, but I am getting absolutely nowhere with it!

Another idea I had was that I could probably use split(' ') quite effectively for the SS.sss values, but I wanted to ask whether anyone knows which of regex or split is more efficient, as this will be applied to many thousands of files like the one given above.

Thanks a lot!

You may use

(?:AppearanceTime\s+|^\d+\s+)(\d{2}:\d{2}:\d{2}|\d{2}\.\d{3})

See the regex demo (use the re.M flag with re.findall).

Details:

  • (?:AppearanceTime\s+|^\d+\s+) - a non-capturing group that matches either of 2 alternatives:
    • AppearanceTime\s+ - the string AppearanceTime followed by 1+ whitespace characters (\s+)
    • | - or
    • ^\d+\s+ - start of a line (^), 1+ digits (\d+) and 1+ whitespace characters
  • (\d{2}:\d{2}:\d{2}|\d{2}\.\d{3}) - matches and captures (this is the final output of re.findall) either of 2 alternatives:
    • \d{2}:\d{2}:\d{2} - 3 colon-separated 2-digit chunks
    • | - or
    • \d{2}\.\d{3} - 2 digits, a dot (.), and 3 digits
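The breakdown above can also be written into the pattern itself. The following is a sketch using re.VERBOSE, which lets each part carry its own comment; the sample string is a shortened copy of the file from the question, and the names rx, sample and times are illustrative.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part can carry a comment.
rx = re.compile(
    r"""
    (?:AppearanceTime\s+   # the literal label and 1+ whitespace characters
      |^\d+\s+)            # or: start of line, the frame number, 1+ whitespace
    (\d{2}:\d{2}:\d{2}     # captured: HH:MM:SS
      |\d{2}\.\d{3})       # or captured: SS.sss
    """,
    re.MULTILINE | re.VERBOSE,
)

# Shortened copy of the sample file.
sample = (
    "AppearanceDate 29.08.2015\n"
    "AppearanceTime 00:02:18\n"
    "FrameCount 17\n"
    "01  18.175 ----  0.052  0.838   19.3755   21.947\n"
    "02  18.215 ----  0.053  0.834   19.3682   21.985\n"
)

times = rx.findall(sample)
print(times)  # ['00:02:18', '18.175', '18.215']
```

Because the pattern has exactly one capturing group, re.findall returns a flat list of the captured strings rather than the full matches.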

See the Python demo:

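import re

rx = r"(?:AppearanceTime\s+|^\d+\s+)(\d{2}:\d{2}:\d{2}|\d{2}\.\d{3})"
# s holds the file contents; two lines from the sample file stand in for it here
s = "AppearanceTime 00:02:18\n01  18.175 ----  0.052  0.838   19.3755   21.947\n"
res = re.findall(rx, s, flags=re.MULTILINE)
print(res)  # ['00:02:18', '18.175']

For the SS.sss values alone, a single pattern is enough:

# txt holds the full contents of the file
match = re.findall(r'^\d.+?(\d{2}[.]\d{3})', txt, flags=re.MULTILINE)
print(match)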

Output:

['18.175', '18.215', '18.255', '18.295', '18.335', '18.375', '18.415', '18.455', '18.495', '18.535', '18.575', '18.615', '18.655', '18.695', '18.735', '18.775', '18.815']

Just use multiline mode: ^\d restricts the match to lines that start with a digit, and the lazy .+? stops at the first SS.sss value on each of those lines.
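On the split question from the original post: a plain whitespace split is a workable alternative. The sketch below assumes every data line starts with a frame number and carries the time in its second column; parse_with_split and parse_with_regex are hypothetical helper names, not anything from the answers above. It uses split() with no argument, because split(' ') returns empty strings wherever columns are separated by more than one space.

```python
import re
import timeit

# Shortened copy of the sample file from the question.
SAMPLE = (
    "AppearanceDate 29.08.2015\n"
    "AppearanceTime 00:02:18\n"
    "FrameCount 17\n"
    "#   time  bright   x      y      alpha     delta\n"
    "01  18.175 ----  0.052  0.838   19.3755   21.947\n"
    "02  18.215 ----  0.053  0.834   19.3682   21.985\n"
)

RX = re.compile(
    r"(?:AppearanceTime\s+|^\d+\s+)(\d{2}:\d{2}:\d{2}|\d{2}\.\d{3})",
    re.MULTILINE,
)


def parse_with_regex(text):
    """Hypothetical helper: the regex from the first answer, precompiled."""
    return RX.findall(text)


def parse_with_split(text):
    """Hypothetical helper: take the second field of the relevant lines."""
    times = []
    for line in text.splitlines():
        fields = line.split()  # no argument: split on any run of whitespace
        if len(fields) < 2:
            continue
        # Keep the AppearanceTime value and the time column of numbered rows.
        if fields[0] == "AppearanceTime" or fields[0].isdigit():
            times.append(fields[1])
    return times


regex_times = parse_with_regex(SAMPLE)
split_times = parse_with_split(SAMPLE)
print(split_times)  # ['00:02:18', '18.175', '18.215']

# Rough timing on this machine only; measure on real files before deciding.
for fn in (parse_with_regex, parse_with_split):
    elapsed = timeit.timeit(lambda: fn(SAMPLE), number=10_000)
    print(f"{fn.__name__}: {elapsed:.3f} s for 10,000 runs")
```

Which of the two is faster depends on the Python version and the file sizes, so the timing loop is there to be run against representative files rather than to settle the question. With many thousands of small files, reading them from disk may well take longer than either parsing approach.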
