regex or split(' ')?
I am completely at a loss as to how to construct a regular expression that does what I want with this file:
https://www.dropbox.com/s/9zadqzbvcg6ogtf/000218.txt?dl=0
AppearanceDate 29.08.2015
AppearanceTime 00:02:18
FrameCount 17
# time bright x y alpha delta c_x c_y c_alpha c_delta use
01 18.175 ---- 0.052 0.838 19.3755 21.947 ----- ----- -------- ------- no
02 18.215 ---- 0.053 0.834 19.3682 21.985 ----- ----- -------- ------- no
03 18.255 ---- 0.055 0.830 19.3608 22.024 ----- ----- -------- ------- no
04 18.295 5.1 0.057 0.826 19.3535 22.063 ----- ----- 19.3541 22.070 yes
05 18.335 0.4 0.058 0.821 19.3462 22.101 ----- ----- 19.3452 22.105 yes
06 18.375 0.3 0.060 0.815 19.3354 22.137 ----- ----- 19.3365 22.140 yes
07 18.415 0.3 0.061 0.811 19.3281 22.172 ----- ----- 19.3278 22.174 yes
08 18.455 0.2 0.063 0.806 19.3193 22.210 ----- ----- 19.3192 22.208 yes
09 18.495 0.2 0.064 0.801 19.3110 22.236 ----- ----- 19.3107 22.241 yes
10 18.535 0.2 0.066 0.795 19.3018 22.286 ----- ----- 19.3023 22.274 yes
11 18.575 0.1 0.068 0.791 19.2935 22.312 ----- ----- 19.2939 22.306 yes
12 18.615 ---- 0.069 0.786 19.2861 22.335 ----- ----- -------- ------- no
13 18.655 -0.1 0.070 0.782 19.2788 22.359 ----- ----- 19.2776 22.369 yes
14 18.695 -0.1 0.071 0.776 19.2686 22.391 ----- ----- 19.2695 22.400 yes
15 18.735 ---- 0.073 0.770 19.2583 22.424 ----- ----- -------- ------- no
16 18.775 ---- 0.074 0.764 19.2480 22.456 ----- ----- -------- ------- no
17 18.815 ---- 0.076 0.758 19.2383 22.488 ----- ----- -------- ------- no
I would like to match both the HH:MM:SS from AppearanceTime and the SS.sss values from under the "time" column.
Currently I can almost do it in two steps. Firstly, for AppearanceTime I can use:
r"(\d{2}:\d{2}:\d{2})"
For the SS.sss values, the furthest I've got is:
r"(\d{2}[.]\d{3})"
but this also matches part of the values in AppearanceDate, alpha, delta, c_alpha and c_delta.
Finally, just in case it matters: I've been testing at https://regex101.com/ with the global and multiline flags on.
If anyone could help me out with this it would be most appreciated. There seem to be a load of good resources to help with regex creation, but I am getting absolutely nowhere with it!
Another idea I had was that I could probably use split(' ') quite effectively for the SS.sss values, but I wanted to ask whether anyone knows which of regex or split is more efficient, as this will be applied to many thousands of files like the one given above.
Thanks a lot!
You may use
(?:AppearanceTime\s+|^\d+\s+)(\d{2}:\d{2}:\d{2}|\d{2}\.\d{3})
See the regex demo (use the re.M flag with re.findall).
Details:
(?:AppearanceTime\s+|^\d+\s+) - a non-capturing group that matches either of 2 alternatives:
    AppearanceTime\s+ - the string AppearanceTime and 1+ whitespaces (\s+)
    | - or
    ^\d+\s+ - start of a line (^), 1+ digits (\d+) and 1+ whitespaces
(\d{2}:\d{2}:\d{2}|\d{2}\.\d{3}) - matches and captures (the final output for re.findall) either of the 2 alternatives:
    \d{2}:\d{2}:\d{2} - 3 colon-separated 2-digit chunks
    | - or
    \d{2}\.\d{3} - 2 digits, a dot (.), 3 digits
See the Python demo:
import re
rx = r"(?:AppearanceTime\s+|^\d+\s+)(\d{2}:\d{2}:\d{2}|\d{2}\.\d{3})"
s = <<YOUR STRING HERE>>
res = re.findall(rx, s, flags=re.MULTILINE)
print(res)
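For a self-contained check, the same pattern can be run against a short excerpt of the file from the question. This is only a sketch: the variable names sample and times are illustrative, and the excerpt keeps just three of the seventeen data rows.

```python
import re

# Excerpt of the file from the question (header plus three data rows).
sample = """AppearanceDate 29.08.2015
AppearanceTime 00:02:18
FrameCount 17
# time bright x y alpha delta c_x c_y c_alpha c_delta use
01 18.175 ---- 0.052 0.838 19.3755 21.947 ----- ----- -------- ------- no
04 18.295 5.1 0.057 0.826 19.3535 22.063 ----- ----- 19.3541 22.070 yes
13 18.655 -0.1 0.070 0.782 19.2788 22.359 ----- ----- 19.2776 22.369 yes
"""

# Compile once so the pattern can be reused across many files.
rx = re.compile(
    r"(?:AppearanceTime\s+|^\d+\s+)(\d{2}:\d{2}:\d{2}|\d{2}\.\d{3})",
    re.MULTILINE,
)

# findall returns only the captured group, i.e. the time values.
times = rx.findall(sample)
print(times)  # ['00:02:18', '18.175', '18.295', '18.655']
```

The date in AppearanceDate and the alpha/delta columns are skipped because the captured value must directly follow either AppearanceTime or the leading frame number at the start of a line.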
import re
# txt is assumed to hold the contents of the file
match = re.findall(r'^\d.+?(\d{2}[.]\d{3})', txt, flags=re.MULTILINE)
print(match)
out:
['18.175', '18.215', '18.255', '18.295', '18.335', '18.375', '18.415', '18.455', '18.495', '18.535', '18.575', '18.615', '18.655', '18.695', '18.735', '18.775', '18.815']
Just use multiline mode: each match starts at the beginning of a line that begins with a digit (^\d), and .+? lazily skips ahead to the first value of the form DD.DDD on that line, which is captured.
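On the split idea raised in the question, a rough sketch of a regex-free version might look like the following. The function name parse_times and the variable sample are hypothetical, and it assumes every data row starts with a frame number followed by the time value. It calls split() with no argument rather than split(' '), because split(' ') produces empty strings wherever columns are separated by more than one space.

```python
def parse_times(text):
    """Return (appearance_time, [frame times]) using str.split() only."""
    appearance_time = None
    frame_times = []
    for line in text.splitlines():
        fields = line.split()  # no argument: collapses runs of whitespace
        if len(fields) < 2:
            continue  # blank or single-token line, nothing to read
        if fields[0] == "AppearanceTime":
            appearance_time = fields[1]
        elif fields[0].isdigit():
            # Data rows start with the frame number; the time is column 2.
            frame_times.append(fields[1])
    return appearance_time, frame_times


# Short excerpt of the file from the question, for illustration only.
sample = """AppearanceDate 29.08.2015
AppearanceTime 00:02:18
FrameCount 17
# time bright x y alpha delta c_x c_y c_alpha c_delta use
01 18.175 ---- 0.052 0.838 19.3755 21.947 ----- ----- -------- ------- no
04 18.295 5.1 0.057 0.826 19.3535 22.063 ----- ----- 19.3541 22.070 yes
"""

result = parse_times(sample)
print(result)  # ('00:02:18', ['18.175', '18.295'])
```

Which of the two approaches is faster is best measured on the real files, for instance with the timeit module. No timing result is claimed here.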