简体   繁体   English

我该如何分割这个字符串?

[英]How can I split this string up?

I'm trying to split strings up preceding the places where there is a whole, 2 digit number surrounded by whitespace. 我试图在有空白包围的整数2位数的地方之前分割字符串。 Eventually I'd like this to work in Python, but I've been workbenching with sed and I can't figure it out. 最终我希望这可以在Python中工作,但我已经与sed工作了,我无法理解。

My test data looks like this: 我的测试数据如下所示:

13 13 13 13 13 9:07.18 9:12.09 9:15.65
14 14 14 2:04.86 2:05.99 2:06.87 14 4:21.51 4:23.51 4:25.00 14 8:56.28 9:01.09 9:04.58
15 15 57.18 57.61 57.95 15 2:02.61 2:03.72 2:04.58 15 4:17.31 4:19.28 4:20.75 15 8:47.15 8:51.87 8:55.30
16 16 56.34 56.76 57.09 16 2:00.69 2:01.78 2:02.63 16 4:13.75 4:15.69 4:17.14 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99 17 55.62 56.03 56.36 17 1:59.07 2:00.15 2:00.99 17 4:10.76 4:12.69 4:14.11 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73 18 55.01 55.42 55.74 18 1:57.74 1:58.81 1:59.63 18 4:08.34 4:10.24 4:11.66 18 8:33.73 8:37.04
19 25.20 25.36 25.49 19 54.50 54.91 55.23 19 1:57.74 1:58.56 19 4:08.34 4:09.74 19 8:33.73

And I would like it to be split up like this ( note the location of the commas ',' ): 我希望它像这样分开( 注意逗号的位置',' ):

13, 13, 13, 13, 13 9:07.18 9:12.09 9:15.65
14, 14, 14 2:04.86 2:05.99 2:06.87, 14 4:21.51 4:23.51 4:25.00, 14 8:56.28 9:01.09 9:04.58
15, 15 57.18 57.61 57.95, 15 2:02.61 2:03.72 2:04.58, 15 4:17.31 4:19.28 4:20.75, 15 8:47.15 8:51.87 8:55.30
16, 16 56.34 56.76 57.09, 16 2:00.69 2:01.78 2:02.63, 16 4:13.75 4:15.69 4:17.14, 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99, 17 55.62 56.03 56.36, 17 1:59.07 2:00.15 2:00.99, 17 4:10.76 4:12.69 4:14.11, 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73, 18 55.01 55.42 55.74, 18 1:57.74 1:58.81 1:59.63, 18 4:08.34 4:10.24 4:11.66, 18 8:33.73 8:37.04
19 25.20 25.36 25.49, 19 54.50 54.91 55.23, 19 1:57.74 1:58.56, 19 4:08.34 4:09.74, 19 8:33.73

The data above, is fairly regular in that the two digit whole numbers are in the range [13,19], but the range I should expect is [10,99]. 上面的数据是相当规律的,因为两位数的整数在[13,19]范围内,但我应该期望的范围是[10,99]。

Could somebody suggest a method to perform the above transformation? 有人可以建议一种方法来执行上述转换吗? I've been at this with regex for a while now but I can't cover all the cases. 我已经使用正则表达式了一段时间,但我不能涵盖所有情况。

The look-ahead assertion (?=...) can solve this: 先行断言(?=...)可以解决这个问题:

>>> a = """13 13 13 13 13 9:07.18 9:12.09 9:15.65
14 14 14 2:04.86 2:05.99 2:06.87 14 4:21.51 4:23.51 4:25.00 14 8:56.28 9:01.09 9:04.58
15 15 57.18 57.61 57.95 15 2:02.61 2:03.72 2:04.58 15 4:17.31 4:19.28 4:20.75 15 8:47.15 8:51.87 8:55.30
16 16 56.34 56.76 57.09 16 2:00.69 2:01.78 2:02.63 16 4:13.75 4:15.69 4:17.14 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99 17 55.62 56.03 56.36 17 1:59.07 2:00.15 2:00.99 17 4:10.76 4:12.69 4:14.11 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73 18 55.01 55.42 55.74 18 1:57.74 1:58.81 1:59.63 18 4:08.34 4:10.24 4:11.66 18 8:33.73 8:37.04
19 25.20 25.36 25.49 19 54.50 54.91 55.23 19 1:57.74 1:58.56 19 4:08.34 4:09.74 19 8:33.73"""

>>> print(re.sub("(\d{2}) (?=\d{2}( |$))","\g<1>, ", a))
13, 13, 13, 13, 13 9:07.18 9:12.09 9:15.65
14, 14, 14 2:04.86 2:05.99 2:06.87, 14 4:21.51 4:23.51 4:25.00, 14 8:56.28 9:01.09 9:04.58
15, 15 57.18 57.61 57.95, 15 2:02.61 2:03.72 2:04.58, 15 4:17.31 4:19.28 4:20.75, 15 8:47.15 8:51.87 8:55.30
16, 16 56.34 56.76 57.09, 16 2:00.69 2:01.78 2:02.63, 16 4:13.75 4:15.69 4:17.14, 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99, 17 55.62 56.03 56.36, 17 1:59.07 2:00.15 2:00.99, 17 4:10.76 4:12.69 4:14.11, 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73, 18 55.01 55.42 55.74, 18 1:57.74 1:58.81 1:59.63, 18 4:08.34 4:10.24 4:11.66, 18 8:33.73 8:37.04
19 25.20 25.36 25.49, 19 54.50 54.91 55.23, 19 1:57.74 1:58.56, 19 4:08.34 4:09.74, 19 8:33.73

So, the reg exp. 所以,reg exp。 you need is (\\d{2}) (?=\\d{2}( |$)) which means: 你需要的是(\\d{2}) (?=\\d{2}( |$))这意味着:

  1. (\\d{2}) => Store 2 numbers in group 1 and match an extra space. (\\d{2}) =>在组1中存储2个数字并匹配额外的空格。
  2. (?=\\d{2}( |$)) => match 2 numbers and 1 space or EOL, but don't consume them. (?=\\d{2}( |$)) =>匹配2个数字和1个空格或EOL,但不要使用它们。

The key here is that by not consuming the second matched group, it will be processed again next time that the sub function is applied. 这里的关键是,通过不消耗第二个匹配的组,下次应用子功能时将再次处理它。 Finally, \\g<1>, will replace 1. with the same numbers and the additional , . 最后, \\g<1>,将取代1.用相同的数字和附加的,

For the fun of sed and because you seem interested in an sed reference for understanding. 为了sed的乐趣,因为你似乎对sed参考有兴趣了解。

sed ":a;s/\([^,]\)\(\s[0-9]\{2\}\s\)/\1,\2/;ta"

or 要么

sed -E ":a;s/([^,])(\s[0-9]{2}\s)/\1,\2/;ta"
  • start a loop 开始循环
    • look for 寻找
      • something other than , , important for looping later 东西比其他的,为以后的循环很重要
      • a whitespace, two digits and a whitespace 一个空格,两个数字和一个空格
    • replace by the non-comma, a comma and the rest 替换为非逗号,逗号和其余部分
  • loop if that replaced something 循环,如果它取代了什么

Output (exactly as desired output): 输出(完全符合要求的输出):

13, 13, 13, 13, 13 9:07.18 9:12.09 9:15.65
14, 14, 14 2:04.86 2:05.99 2:06.87, 14 4:21.51 4:23.51 4:25.00, 14 8:56.28 9:01.09 9:04.58
15, 15 57.18 57.61 57.95, 15 2:02.61 2:03.72 2:04.58, 15 4:17.31 4:19.28 4:20.75, 15 8:47.15 8:51.87 8:55.30
16, 16 56.34 56.76 57.09, 16 2:00.69 2:01.78 2:02.63, 16 4:13.75 4:15.69 4:17.14, 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99, 17 55.62 56.03 56.36, 17 1:59.07 2:00.15 2:00.99, 17 4:10.76 4:12.69 4:14.11, 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73, 18 55.01 55.42 55.74, 18 1:57.74 1:58.81 1:59.63, 18 4:08.34 4:10.24 4:11.66, 18 8:33.73 8:37.04
19 25.20 25.36 25.49, 19 54.50 54.91 55.23, 19 1:57.74 1:58.56, 19 4:08.34 4:09.74, 19 8:33.73

Adding to VMRuiz's answer , this outputs a list for each line, instead of one big string. 添加到VMRuiz的答案 ,这将输出每行的列表,而不是一个大字符串。 I had to change the regex in order to use re.split instead of re.sub , and I'm not sure that it's equivalent. 我必须更改正则表达式才能使用re.split而不是re.sub ,我不确定它是否相同。

for line in a.split('\n'):
    re.split('(?<=\d{2}) (?=\d{2} |$)', line)

Edit: This is definitely the same, but a bit awkward: 编辑:这绝对是一样的,但有点尴尬:

for line in re.sub('(\d{2}) (?=\d{2}( |$))', '\g<1>,', a).split('\n'):
    line.split(',')

If you want a non regex Python solution, you can do: 如果您需要非正则表达式Python解决方案,您可以:

s = """\
13 13 13 13 13 9:07.18 9:12.09 9:15.65
14 14 14 2:04.86 2:05.99 2:06.87 14 4:21.51 4:23.51 4:25.00 14 8:56.28 9:01.09 9:04.58
15 15 57.18 57.61 57.95 15 2:02.61 2:03.72 2:04.58 15 4:17.31 4:19.28 4:20.75 15 8:47.15 8:51.87 8:55.30
16 16 56.34 56.76 57.09 16 2:00.69 2:01.78 2:02.63 16 4:13.75 4:15.69 4:17.14 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99 17 55.62 56.03 56.36 17 1:59.07 2:00.15 2:00.99 17 4:10.76 4:12.69 4:14.11 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73 18 55.01 55.42 55.74 18 1:57.74 1:58.81 1:59.63 18 4:08.34 4:10.24 4:11.66 18 8:33.73 8:37.04
19 25.20 25.36 25.49 19 54.50 54.91 55.23 19 1:57.74 1:58.56 19 4:08.34 4:09.74 19 8:33.73"""


res=""
for line in s.splitlines():
    buf=line.split()
    for i, e in enumerate(buf[1:], 1):
        buf[i-1]+=", " if e.isdigit() else " "
    res+=''.join(buf)+"\n"  

>>> res
13, 13, 13, 13, 13 9:07.18 9:12.09 9:15.65
14, 14, 14 2:04.86 2:05.99 2:06.87, 14 4:21.51 4:23.51 4:25.00, 14 8:56.28 9:01.09 9:04.58
15, 15 57.18 57.61 57.95, 15 2:02.61 2:03.72 2:04.58, 15 4:17.31 4:19.28 4:20.75, 15 8:47.15 8:51.87 8:55.30
16, 16 56.34 56.76 57.09, 16 2:00.69 2:01.78 2:02.63, 16 4:13.75 4:15.69 4:17.14, 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99, 17 55.62 56.03 56.36, 17 1:59.07 2:00.15 2:00.99, 17 4:10.76 4:12.69 4:14.11, 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73, 18 55.01 55.42 55.74, 18 1:57.74 1:58.81 1:59.63, 18 4:08.34 4:10.24 4:11.66, 18 8:33.73 8:37.04
19 25.20 25.36 25.49, 19 54.50 54.91 55.23, 19 1:57.74 1:58.56, 19 4:08.34 4:09.74, 19 8:33.73

In awk you can do: awk你可以这样做:

awk '{n=split($0,a)
      for (i=2;i<=n;i++)
          printf "%s%s", a[i-1], a[i]~/^[[:digit:]]+$/ ?  ", " : " "
      print a[n]
    }' file
13, 13, 13, 13, 13 9:07.18 9:12.09 9:15.65
14, 14, 14 2:04.86 2:05.99 2:06.87, 14 4:21.51 4:23.51 4:25.00, 14 8:56.28 9:01.09 9:04.58
15, 15 57.18 57.61 57.95, 15 2:02.61 2:03.72 2:04.58, 15 4:17.31 4:19.28 4:20.75, 15 8:47.15 8:51.87 8:55.30
16, 16 56.34 56.76 57.09, 16 2:00.69 2:01.78 2:02.63, 16 4:13.75 4:15.69 4:17.14, 16 8:39.71 8:44.37 8:47.75
17 25.69 25.85 25.99, 17 55.62 56.03 56.36, 17 1:59.07 2:00.15 2:00.99, 17 4:10.76 4:12.69 4:14.11, 17 8:33.73 8:38.34 8:41.68
18 25.43 25.59 25.73, 18 55.01 55.42 55.74, 18 1:57.74 1:58.81 1:59.63, 18 4:08.34 4:10.24 4:11.66, 18 8:33.73 8:37.04
19 25.20 25.36 25.49, 19 54.50 54.91 55.23, 19 1:57.74 1:58.56, 19 4:08.34 4:09.74, 19 8:33.73

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM