Python regex: strip characters

Question

I have a text file:

z.server(y.host=>["x.012345","x.054321","x.045455"], :stop => 10)
z.server(y.host=>["x.067891","x.043215","x.045195"], :stop => 10)
z.server(y.host=>["x.012355","x.075321","x.045855"], :stop => 10)

I want my script to extract the following data:

y.host 012345 054321 045455
y.host 067891 043215 045195
y.host 012355 075321 045855

When I run my Python script, I get:

y.host 012345","x.054321","x.045455
y.host 067891","x.043215","x.045195
y.host 012355","x.075321","x.045855

What am I missing? Any help is appreciated.

Here's my script:

#!/usr/bin/python

import re,sys

f = "test.txt"

rgxxid = re.compile('(^z\.\w+\((\w+\.\w+)=>\["x\.(\d+.*)"\]).\s+:\w+\s+=>\s\d+\)')

for l in open(f,'r').readlines():

   lm = re.match(rgxxid,l)

   if lm:

      hlm = lm.group(2)
      xid = lm.group(3)

      print hlm, xid

   else:
      sys.stderr.write("No XID match. "+l+"\n")

Answer 1

In brief, here is the problem with your current regex:

["x\.(\d+.*)"\]
         ^^^

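The pattern \d+.* matches one or more digits followed by anything at all. Because .* is greedy, it first consumes the rest of the line and then backtracks only as far as needed for the closing "] to match, so the group extends up to the last quote before the closing bracket. What you are seeing in your output confirms this: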

y.host 012345","x.054321","x.045455
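To see the greedy match in isolation, here is a small Python 3 sketch that uses only the standard re module on the first sample line. It also suggests that simply making .* lazy (.*?) would not help here, because the pattern still requires "] immediately after the group; capturing only \d+ is what stops at the first number. The variable names are just for illustration.

```python
import re

line = 'z.server(y.host=>["x.012345","x.054321","x.045455"], :stop => 10)'

# Greedy .* runs to the end of the line, then backtracks until "] can match,
# so the group ends at the LAST quote before the closing bracket
greedy = re.search(r'\["x\.(\d+.*)"\]', line).group(1)

# Lazy .*? starts small and grows, but "] must still follow the group,
# so it ends up at the same last quote
lazy = re.search(r'\["x\.(\d+.*?)"\]', line).group(1)

# Capturing digits only stops at the first non-digit character
digits_only = re.search(r'\["x\.(\d+)"', line).group(1)

print(greedy)       # 012345","x.054321","x.045455
print(lazy)         # 012345","x.054321","x.045455
print(digits_only)  # 012345
```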

The capture group has consumed everything up to the end of the list of numbers. Instead, try using the following pattern:

^z\.\w+\((\w+\.\w+)=>\["x\.(\d+)","x\.(\d+)","x\.(\d+)"\],\s+:\w+\s+=>\s\d+\)

Here there is one explicit capture group for each of the three numbers (groups 2 to 4), in addition to group 1 for the host.
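A quick check of this pattern against the three sample lines from the question, written as a Python 3 sketch with only the standard re module (the names pattern, sample and results are illustrative):

```python
import re

# Group 1 is the host; groups 2-4 are the three numbers
pattern = re.compile(
    r'^z\.\w+\((\w+\.\w+)=>\["x\.(\d+)","x\.(\d+)","x\.(\d+)"\],\s+:\w+\s+=>\s\d+\)'
)

sample = [
    'z.server(y.host=>["x.012345","x.054321","x.045455"], :stop => 10)',
    'z.server(y.host=>["x.067891","x.043215","x.045195"], :stop => 10)',
    'z.server(y.host=>["x.012355","x.075321","x.045855"], :stop => 10)',
]

results = []
for line in sample:
    m = pattern.match(line)
    if m:
        # groups() returns (host, n1, n2, n3); join them with single spaces
        results.append(" ".join(m.groups()))

for r in results:
    print(r)
```

Each printed line should have the form the question asks for, host first and then the three numbers.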

Your updated script should look something like this:

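from __future__ import print_function  # makes print() behave the same on Python 2 and 3

import re
import sys

f = "test.txt"

# Group 1 is the host; groups 2-4 are the three numbers
rgxxid = re.compile(r'^z\.\w+\((\w+\.\w+)=>\["x\.(\d+)","x\.(\d+)","x\.(\d+)"\],\s+:\w+\s+=>\s\d+\)')

with open(f, 'r') as fh:
    for l in fh:
        lm = rgxxid.match(l)
        if lm:
            term1 = lm.group(1)
            term2 = lm.group(2)
            term3 = lm.group(3)
            term4 = lm.group(4)
            print(term1, term2, term3, term4)
        else:
            # l already ends with a newline; strip it to avoid a blank line
            sys.stderr.write("No XID match. " + l.rstrip("\n") + "\n")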
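The pattern above hard-codes exactly three numbers per line. If that count can vary, one alternative (a different technique, not something this answer proposes) is to match the host with one regex and collect every entry with findall, restricted to the text between [ and ]. A minimal Python 3 sketch, assuming every entry has the form "x.<digits>"; the helper name extract and the four-item sample line are hypothetical:

```python
import re

# Host name sitting between "z.<something>(" and "=>["
HOST_RE = re.compile(r'^z\.\w+\((\w+\.\w+)=>\[')
# One list entry of the assumed form "x.<digits>"; only the digits are captured
NUM_RE = re.compile(r'"x\.(\d+)"')


def extract(line):
    """Hypothetical helper: return (host, [numbers]) or None if the line does not match."""
    m = HOST_RE.match(line)
    if not m:
        return None
    start = m.end()              # index just after the opening [
    end = line.find(']', start)  # index of the closing ]
    if end == -1:
        return None
    # Search only inside the brackets so nothing after ] is picked up
    return m.group(1), NUM_RE.findall(line, start, end)


for line in [
    'z.server(y.host=>["x.012345","x.054321","x.045455"], :stop => 10)',
    # hypothetical line with four entries, to show the count is not fixed
    'z.server(y.host=>["x.1","x.22","x.333","x.4444"], :stop => 10)',
]:
    result = extract(line)
    if result:
        host, numbers = result
        print(host, " ".join(numbers))
```

The trade-off is that this version no longer validates the tail of the line (", :stop => 10)"), which the single full-line pattern does.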

Answer 2

You might find it easier to use pyparsing. It makes it simpler to capture the grammar of the lines you have offered as examples.

Notice:

What appear to be server names are defined as any mixture of letters and periods. This definition could be broadened if needed.
The list can contain, and the parser will retrieve, any number of items (one or more).


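import pyparsing as pp

# Server-like names: letters and periods, e.g. z.server or y.host
server = pp.Word(pp.alphas + '.')
# One list entry such as "x.012345"; only the digits are kept
item = pp.Suppress('"x.') + pp.Word(pp.nums) + pp.Suppress('"')
# The outer name is discarded; the inner name and one or more comma-separated items are kept.
# parseString ignores whatever follows the last item, e.g. '], :stop => 10)'
one_line = (server.suppress() + pp.Suppress('(') + server + pp.Suppress('=>[')
            + item + pp.OneOrMore(pp.Suppress(',') + item))

lines = '''\
z.server(y.host=>["x.012345","x.054321","x.045455"], :stop => 10)
z.server(y.host=>["x.067891","x.043215","x.045195"], :stop => 10)
z.server(y.host=>["x.012355","x.075321","x.045855"], :stop => 10)'''

for line in lines.splitlines():
    print(line)
    parsed = one_line.parseString(line)
    # list(parsed) keeps every item, however many the line contains
    print('\t', list(parsed))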

Output:

z.server(y.host=>["x.012345","x.054321","x.045455"], :stop => 10)
     ['y.host', '012345', '054321', '045455']
z.server(y.host=>["x.067891","x.043215","x.045195"], :stop => 10)
     ['y.host', '067891', '043215', '045195']
z.server(y.host=>["x.012355","x.075321","x.045855"], :stop => 10)
     ['y.host', '012355', '075321', '045855']

2 answers

Answer 1: 2 votes, accepted, 2017-08-25 02:20:57

Answer 2: 2 votes, 2017-08-25 04:01:27
