
regex python strip characters

I have a text file:

z.server(y.host=>["x.012345","x.054321","x.045455"], :stop => 10)
z.server(y.host=>["x.067891","x.043215","x.045195"], :stop => 10)
z.server(y.host=>["x.012355","x.075321","x.045855"], :stop => 10)

I have a script with which I want to extract the following data:

y.host 012345 054321 045455
y.host 067891 043215 045195
y.host 012355 075321 045855

When I run my Python script I get:

y.host 012345","x.054321","x.045455
y.host 067891","x.043215","x.045195
y.host 012355","x.075321","x.045855

What am I missing? I'd appreciate any help.

Here's my script:

#!/usr/bin/python

import re,sys

f = "test.txt"

rgxxid = re.compile('(^z\.\w+\((\w+\.\w+)=>\["x\.(\d+.*)"\]).\s+:\w+\s+=>\s\d+\)')

for l in open(f,'r').readlines():

   lm = re.match(rgxxid,l)

   if lm:

      hlm = lm.group(2)
      xid = lm.group(3)

      print hlm, xid

   else:
      sys.stderr.write("No XID match. "+l+"\n")

In brief, here is the problem with your current regex:

["x\.(\d+.*)"\]
         ^^^

The pattern \d+.* says to match one or more digits followed by anything. Because .* is greedy, it keeps going until the last "] on the line, not the first closing quote. What you are seeing in your output confirms this:

y.host 012345","x.054321","x.045455

The capture group has consumed everything until the end of the numbers list. Instead, try using the following pattern:

^z\.\w+\((\w+\.\w+)=>\["x\.(\d+)","x\.(\d+)","x\.(\d+)"\],\s+:\w+\s+=>\s\d+\)

Here, I provide one explicit capture group for each of the three numbers.
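As a quick check of the difference, here is a minimal sketch that runs both patterns against the first sample line. The variable names (`greedy`, `fixed`, `line`) are only illustrative; the two patterns are the ones from the question and from this answer.

```python
import re

line = 'z.server(y.host=>["x.012345","x.054321","x.045455"], :stop => 10)'

# Pattern from the question: (\d+.*) is greedy and swallows the whole list
greedy = re.compile(r'(^z\.\w+\((\w+\.\w+)=>\["x\.(\d+.*)"\]).\s+:\w+\s+=>\s\d+\)')

# Pattern from this answer: one (\d+) group per number
fixed = re.compile(r'^z\.\w+\((\w+\.\w+)=>\["x\.(\d+)","x\.(\d+)","x\.(\d+)"\],\s+:\w+\s+=>\s\d+\)')

greedy_xid = greedy.match(line).group(3)
fixed_groups = fixed.match(line).groups()

print(greedy_xid)    # 012345","x.054321","x.045455
print(fixed_groups)  # ('y.host', '012345', '054321', '045455')
```

Note that the fixed pattern only matches lines with exactly three numbers in the list.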

Your updated script should look something like this:

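# Same imports and f as in your script; rgxxid is compiled from the pattern above
for l in open(f,'r').readlines():

   lm = re.match(rgxxid,l)

   if lm:

      # group 1 is the host, groups 2 to 4 are the three numbers
      term1 = lm.group(1)
      term2 = lm.group(2)
      term3 = lm.group(3)
      term4 = lm.group(4)

      print("%s %s %s %s" % (term1, term2, term3, term4))

   else:
      sys.stderr.write("No XID match. "+l+"\n")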
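If the list may hold more or fewer than three numbers, one possible variation (not part of the answer above) is to match the host once and then collect every quoted number with `re.findall`. This is only a sketch: `HOST_RE`, `ITEM_RE`, and `extract` are hypothetical names, and it assumes every item has the form "x.<digits>" as in the sample lines.

```python
import re

# Hypothetical helper names, chosen for illustration only
HOST_RE = re.compile(r'^z\.\w+\((\w+\.\w+)=>\[')   # captures the host, e.g. y.host
ITEM_RE = re.compile(r'"x\.(\d+)"')                 # captures the digits of one item

def extract(line):
    """Return [host, number, number, ...], or None if the line does not match."""
    m = HOST_RE.match(line)
    if m is None:
        return None
    # findall returns the captured digits of every "x.NNN" item on the line
    return [m.group(1)] + ITEM_RE.findall(line)

sample = [
    'z.server(y.host=>["x.012345","x.054321","x.045455"], :stop => 10)',
    'z.server(y.host=>["x.067891","x.043215"], :stop => 10)',
]

for line in sample:
    print(" ".join(extract(line)))
# y.host 012345 054321 045455
# y.host 067891 043215
```

The trade-off is that this version does not validate the tail of the line (`, :stop => 10)`), so it is looser than the single anchored pattern.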

You might find it easier to use pyparsing. It definitely makes it simpler to capture the grammar of the lines that you have offered as examples.

Notice:

  • What appear to be server names are defined as mixtures of alphabetic characters and periods. This could be expanded.
  • The list can contain an indefinite number of constants (two or more, as the grammar is written), and all of them are retrieved.

import pyparsing as pp

server = pp.Word(pp.alphas+'.')
item = pp.Suppress('"x.') + pp.Word(pp.nums) + pp.Suppress('"')
one_line = server.suppress() + pp.Suppress('(') + server + pp.Suppress('=>[') + item + pp.OneOrMore(pp.Suppress(',') + item)

lines = '''\
z.server(y.host=>["x.012345","x.054321","x.045455"], :stop => 10)
z.server(y.host=>["x.067891","x.043215","x.045195"], :stop => 10)
z.server(y.host=>["x.012355","x.075321","x.045855"], :stop => 10)'''

for line in lines.split('\n'):
    print (line)
    parsed = one_line.parseString(line)
    print ('\t', parsed[:5])

Output:

z.server(y.host=>["x.012345","x.054321","x.045455"], :stop => 10)
     ['y.host', '012345', '054321', '045455']
z.server(y.host=>["x.067891","x.043215","x.045195"], :stop => 10)
     ['y.host', '067891', '043215', '045195']
z.server(y.host=>["x.012355","x.075321","x.045855"], :stop => 10)
     ['y.host', '012355', '075321', '045855']
