
Matching a space at the beginning of a line using pyparsing

I'm trying to parse a unified diff file using pyparsing as an exercise, and I can't get something right. Here is the part of my diff file that's causing me trouble:

(... some stuff over...)
 banana
+apple
 orange

The first line starts with " " followed by "banana". I have the following expression for parsing a line:

linestart = Literal(" ") | Literal("+") | Literal("-")
line = linestart.leaveWhitespace() + restOfLine

This works when parsing a single line, but when I try to parse the whole file, the "leaveWhitespace" instruction makes the parser start at the end of the previous line. In my example, after parsing " banana", the next char is "\n" (because of leaveWhitespace), so the parser tries to match " " or "+" or "-" against it and throws an error.
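The failure described above can be reproduced with a short sketch. It assumes the three sample lines are held in an in-memory string (the name `sample` is illustrative, not from the original file) and chains two `line` expressions to stand in for "parse more than one line":

```python
from pyparsing import Literal, ParseException, restOfLine

linestart = Literal(" ") | Literal("+") | Literal("-")
line = linestart.leaveWhitespace() + restOfLine

# Illustrative stand-in for the relevant part of the diff file.
sample = " banana\n+apple\n orange\n"

# A single line parses fine; restOfLine stops just before the newline.
first = line.parseString(sample).asList()
print(first)

# Asking for two lines back to back fails: the second linestart does not
# skip whitespace, so it sees "\n" rather than " ", "+" or "-".
try:
    (line + line).parseString(sample)
    failed = False
except ParseException as exc:
    failed = True
    print("parse failed:", exc)
```

The first call returns `[' ', 'banana']`, and the second raises a `ParseException` positioned at the newline.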

How can I handle this correctly?

You can read and parse one line at a time. The following code works for me.

from pyparsing import Literal, restOfLine

linestart = Literal(" ") | Literal("+") | Literal("-")
line = linestart.leaveWhitespace() + restOfLine

# Parse the diff one line at a time, so the grammar never has to cross a newline.
with open("/tmp/test.diff") as f:
    for l in f:
        fields = line.parseString(l)
        print(fields)

And the output is

[' ', 'banana']
['+', 'apple']
[' ', 'orange']
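For trying the line-by-line approach without a file on disk, here is a minimal sketch that works on an in-memory string. The helper name `parse_diff_lines` and the `sample` text are hypothetical, introduced only for illustration:

```python
from pyparsing import Literal, restOfLine

linestart = Literal(" ") | Literal("+") | Literal("-")
line = linestart.leaveWhitespace() + restOfLine


def parse_diff_lines(text):
    """Hypothetical helper: return one (marker, content) pair per line."""
    pairs = []
    for raw in text.splitlines():
        tokens = line.parseString(raw)
        # tokens[0] is the marker (" ", "+" or "-"); tokens[1] is the rest.
        pairs.append((tokens[0], tokens[1]))
    return pairs


# Illustrative stand-in for the contents of the diff file.
sample = " banana\n+apple\n orange\n"
pairs = parse_diff_lines(sample)
print(pairs)
```

For the sample above this prints `[(' ', 'banana'), ('+', 'apple'), (' ', 'orange')]`. Because every string handed to the parser holds exactly one line, the newline problem never arises.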

Or, if you have to parse several lines at once, you can match the line ending explicitly with LineEnd:

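from pyparsing import LineEnd, Literal, ZeroOrMore, restOfLine

linestart = Literal(" ") | Literal("+") | Literal("-")
# LineEnd() consumes the "\n" that restOfLine leaves behind.
line = linestart.leaveWhitespace() + restOfLine + LineEnd()
lines = ZeroOrMore(line)
with open("/tmp/test.diff") as f:
    print(lines.parseString(f.read()))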
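With a bare `LineEnd()`, each matched "\n" also shows up as a token in the flat result. One possible refinement, not part of the original answer, is to suppress the line ending and wrap each line in a `Group`, so the result is one sub-list per line. The sketch below assumes an in-memory `sample` string in place of the file:

```python
from pyparsing import Group, LineEnd, Literal, ZeroOrMore, restOfLine

linestart = Literal(" ") | Literal("+") | Literal("-")

# Suppress the newline token and group each line's marker and content.
line = Group(linestart.leaveWhitespace() + restOfLine + LineEnd().suppress())
lines = ZeroOrMore(line)

# Illustrative stand-in for the contents of the diff file.
sample = " banana\n+apple\n orange\n"
result = lines.parseString(sample).asList()
print(result)
```

For the sample above this prints `[[' ', 'banana'], ['+', 'apple'], [' ', 'orange']]`.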
