I'm trying to parse a unified diff file using pyparsing as an exercise, and I can't get one part right. Here is the part of my diff file that is causing trouble:
(... some stuff over...)
 banana
+apple
 orange
The first line starts with " " followed by "banana". I have the following expression for parsing a line:
linestart = Literal(" ") | Literal("+") | Literal("-")
line = linestart.leaveWhitespace() + restOfLine
This works when parsing a single line, but when I try to parse the whole file, leaveWhitespace makes the parser resume at the end of the previous line. In my example, after parsing " banana", the next character is "\n" (because of leaveWhitespace), so the parser tries to match " ", "+", or "-" against it and raises an error.
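A minimal sketch that reproduces the problem. The exact call used on the whole file is not shown here, so the `text` sample and the use of OneOrMore with parseAll=True are assumptions made for illustration:

```python
from pyparsing import Literal, OneOrMore, ParseException, restOfLine

# Assumed sample input: the three diff body lines held in a string.
text = " banana\n+apple\n orange\n"

linestart = Literal(" ") | Literal("+") | Literal("-")
line = linestart.leaveWhitespace() + restOfLine

# A single line parses fine.
first = line.parseString(" banana").asList()

# Repeating `line` stalls at the "\n" that restOfLine leaves unconsumed,
# so requiring the whole input to match raises ParseException.
try:
    OneOrMore(line).parseString(text, parseAll=True)
    failed = False
except ParseException:
    failed = True

print(first, failed)
```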
How can I handle this correctly?
You can read and parse one line at a time. The following code works for me.
from pyparsing import Literal, restOfLine
linestart = Literal(" ") | Literal("+") | Literal("-")
line = linestart.leaveWhitespace() + restOfLine
f = open("/tmp/test.diff")
# Parse each line separately, so the newline never has to be matched
for l in f.readlines():
    fields = line.parseString(l)
    print(fields)
The output is:
[' ', 'banana']
['+', 'apple']
[' ', 'orange']
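Alternatively, if you have to parse several lines in one pass, you can consume the newline explicitly with LineEnd():
from pyparsing import LineEnd, Literal, ZeroOrMore, restOfLine
linestart = Literal(" ") | Literal("+") | Literal("-")
# LineEnd() consumes the "\n", so the next match starts at the line marker
line = linestart.leaveWhitespace() + restOfLine + LineEnd()
lines = ZeroOrMore(line)
with open("/tmp/test.diff") as f:
    print(lines.parseString(f.read()))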
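With LineEnd() in the grammar, the newline itself shows up as a token and all lines land in one flat list. One possible refinement, offered as a sketch rather than as part of the answer above, is to suppress the newline and wrap each line in Group. The `text` string is an assumed in-memory sample standing in for the file contents:

```python
from pyparsing import Group, LineEnd, Literal, ZeroOrMore, restOfLine

# Assumed sample input standing in for the contents of the diff file.
text = " banana\n+apple\n orange\n"

linestart = Literal(" ") | Literal("+") | Literal("-")
# Suppress the newline token and keep each line's fields in their own sublist.
line = Group(linestart.leaveWhitespace() + restOfLine + LineEnd().suppress())
lines = ZeroOrMore(line)

# parseAll=True makes the parse fail loudly if any line is left unmatched.
result = lines.parseString(text, parseAll=True).asList()
print(result)
# [[' ', 'banana'], ['+', 'apple'], [' ', 'orange']]
```

Each sublist holds the marker character followed by the rest of that line, which makes it easy to tell added, removed, and context lines apart afterwards.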