Reading formatted multi-lines in Python

I would like to read some formatted data in Python. The data looks something like this:

00:00:00
1 1 1
1 1 1
1 1 1

00:00:02
3 3 3
3 3 3
3 3 3

I could successfully do the reading in C/C++ using the following code:

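#include <cstdio>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string hour;
    int x0,y0,z0, x1,y1,z1, x2,y2,z2;

    // Each record is a timestamp line followed by three rows of three integers.
    while(cin >> hour)
    {
        scanf("%d %d %d\n%d %d %d\n%d %d %d\n", &x0, &y0, &z0, &x1, &y1, &z1, &x2, &y2, &z2);
        cout << hour << endl; //check the reading
    }
    return 0;
}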

The problem is that I cannot find a Python method that reads a formatted multi-line string as simply as scanf does. Some examples using np.genfromtxt came close to what I needed, as did some using struct.unpack, but my skills weren't enough to make them work with multi-line input. I could probably use split() with readline() to extract the formatted data, but it is driving me nuts that a program in C/C++ would be simpler than one in Python. Is there any way to do something similar to the C/C++ code in Python?


Here is the solution I arrived at with Joril's help:

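from scanf import sscanf
import sys

FORMAT = "%s\n%d %d %d\n%d %d %d\n%d %d %d\n"

data = ''
for line in sys.stdin:
    if line != '\n':
        data += line  # accumulate the lines of the current record
    else:
        # A blank line ends the record: parse it and start a new one.
        print(sscanf(data, FORMAT))
        data = ''

# Parse the last record if the input does not end with a blank line.
if data:
    print(sscanf(data, FORMAT))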

As output, I got something like:

('00:00:00', 1, 1, 1, 1, 1, 1, 1, 1, 1)
('00:00:02', 3, 3, 3, 3, 3, 3, 3, 3, 3)

You can definitely use regular expressions. Here is roughly equivalent code in Python, without the loop (it parses a single record):

import re
import sys

# Read the whole input: input() returns a single line, which could never
# match a pattern that spans four lines.
text = sys.stdin.read()
res = re.match(
    r'(?P<hour>\d\d):(?P<minute>\d\d):(?P<second>\d\d)\n'
    r'(?P<x0>\d+) (?P<y0>\d+) (?P<z0>\d+)\n'
    r'(?P<x1>\d+) (?P<y1>\d+) (?P<z1>\d+)\n'
    r'(?P<x2>\d+) (?P<y2>\d+) (?P<z2>\d+)',
    text, re.MULTILINE)

if res:
    print(res.groupdict())

That said, I would split the data into lines first and then parse each line.
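A minimal sketch of that line-splitting approach, assuming records are separated by exactly one blank line with "\n" line endings, and that every row holds only integers. The function name parse_records and the sample string are made up for illustration:

```python
def parse_records(text):
    """Parse blank-line-separated records into (time_parts, rows) tuples."""
    records = []
    # Records are assumed to be separated by a single blank line.
    for block in text.strip().split("\n\n"):
        if not block.strip():
            continue  # skip empty input or stray blank blocks
        lines = block.splitlines()
        # First line is the timestamp; split(":") uses an explicit separator.
        time_parts = tuple(int(part) for part in lines[0].strip().split(":"))
        # Remaining lines are whitespace-delimited integers.
        rows = [[int(word) for word in line.split()] for line in lines[1:]]
        records.append((time_parts, rows))
    return records


sample = """00:00:00
1 1 1
1 1 1
1 1 1

00:00:02
3 3 3
3 3 3
3 3 3
"""

for time_parts, rows in parse_records(sample):
    print(time_parts, rows)
```

Unlike the fixed scanf format, this version does not care how many rows or columns a record has, which may or may not be what you want.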

Well, the Python FAQ says:

Is there a scanf() or sscanf() equivalent?

Not as such.

For simple input parsing, the easiest approach is usually to split the line into whitespace-delimited words using the split() method of string objects and then convert decimal strings to numeric values using int() or float(). split() supports an optional “sep” parameter which is useful if the line uses something other than whitespace as a separator.

For more complicated input parsing, regular expressions are more powerful than C's sscanf() and better suited for the task.
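As a rough illustration of the regular-expression route the FAQ describes, here is a sketch that uses re.finditer to pull every record out of the input in one pass. It assumes the fixed layout from the question (one hh:mm:ss line followed by three rows of three integers separated by single spaces); the names ROW, RECORD, scan_records and the sample string are made up for illustration:

```python
import re

# One row of three (possibly negative) integers separated by single spaces.
ROW = r"(-?\d+) (-?\d+) (-?\d+)"
# A timestamp line followed by three such rows.
RECORD = re.compile(r"(\d\d:\d\d:\d\d)\n" + r"\n".join([ROW] * 3))


def scan_records(text):
    """Yield one flat tuple (time, nine ints) per record found in text."""
    for match in RECORD.finditer(text):
        hour, *numbers = match.groups()
        yield (hour, *map(int, numbers))


sample = """00:00:00
1 1 1
1 1 1
1 1 1

00:00:02
3 3 3
3 3 3
3 3 3
"""

for record in scan_records(sample):
    print(record)
```

Each yielded tuple has the same shape as the sscanf output shown in the question. Text that does not fit the pattern is silently skipped, so add validation if malformed input should be reported.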

But it looks like someone made a module that does exactly what you want:
https://hkn.eecs.berkeley.edu/~dyoo/python/scanf
