
Split line in data file

I'm struggling with splitting the lines in my data file. Here is a sample of a few lines:

1:0 2:120
1:1 2:131
1:2 2:26
1:3 2:568
1:4 2:176
1:5 2:28 3:549
1:6 2:17
1:7 2:6 3:217 4:401 5:636
1:8 2:139

I want to split each line to extract every value, maybe in the form of an array:

((1, 2) , (0, 120))
((1, 2) , (1, 131))
...
((1, 2, 3, 4, 5) , (7, 6, 217, 401, 636))

This means the array can have a different size for each line. I tried to split the lines in two steps, but it doesn't work:

inf = open("datafile.txt", 'r')

for line in inf:
    line.split()
    for x in line.split():
        x.split(':', 1)

You can group the corresponding elements of several lists using the zip function.

with open("Input.txt") as inf:
    for line in inf:
        # list() and print() keep this working on Python 3, where zip is lazy
        print(list(zip(*map(lambda x: map(int, x.split(":")), line.split()))))

Output

[(1, 2), (0, 120)]
[(1, 2), (1, 131)]
[(1, 2), (2, 26)]
[(1, 2), (3, 568)]
[(1, 2), (4, 176)]
[(1, 2, 3), (5, 28, 549)]
[(1, 2), (6, 17)]
[(1, 2, 3, 4, 5), (7, 6, 217, 401, 636)]
[(1, 2), (8, 139)]

Suggestion: It is always good to open files with the with statement, as I have shown in the code above, because it takes care of closing the file and releasing its resources even if the program fails with an exception.
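To illustrate the suggestion above, here is a minimal sketch showing that the file is closed even when an exception is raised inside the with block. The temporary file and the simulated ValueError are assumptions made only for this demonstration; they are not part of the original question.

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1:0 2:120\n")
    path = tmp.name

try:
    with open(path) as inf:
        handle = inf  # keep a reference so we can inspect it afterwards
        raise ValueError("simulated failure while reading")
except ValueError:
    pass

# The with block closed the file before the exception propagated.
closed_after_error = handle.closed
print(closed_after_error)

os.remove(path)  # clean up the temporary file
```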

Explanation:

Since zip is a function call, its arguments are evaluated first. We will come back to the * later. In map(lambda x: map(int, x.split(":")), line.split()), we apply the lambda function lambda x: map(int, x.split(":")) to every element of the list of strings returned by line.split() (which splits the line at whitespace and returns the pieces as a list).

Each split token is passed to the lambda function as x, one at a time. Taking the first line, "1:0" is passed first; splitting it on : gives the list ["1", "0"], and applying int to each item gives [1, 0] (on Python 3, map returns a lazy iterator, so wrap it in list() to see these values). After every line has been split and the lambda applied, the results look like this:

[[1, 0], [2, 120]]
[[1, 1], [2, 131]]
[[1, 2], [2, 26]]
[[1, 3], [2, 568]]
[[1, 4], [2, 176]]
[[1, 5], [2, 28], [3, 549]]
[[1, 6], [2, 17]]
[[1, 7], [2, 6], [3, 217], [4, 401], [5, 636]]
[[1, 8], [2, 139]]
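The splitting step described above can be sketched on its own for a single line. This version uses a small named helper, split_pair, instead of the lambda; that name is my own and is purely illustrative.

```python
line = "1:5 2:28 3:549"

def split_pair(token):
    """Turn a token such as '2:28' into a list of ints, here [2, 28]."""
    return [int(part) for part in token.split(":")]

# line.split() breaks the line at whitespace; each token is then split on ':'.
pairs = [split_pair(token) for token in line.split()]
print(pairs)  # [[1, 5], [2, 28], [3, 549]]
```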

Each inner list now has two elements. The * that we postponed earlier unpacks the outer list and passes its elements as separate arguments to the zip function, like this (for the first line):

zip([1, 0], [2, 120])

zip then collects the first element of every argument into one tuple, the second element of every argument into another tuple, and so on.
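As a small illustration of the unpacking step, the sketch below applies zip(*...) to the pairs from the longest sample line and binds the two resulting tuples to names. The names keys and values are my own choice.

```python
pairs = [[1, 7], [2, 6], [3, 217], [4, 401], [5, 636]]

# zip(*pairs) is equivalent to zip([1, 7], [2, 6], [3, 217], [4, 401], [5, 636]).
keys, values = zip(*pairs)

print(keys)    # (1, 2, 3, 4, 5)
print(values)  # (7, 6, 217, 401, 636)
```

Note that unpacking into exactly two names assumes the line has at least one pair; for an empty list zip yields nothing and the assignment raises a ValueError.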

This is how we get the answer you expected.
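If you would rather collect the results than print them, the same idea can be wrapped in functions. This is only a sketch under a few assumptions: the names parse_line and parse_lines are hypothetical, blank lines are skipped, and an in-memory io.StringIO stands in for the real data file.

```python
import io

# A few lines from the question, plus a blank line to show how it is handled.
SAMPLE = """1:0 2:120
1:5 2:28 3:549

1:7 2:6 3:217 4:401 5:636
"""

def parse_line(line):
    """Return (indices, values) as two tuples, or None for a blank line."""
    tokens = line.split()
    if not tokens:
        return None
    pairs = [tuple(int(part) for part in token.split(":", 1)) for token in tokens]
    indices, values = zip(*pairs)
    return indices, values

def parse_lines(lines):
    """Parse every non-blank line of an iterable of strings (e.g. an open file)."""
    results = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            results.append(parsed)
    return results

rows = parse_lines(io.StringIO(SAMPLE))
for row in rows:
    print(row)
```

With a real file, the call would be parse_lines(inf) inside a with open(...) as inf: block.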
