How to split text in a data file into even-sized groups? Python 3.x

I'm having trouble splitting text in a data file. Suppose the data file consists of:

Row 1
apple
bob
cat
dog
ear
fun

Row 2
glow
horse
idea
joke
kick
lemon

Row 3
money
new
odd
park
queen
run

I want to split it so that it becomes a nested list like the following:

[[apple, bob], [cat, dog], [ear, fun]], 
[[glow, horse], [idea, joke], [kick, lemon]], 
[[money, new], [odd, park], [queen, run]]

This is my work so far:

def text_file(data_file):
    nested_list = []
    main_list = []
    my_list = ''
    for index in data_file:
        index = index.strip()

        if (index in my_list):
            main_list.append(nested_list)
            nested_list = []

        else:
            nested_list.append(index)

    if (nested_list):
        main_list.append(nested_list)

    return (main_list)

but this returns:

text_file(open("data_file.txt", "r"))
[['Row 1', 'apple', 'bob', 'cat', 'dog', 'ear', 'fun'], 
['Row 2', 'glow', 'horse', 'idea', 'joke', 'kick', 'lemon'], 
['Row 3', 'money', 'new', 'odd', 'park', 'queen', 'run']]

Without importing anything, how can I achieve this? If possible what can I add into my code?

What you need to do is split the file's contents on \n\n (two newlines), which gives you the groups, then split each group into lines, then use zip (a built-in, so nothing needs importing) to step over those lines two at a time and build the required lists. For example:

s = """Row 1
apple
bob
cat
dog
ear
fun

Row 2
glow
horse
idea
joke
kick
lemon

Row 3
money
new
odd
park
queen
run"""

lines = s.split('\n\n')
for line in lines:
    words = line.splitlines()
    print([ [i, j] for i, j in zip(words[1::2], words[2::2]) ])

[['apple', 'bob'], ['cat', 'dog'], ['ear', 'fun']]
[['glow', 'horse'], ['idea', 'joke'], ['kick', 'lemon']]
[['money', 'new'], ['odd', 'park'], ['queen', 'run']]
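
As a rough sketch, the same idea can be wrapped in a function that returns the nested list instead of printing it. The name group_pairs is made up for illustration, and the code assumes the groups are separated by exactly one blank line and that each group starts with a "Row N" header line:

```python
def group_pairs(text):
    """Return one list of [first, second] pairs per blank-line-separated group."""
    result = []
    for block in text.strip().split("\n\n"):
        words = block.splitlines()[1:]  # drop the "Row N" header line
        # Even positions paired with odd positions gives consecutive pairs.
        result.append([[a, b] for a, b in zip(words[::2], words[1::2])])
    return result


sample = "Row 1\napple\nbob\ncat\ndog\n\nRow 2\nglow\nhorse\nidea\njoke\n"
print(group_pairs(sample))
```

For a real file, something like group_pairs(open("data_file.txt").read()) should behave the same way, provided the file follows the layout shown in the question.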

Something like this, using a regex and iterators (note that it does need "import re").

Split the text at each "Row <number>" header with a regex; then you can use either zip or an iterator to get the expected output.

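import re

with open("data.txt") as f:
    # Split on the "Row N" headers; the piece before "Row 1" is empty, so skip it.
    spl = re.split(r"Row \d+", f.read())[1:]
    for x in spl:
        sp = x.split()
        it = iter(sp)
        # Each pass takes two consecutive words from the same iterator.
        print([[next(it), next(it)] for _ in range(len(sp) // 2)])
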
[['apple', 'bob'], ['cat', 'dog'], ['ear', 'fun']]
[['glow', 'horse'], ['idea', 'joke'], ['kick', 'lemon']]
[['money', 'new'], ['odd', 'park'], ['queen', 'run']]
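
The zip alternative mentioned in this answer can reuse a single iterator for both arguments, which avoids the explicit next() calls. This is only a minimal sketch; pair_up is a hypothetical helper name, not something from the answer:

```python
def pair_up(words):
    """Group a flat list into consecutive pairs; a trailing odd item is dropped."""
    it = iter(words)
    # zip pulls from the same iterator twice per step, so items come out two at a time.
    return [[a, b] for a, b in zip(it, it)]


pairs = pair_up(["apple", "bob", "cat", "dog", "ear", "fun"])
print(pairs)
```
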
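
if nested_list:
    new_list = nested_list[1:]  # skip the "Row N" header
    # In Python 3, zip returns a lazy iterator, so build real lists of pairs.
    main_list.append([list(pair) for pair in zip(new_list[::2], new_list[1::2])])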

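Try this in place of the final "if" block of your function, and make the same change where a group is appended inside the loop.

Instead of appending the nested list to the main list as-is, this code first drops the header, forms pairs of consecutive elements, and then appends those pairs.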
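
Put together, the whole function might look roughly like the following. This is a sketch rather than a drop-in fix: the helper pairs_from is a made-up name, the blank-line test is written as a plain truthiness check instead of the original "index in my_list", and the code assumes every group starts with one header line:

```python
def pairs_from(group):
    """Drop the header (first item) and pair up the remaining items."""
    items = group[1:]
    return [list(pair) for pair in zip(items[::2], items[1::2])]


def text_file(data_file):
    main_list = []
    nested_list = []
    for line in data_file:
        line = line.strip()
        if line:
            nested_list.append(line)
        elif nested_list:  # a blank line closes the current group
            main_list.append(pairs_from(nested_list))
            nested_list = []
    if nested_list:  # the last group has no trailing blank line
        main_list.append(pairs_from(nested_list))
    return main_list


# Any iterable of lines works here, including an open file object.
sample_lines = ["Row 1\n", "apple\n", "bob\n", "cat\n", "dog\n", "\n",
                "Row 2\n", "glow\n", "horse\n"]
print(text_file(sample_lines))
```
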
