
Compact way using generators/“with … as” in Python

I have the following data structure:

var = [['x_A_B', 1], ['x_A_C', 1], ['x_B_A', 1], ['x_B_D', 1], ['x_C_A', 1], ['x_C_D', 1], ['x_D_B', 1], ['x_D_C', 1]]

I would like to extract these values as

var2 = [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'D'), ('C', 'A'), ('C', 'D'), ('D', 'B'), ('D', 'C')]

Currently I use the following line

var2 = [(item[0].split("_")[1], item[0].split("_")[2]) for item in var]

but it's tedious to write, and it also computes the same split twice. Is there a more compact way to write this, perhaps with a with ... as keyword, something like this?

# not working
var2 = [(u, v) with item[0].split("_") as _, u, v for item in var]

EDIT: I was looking for a more general solution, where I can use arbitrary indices of the split string with substrings of arbitrary length; my example was poorly chosen. See the solution I accepted.

The general case would be:

[tuple(item[0].split('_')[1:3]) for item in var]

And the most general case would be:

indices = {1,2}
[tuple([x for i, x in enumerate(item[0].split('_')) if i in indices]) for item in var]

But if the two indices you need are adjacent, this is overkill; the slice above is enough.
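
For indices that are not adjacent, operator.itemgetter from the standard library is one option. The following is a minimal sketch on a shortened copy of the question's data; the names pick_1_2 and pick_0_2 are only illustrative.

```python
from operator import itemgetter

var = [['x_A_B', 1], ['x_A_C', 1], ['x_B_D', 1]]

# itemgetter(1, 2) returns a callable that maps seq -> (seq[1], seq[2]).
pick_1_2 = itemgetter(1, 2)
var2 = [pick_1_2(item[0].split('_')) for item in var]
print(var2)  # [('A', 'B'), ('A', 'C'), ('B', 'D')]

# Non-adjacent indices work the same way; the string is still split only once.
pick_0_2 = itemgetter(0, 2)
var3 = [pick_0_2(item[0].split('_')) for item in var]
print(var3)  # [('x', 'B'), ('x', 'C'), ('x', 'D')]
```

Note that itemgetter with a single index returns the bare element rather than a one-element tuple, so this form is best suited to two or more indices.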

Why even use split? You know the exact indices of the letters you want.

>>> var = [['x_A_B', 1], ['x_A_C', 1], ['x_B_A', 1], ['x_B_D', 1], ['x_C_A', 1], ['x_C_D', 1], ['x_D_B', 1], ['x_D_C', 1]]
>>> [(x[0][2], x[0][4]) for x in var]
[('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'D'), ('C', 'A'), ('C', 'D'), ('D', 'B'), ('D', 'C')]

I am interested in a more general case: suppose the variable names can look like 'x_word1_word2'.

Well, in that case, internet_user gave you the solution in the comments.

>>> var = [['x_A_B', 1], ['x_word1_word2']]
>>> [tuple(x[0].rsplit('_', 2)[1:]) for x in var]
[('A', 'B'), ('word1', 'word2')]

(I used rsplit constrained to two splits for a very minor efficiency improvement.)
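
Besides the small efficiency gain, limiting rsplit to two splits also changes the result when the prefix itself contains underscores: the last two fields are always the ones returned. A short sketch, where the variable name is a made-up example and not from the question:

```python
# Hypothetical name with extra underscores in the prefix.
name = 'x_long_prefix_word1_word2'

print(name.split('_'))      # ['x', 'long', 'prefix', 'word1', 'word2']
print(name.rsplit('_', 2))  # ['x_long_prefix', 'word1', 'word2']

# Slicing off the first element keeps exactly the last two fields.
last_two = tuple(name.rsplit('_', 2)[1:])
print(last_two)             # ('word1', 'word2')
```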

The other answers already talk about your specific case. In the more general case, if you're observing that the same value appears multiple times in a comprehension...

var2 = [(item[0].split("_")[1], item[0].split("_")[2]) for item in var]
         ^                     ^

and you'd like to avoid this repetition. Is that about right?

One way is to use a nested loop, but that's really a code golfing trick...

[(parts[1], parts[2]) for item in var for parts in [item[0].split("_")]]
# or
[(a, b) for item in var for (_, a, b) in [item[0].split("_")]]

but yeah, that wouldn't pass code review...

How about writing a function instead?

def extract_parts(item):
    parts = item[0].split("_")
    return parts[1], parts[2]

[extract_parts(item) for item in var]
# or:
list(map(extract_parts, var))  # in Python 3, map() alone returns a lazy iterator

To answer your question with an approach close to your example, and to address your comment:

Yes that works in this case, @internet_user also suggested this. But what if the indices I need are not consecutive, i.e. I need 0 and 2?

The with ... as ... syntax is for context managers, which serve an entirely different purpose. A workaround, however, is to use for-loop unpacking.

var = [['x_A_B', 1], ['x_A_C', 1], ['x_B_A', 1], ['x_B_D', 1], ['x_C_A', 1], ['x_C_D', 1], ['x_D_B', 1], ['x_D_C', 1]]

var2 = [(u, v) for item in var for _, u, v in (item[0].split("_"), )]

print(var2)
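
On Python 3.8 or later, an assignment expression (the := operator) is probably the closest thing to the hypothetical "with ... as" form, since it names the split result once inside the comprehension. This is a sketch on a shortened copy of the data, and the name parts is just illustrative:

```python
var = [['x_A_B', 1], ['x_A_C', 1], ['x_D_C', 1]]

# Requires Python 3.8+. The tuple is evaluated left to right, so `parts`
# is bound by the first element before the second element reads it.
var2 = [((parts := item[0].split("_"))[1], parts[2]) for item in var]

print(var2)  # [('A', 'B'), ('A', 'C'), ('D', 'C')]
```

One thing to keep in mind is that a name bound with := inside a comprehension lives in the enclosing scope, so parts remains defined after the comprehension finishes.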

You can use:

[tuple(x[0].split('_')[1:]) for x in var]

out: [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'D'), ('C', 'A'), ('C', 'D'), ('D', 'B'), ('D', 'C')]

