
Converting an array of pairs into a 2D array based on the first column

Is there a (preferably elegant) way in Python to take an array of pairs such as

[[3,350],[4,800],[0,150],[0,200],[4,750]]

into something like

[
  [150,200],
  [],
  [],
  [350],
  [800,750]
]

?

In other words, what's a good method for putting the second number of every pair into a list, at the row index given by the first number of that pair? Rows whose index never appears as a first number should be empty lists.

Try taking a look at list comprehensions; they provide a one-line way of creating lists. If you don't know what they are, there is a pretty decent guide to get you started here. Also, take a look at tuples, as they are more appropriate than lists for paired values. Note that tuples are immutable, so you cannot change a tuple once you have created it.

Your list using tuples would look like this:

foo = [(3, 350), (4, 800), (0, 150), (0, 200), (4, 750)]
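A short illustration of both points, using the tuple list above: a list comprehension pulls one field out of every pair in a single line, and assigning into a tuple raises TypeError because tuples are immutable. The names firsts, seconds and immutable are just illustrative.

```python
foo = [(3, 350), (4, 800), (0, 150), (0, 200), (4, 750)]

# List comprehensions: one line per extracted field
firsts = [pair[0] for pair in foo]   # row indexes
seconds = [pair[1] for pair in foo]  # values

# Tuples are immutable, so item assignment fails with TypeError
immutable = False
try:
    foo[0][1] = 999
except TypeError:
    immutable = True

print(firsts)     # -> [3, 4, 0, 0, 4]
print(seconds)    # -> [350, 800, 150, 200, 750]
print(immutable)  # -> True
```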

As far as I'm aware, Python lists have no predefined size; instead they grow and shrink as items are added or removed. So what you'll want to do is find the largest index value in the list first. For example, [x[0] for x in list_of_pairs] collects the first element of every pair inside your main list (the one named list_of_pairs), and max() of that gives the largest index. Note that this strategy works for the tuple-based list as well.
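A minimal sketch of why the largest index matters: an empty list has no slot to index into, so the rows have to be created up front. It also shows a common pitfall with [[]] * n, which repeats a reference to one inner list rather than creating separate rows. The names empty_rows, shared, largest and rows are illustrative.

```python
list_of_pairs = [[3, 350], [4, 800], [0, 150], [0, 200], [4, 750]]

# An empty list has no slot 4 yet, so indexing into it raises IndexError
empty_rows = []
try:
    empty_rows[4].append(750)
    index_error = False
except IndexError:
    index_error = True

# [[]] * n repeats ONE inner list n times, so every "row" is shared
shared = [[]] * 3
shared[0].append(1)

# Instead: find the largest row index, then create independent rows up front
largest = max(x[0] for x in list_of_pairs)
rows = [[] for _ in range(largest + 1)]
rows[4].append(750)

print(index_error)  # -> True
print(shared)       # -> [[1], [1], [1]]
print(rows)         # -> [[], [], [], [], [750]]
```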

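The code below should do what you want, including empty rows for indexes that never appear:

list_of_pairs = [[3, 350], [4, 800], [0, 150], [0, 200], [4, 750]]
indexes = [x[0] for x in list_of_pairs]  # first element of every pair
new_list = []

# One row per index from 0 to max(indexes); rows with no matching pair stay empty
for i in range(max(indexes) + 1):
    new_list.append([x[1] for x in list_of_pairs if x[0] == i])

print(new_list)  # -> [[150, 200], [], [], [350], [800, 750]]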
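One caveat with this approach: max() raises ValueError on an empty sequence, so an empty list_of_pairs would crash it. A minimal sketch wrapping the same idea in a function (the name group_by_first is hypothetical) that uses max()'s default argument so an empty input simply yields an empty result:

```python
def group_by_first(list_of_pairs):
    """Group the second elements into rows indexed by the first elements."""
    # default=-1 makes range(largest + 1) empty when there are no pairs
    largest = max((x[0] for x in list_of_pairs), default=-1)
    return [[x[1] for x in list_of_pairs if x[0] == i]
            for i in range(largest + 1)]


print(group_by_first([[3, 350], [4, 800], [0, 150], [0, 200], [4, 750]]))
# -> [[150, 200], [], [], [350], [800, 750]]
print(group_by_first([]))  # -> []
```

Note that this still scans the whole input once per row, so the dict-based and single-pass answers scale better when the largest index is big.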

As @thefourtheye noted, a dict might be a better container. If you want a 2D list, you could first add the values to an intermediate dict where the key is the row and the value is the list of numbers for that row. Then you could use a list comprehension to generate the final result:

>>> l = [[3,350],[4,800],[0,150],[0,200],[4,750]]
>>> d = {}
>>> for row, num in l:
...     d.setdefault(row, []).append(num)
...
>>> [d.get(i, []) for i in range(max(d.keys()) + 1)]
[[150, 200], [], [], [350], [800, 750]] 
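The same intermediate-dict idea can also be written with collections.defaultdict(list) instead of setdefault. This is a swapped-in standard-library alternative, not the answer's own code. When building the final list, d.get(i, []) is still used because indexing a defaultdict with d[i] would silently insert the missing keys.

```python
from collections import defaultdict

l = [[3, 350], [4, 800], [0, 150], [0, 200], [4, 750]]

d = defaultdict(list)  # a missing key starts out as an empty list
for row, num in l:
    d[row].append(num)

# .get() does not insert missing keys, unlike d[i] on a defaultdict
result = [d.get(i, []) for i in range(max(d) + 1)]
print(result)     # -> [[150, 200], [], [], [350], [800, 750]]
print(sorted(d))  # -> [0, 3, 4]
```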

I would use the pandas module for this task:

In [185]: import numpy as np
     ...: import pandas as pd

In [186]: a = np.array([[3,350],[4,800],[0,150],[0,200],[4,750]])

In [187]: res = pd.DataFrame(a).groupby(0)[1].apply(list).to_frame('val').rename_axis('idx')

In [188]: res
Out[188]:
            val
idx
0    [150, 200]
3         [350]
4    [800, 750]

Now you have an indexed dataset, and you can use it in the following way (.loc selects by index label; the older .ix indexer was removed in pandas 1.0):

In [190]: res.loc[0, 'val']
Out[190]: [150, 200]

In [191]: res.loc[0, 'val'][1]
Out[191]: 200

In [192]: res.loc[4, 'val']
Out[192]: [800, 750]

PS: I think you don't need to keep empty lists in the resulting dataset, as they are a waste of resources.

There are numerous ways to do this. Here's a fairly straightforward one:

a = [[3, 350], [4, 800], [0, 150], [0, 200], [4, 750]]

rows, values = zip(*a)
b = [[] for _ in range(max(rows)+1)]  # initialize 2D output
for i, row in enumerate(rows):
    b[row].append(values[i])

print(b)  # -> [[150, 200], [], [], [350], [800, 750]]
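zip(*a) passes each pair to zip as a separate argument, which transposes the pairs into one tuple of row indexes and one tuple of values. The sketch below prints that intermediate result and then shows an equivalent loop that unpacks each pair directly instead of indexing values through enumerate; that loop is an alternative, not the answer's own code.

```python
a = [[3, 350], [4, 800], [0, 150], [0, 200], [4, 750]]

# zip(*a) is zip([3, 350], [4, 800], ...): it transposes the pairs
rows, values = zip(*a)
print(rows)    # -> (3, 4, 0, 0, 4)
print(values)  # -> (350, 800, 150, 200, 750)

# Same result without indexing into values: unpack each pair in the loop
b = [[] for _ in range(max(rows) + 1)]
for row, value in a:
    b[row].append(value)
print(b)  # -> [[150, 200], [], [], [350], [800, 750]]
```

One caveat: with an empty a, rows, values = zip(*a) raises ValueError, because zip() with no arguments produces nothing to unpack.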

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint one, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM