Dynamically create list of lists based on index - Python

I am trying to build one large result list made up of lists, based on index. I can't predefine how many lists the large list will contain.

id   value
1     30
1     21
1     12
1     0
2     1
2     9
2     14
3     12
3     2
4     3
5     1

result = []
for id, dfs in df.groupby('id'):
    ....

    for i, row in dfs.iterrows():
        x = helper(row[value])
        # If the list is found, append the element
        if (result[i]):
            result[i].append(x)
        # Dynamically make lists based on index
        else:
            result[i] = []

If the list is already defined, just append the value x to it.

Expected output:

    first index      second index  third index   fourth index
[[x1,x5,x10,x11,x14], [x2,x4,x9], [x3,x7],       [x20]]

The x values are computed by the helper function.

It's unclear to me whether you want the result as a dataframe, as a dict with the 'index' as keys, or just as a list with the items in the right order. Btw, Python lists start at index 0.
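To illustrate the point about list indexes, here is a minimal pure-Python sketch of why assigning to result[i] on an empty list fails, plus one way to grow the outer list on demand. It assumes the ids are consecutive integers starting at 1, so that id 1 maps to list index 0; the pairs list is just the question's table typed in by hand.

```python
# (id, value) pairs copied from the question's table.
pairs = [(1, 30), (1, 21), (1, 12), (1, 0), (2, 1), (2, 9), (2, 14),
         (3, 12), (3, 2), (4, 3), (5, 1)]

result = []
try:
    result[0] = []          # assigning to a slot that does not exist yet
except IndexError as exc:
    print("IndexError:", exc)

for id_, value in pairs:
    index = id_ - 1         # assumption: id 1 maps to list index 0
    while len(result) <= index:
        result.append([])   # grow the outer list until the slot exists
    result[index].append(value)

print(result)
```

If the ids are not consecutive integers, a dict keyed by id is probably the safer structure.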

In [706]: result = collections.defaultdict(list)
     ...: for id, dfs in df.groupby('id'):
     ...:     result[id].extend(list(dfs['value'].values))
     ...:

In [707]: result  # this will be a dict
Out[707]:
defaultdict(list,
            {1: [30, 21, 12, 0], 2: [1, 9, 14], 3: [12, 2], 4: [3], 5: [1]})

In [708]: [result[k] for k in sorted(result.keys())]  # turn it into a list
Out[708]: [[30, 21, 12, 0], [1, 9, 14], [12, 2], [3], [1]]
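The session above assumes that df and the collections import already exist. A self-contained version might look like the sketch below; the dataframe is rebuilt by hand from the question's table, and the names grouped and as_list are only illustrative.

```python
import collections

import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5],
    "value": [30, 21, 12, 0, 1, 9, 14, 12, 2, 3, 1],
})

grouped = collections.defaultdict(list)
for group_id, group_df in df.groupby("id"):
    # tolist() converts the numpy integers to plain Python ints
    grouped[group_id].extend(group_df["value"].tolist())

# Turn the dict into a list of lists ordered by id.
as_list = [grouped[k] for k in sorted(grouped)]
print(as_list)
```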

If you want to apply some operation to each item in the group, like you're doing with helper(), you can do:

In [714]: def helper(val):
     ...:     return 'x' + str(val)  # simplifying whatever helper does

In [715]: result = collections.defaultdict(list)
     ...: for id, dfs in df.groupby('id'):
     ...:     result[id].extend(map(helper, dfs['value'].values))  # pass each value to helper

In [716]: result
Out[716]:
defaultdict(list,
            {1: ['x30', 'x21', 'x12', 'x0'],
             2: ['x1', 'x9', 'x14'],
             3: ['x12', 'x2'],
             4: ['x3'],
             5: ['x1']})

In [717]: [result[k] for k in sorted(result.keys())]
Out[717]:
[['x30', 'x21', 'x12', 'x0'],
 ['x1', 'x9', 'x14'],
 ['x12', 'x2'],
 ['x3'],
 ['x1']]

Note that result[id].extend(...) is not actually needed, since all the values for a given 'id' are passed in together as one group. So you don't need to check whether that id already exists in result. It could have been just:

In [720]: result = collections.defaultdict(list)
     ...: for id, dfs in df.groupby('id'):
     ...:     result[id] = list(map(helper, dfs['value'].values))
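If only the final list of lists is needed, the dict can probably be skipped altogether. The sketch below relies on groupby() sorting the group keys by default, so the groups come out in ascending id order; df is rebuilt from the question's table and helper is the same simplified stand-in as above.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5],
    "value": [30, 21, 12, 0, 1, 9, 14, 12, 2, 3, 1],
})


def helper(val):
    # Simplified stand-in for whatever the real helper does.
    return "x" + str(val)


# One inner list per id, in ascending id order.
result = [
    [helper(v) for v in group_df["value"].tolist()]
    for _, group_df in df.groupby("id")
]
print(result)
```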

Ideally, you'd write helper so that it can be used with pandas' apply() (for example Series.apply()), operating on all the dfs rows together.
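As a rough sketch of that idea, the simplified helper from above can be passed to Series.apply() to transform the whole column first, and the transformed values can then be collected per id. The column name "x" is an arbitrary choice for illustration.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5],
    "value": [30, 21, 12, 0, 1, 9, 14, 12, 2, 3, 1],
})


def helper(val):
    # Simplified stand-in for whatever the real helper does.
    return "x" + str(val)


# Transform every value once, then gather the results group by group.
df["x"] = df["value"].apply(helper)
result = df.groupby("id")["x"].apply(list).tolist()
print(result)
```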

Or even better, build helper so that it works on the dataframe of each groupby result, via the groupby object's apply() method (DataFrameGroupBy.apply()).
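A minimal sketch of the group-level variant, assuming a hypothetical group_helper that receives one group's dataframe and returns a list. The 'value' column is selected explicitly so the function only sees the data column rather than the grouping column.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5],
    "value": [30, 21, 12, 0, 1, 9, 14, 12, 2, 3, 1],
})


def group_helper(group_df):
    # Hypothetical group-level helper: gets the dataframe of one group
    # and returns the list of transformed values for that group.
    return ["x" + str(v) for v in group_df["value"].tolist()]


# One list per id; the result is a Series indexed by id.
per_id = df.groupby("id")[["value"]].apply(group_helper)
result = per_id.tolist()
print(result)
```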
