Dynamically create list of lists based on index - Python
I am trying to make a giant result list containing lists of lists, grouped by index. I can't predefine how many lists will be inside the giant list.
id value
1 30
1 21
1 12
1 0
2 1
2 9
2 14
3 12
3 2
4 3
5 1
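For reference, the sample table above can be reconstructed as a pandas DataFrame (a minimal sketch; the column names `id` and `value` are taken from the table):

```python
import pandas as pd

# Reconstruct the sample id/value table shown above
df = pd.DataFrame({
    'id':    [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5],
    'value': [30, 21, 12, 0, 1, 9, 14, 12, 2, 3, 1],
})
```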
result = []
for id, dfs in df.groupby('id'):
    ....
    for i, row in dfs.iterrows():
        x = helper(row['value'])
        # If the list is found, append the element
        if result[i]:
            result[i].append(x)
        # Dynamically make lists based on index
        else:
            result[i] = []
If the list is already defined, then just append the value x to it.
Expected Output:
first index second index third index fourth index
[[x1,x5,x10,x11,x14], [x2,x4,x9], [x3,x7], [x20]]
The x values are computed by the helper function.
It's unclear to me if you want the result as a dataframe, a dict with the 'index' as keys, or just a list with the items in the right order. Btw, Python lists start at index 0.
In [706]: result = collections.defaultdict(list)
...: for id, dfs in df.groupby('id'):
...: result[id].extend(list(dfs['value'].values))
...:
In [707]: result # this will be a dict
Out[707]:
defaultdict(list,
{1: [30, 21, 12, 0], 2: [1, 9, 14], 3: [12, 2], 4: [3], 5: [1]})
In [708]: [result[k] for k in sorted(result.keys())] # turn it into a list
Out[708]: [[30, 21, 12, 0], [1, 9, 14], [12, 2], [3], [1]]
If you want to apply some operation to each item in the group, as you're doing with helper(), you can do:
In [714]: def helper(val):
...: return 'x' + str(val) # simplifying whatever helper does
In [715]: result = collections.defaultdict(list)
...: for id, dfs in df.groupby('id'):
...: result[id].extend(map(helper, dfs['value'].values)) # pass each value to helper
In [716]: result
Out[716]:
defaultdict(list,
{1: ['x30', 'x21', 'x12', 'x0'],
2: ['x1', 'x9', 'x14'],
3: ['x12', 'x2'],
4: ['x3'],
5: ['x1']})
In [717]: [result[k] for k in sorted(result.keys())]
Out[717]:
[['x30', 'x21', 'x12', 'x0'],
['x1', 'x9', 'x14'],
['x12', 'x2'],
['x3'],
['x1']]
Note that result[id].extend(...) is not actually needed, since each group of values for that 'id' will be passed in together. So you don't need to check if that id already exists in result. It could have been just:
In [720]: result = collections.defaultdict(list)
...: for id, dfs in df.groupby('id'):
...: result[id] = list(map(helper, dfs['value'].values))
Ideally, you'd want to create helper so that it can be used with pd.apply(), by operating on all the dfs rows together.
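As a sketch of that idea (assuming a small version of the sample data and the same string-building helper as above), helper can be applied to each group's whole 'value' column at once instead of iterating rows:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 3], 'value': [30, 21, 1, 12]})

def helper(val):
    # Same simplified helper as above
    return 'x' + str(val)

# Apply helper across each group's 'value' column, one group at a time
result = {id_: dfs['value'].apply(helper).tolist()
          for id_, dfs in df.groupby('id')}
# result == {1: ['x30', 'x21'], 2: ['x1'], 3: ['x12']}
```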
Or even better, build helper so that it can do something with the dataframe of each groupby result, via pd.groupby.GroupBy.apply().
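A sketch of that GroupBy.apply() variant, again assuming the same small sample data; here helper receives each group's sub-dataframe rather than a single value:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 3], 'value': [30, 21, 1, 12]})

def helper(group):
    # group is the sub-dataframe for one id
    return ['x' + str(v) for v in group['value']]

# GroupBy.apply calls helper once per group; groups come out in sorted-id order
nested = df.groupby('id').apply(helper).tolist()
# nested == [['x30', 'x21'], ['x1'], ['x12']]
```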