How do I apply transformations to a list of pandas DataFrames?
I have a bunch of pandas DataFrames belonging to a handful of logical groupings, all of which have some overlapping columns. It would save a lot of time if I could apply a list of functions (like the one in funcs below) to a whole list of DataFrames.
import pandas as pd

# Make example DataFrames
df_a = pd.DataFrame({'col_a': [1, 1, 2], 'col_b': [1, 1, 2], 'col_c': [1, 1, 2],
                     'col_d': [1, 2, 3], 'col_e': [1, 2, 3], 'col_f': [1, 2, 3],
                     'foo': 'foo', 'bar': 'bar', 'baz': 'baz'})
df_b = pd.DataFrame({'col_a': [4, 5, 5], 'col_b': [4, 5, 5], 'col_c': [4, 5, 5],
                     'col_d': [4, 5, 6], 'col_e': [4, 5, 6], 'col_f': [4, 5, 6],
                     'foo': 'foo', 'bar': 'bar', 'baz': 'baz'})
df_c = pd.DataFrame({'col_a': [7, 7, 7], 'col_b': [7, 7, 7], 'col_c': [7, 7, 7],
                     'col_d': [7, 8, 9], 'col_e': [7, 8, 9], 'col_f': [7, 8, 9],
                     'foo': 'foo', 'bar': 'bar', 'baz': 'baz'})
# Make list of a bunch of DataFrames
data_sets_a = [df_a, df_b, df_c]
# Drop some columns (this works as expected on each DataFrame)
[d.drop(['foo', 'bar', 'baz'], axis=1, inplace=True) for d in data_sets_a]
# List of functions to apply to overlapping DataFrame columns
funcs = {'col_d': 'count', 'col_e': 'min', 'col_f': 'sum'}
# Group by and aggregate with funcs dict (does not work)
[d.groupby(['col_a', 'col_b', 'col_c']).agg(funcs, inplace=True).reset_index() for d in data_sets_a]
data_sets_a
Using drop with inplace=True over a list of DataFrames in a list comprehension works as I expected, but it doesn't work with groupby and agg: the DataFrames in the list remain unchanged.
[ col_a col_b col_c col_d col_e col_f
0 1 1 1 1 1 1
1 1 1 1 2 2 2
2 2 2 2 3 3 3,
col_a col_b col_c col_d col_e col_f
0 4 4 4 4 4 4
1 5 5 5 5 5 5
2 5 5 5 6 6 6,
col_a col_b col_c col_d col_e col_f
0 7 7 7 7 7 7
1 7 7 7 8 8 8
2 7 7 7 9 9 9]
Changing the inplace=True value for drop does what I'd expect, but it doesn't seem to make a difference with groupby and agg.
Can someone explain why the two list comprehensions have different results, or show me a better way to get the results I'm looking for?
Is it a problem with the code mapping the functions to the DataFrame list?
I've been reading pandas' documentation and Googling for a while now and have tried various things like query, map, and lambda combinations, but to no avail.
cols = ['col_a', 'col_b', 'col_c']
for i in range(len(data_sets_a)):
    # agg returns a new DataFrame, so store it back in the list
    gb = data_sets_a[i].groupby(cols)
    data_sets_a[i] = gb.agg(funcs).reset_index()
In your list comprehension, you were returning the correct objects but not placing them where you wanted. groupby(...).agg(...) returns a new DataFrame each time, so the inplace=True argument was not modifying the objects that the list data_sets_a points to.
What I did was assign the aggregated result back to each element of the list.
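To make the difference concrete, here is a minimal sketch (using a small made-up frame rather than the DataFrames from the question) of how drop with inplace=True behaves compared with groupby and agg. The variable names drop_result and aggregated are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col_a': [1, 1, 2], 'col_d': [1, 2, 3], 'foo': 'foo'})

# drop(inplace=True) mutates df itself and returns None
drop_result = df.drop(['foo'], axis=1, inplace=True)
print(drop_result)        # None
print(list(df.columns))   # ['col_a', 'col_d']

# groupby(...).agg(...) builds a new DataFrame; df is left as it was
aggregated = df.groupby('col_a').agg({'col_d': 'count'}).reset_index()
print(aggregated is df)   # False
print(df.shape)           # (3, 2)
```

Because the aggregated frame is a separate object, a list comprehension that only computes it and throws the result away leaves the original list untouched.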
Another way to do it is to use what you already had:
data_sets_a = [
    d.groupby(['col_a', 'col_b', 'col_c']).agg(funcs).reset_index()
    for d in data_sets_a
]
Just assign the new list to the old list.
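One caveat worth sketching: rebinding the name creates a new list object, so any other variable that still refers to the original list would not see the aggregated frames. If that matters, slice assignment is one possible option, since it replaces the contents of the existing list. The frames below are cut-down stand-ins for the question's data, and alias is a hypothetical second reference.

```python
import pandas as pd

funcs = {'col_d': 'count', 'col_e': 'min', 'col_f': 'sum'}

frames = [
    pd.DataFrame({'col_a': [1, 1, 2], 'col_d': [1, 2, 3],
                  'col_e': [1, 2, 3], 'col_f': [1, 2, 3]}),
    pd.DataFrame({'col_a': [4, 5, 5], 'col_d': [4, 5, 6],
                  'col_e': [4, 5, 6], 'col_f': [4, 5, 6]}),
]
alias = frames  # hypothetical second reference to the same list

# Slice assignment swaps the elements but keeps the same list object
frames[:] = [d.groupby('col_a').agg(funcs).reset_index() for d in frames]

print(alias is frames)   # True
print(alias[0].shape)    # (2, 4)
```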
If I understand your question correctly, the problem is with your funcs. You can try it this way instead:
def funcs(x):
    col_d = x['col_d'].count()
    col_e = x['col_e'].min()
    col_f = x['col_f'].sum()
    return pd.Series([col_d, col_e, col_f], index=['col_d', 'col_e', 'col_f'])
Then you can use apply(funcs):
[d.groupby(['col_a', 'col_b', 'col_c']).apply(funcs).reset_index() for d in data_sets_a]
The output will be:
[ col_a col_b col_c col_d col_e col_f
0 1 1 1 2 1 3
1 2 2 2 1 3 3,
col_a col_b col_c col_d col_e col_f
0 4 4 4 1 4 4
1 5 5 5 2 5 11,
col_a col_b col_c col_d col_e col_f
0 7 7 7 3 7 24]
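For comparison, here is a rough sketch of the same aggregation using the original funcs dict with agg, wrapped in a small helper that returns a new list. The helper name aggregate_all is made up for illustration and is not part of pandas. On the example data it should produce the same figures as the output above, as long as the result is actually kept.

```python
import pandas as pd

def aggregate_all(frames, keys, funcs):
    """Return a new list with each DataFrame grouped by keys and aggregated."""
    return [d.groupby(keys).agg(funcs).reset_index() for d in frames]

df_a = pd.DataFrame({'col_a': [1, 1, 2], 'col_b': [1, 1, 2], 'col_c': [1, 1, 2],
                     'col_d': [1, 2, 3], 'col_e': [1, 2, 3], 'col_f': [1, 2, 3]})
df_b = pd.DataFrame({'col_a': [4, 5, 5], 'col_b': [4, 5, 5], 'col_c': [4, 5, 5],
                     'col_d': [4, 5, 6], 'col_e': [4, 5, 6], 'col_f': [4, 5, 6]})
df_c = pd.DataFrame({'col_a': [7, 7, 7], 'col_b': [7, 7, 7], 'col_c': [7, 7, 7],
                     'col_d': [7, 8, 9], 'col_e': [7, 8, 9], 'col_f': [7, 8, 9]})

keys = ['col_a', 'col_b', 'col_c']
funcs = {'col_d': 'count', 'col_e': 'min', 'col_f': 'sum'}

# Keep the returned list; the input DataFrames are not modified
results = aggregate_all([df_a, df_b, df_c], keys, funcs)
for frame in results:
    print(frame)
```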