Remove empty dataframe from pandas.core.groupby.generic.DataFrameGroupBy
How can I delete empty DataFrames from a pandas.core.groupby.generic.DataFrameGroupBy?
My aggregation code:
import numpy as np
import pandas as pd

cols = ["col1", "col2", "col3", "col4"]
joined = pd.concat(df.reset_index() for df in collectData)
# Replace NaN, zero and negative readings with 1
joined = joined.replace({np.nan: 1, 0: 1})
joined[cols] = joined[cols].mask(joined[cols] < 0, 1)
df = joined.set_index('sensor').groupby(pd.Grouper(freq='D'))
Data after grouping:
list(df)
[(Timestamp('2020-02-04 00:00:00+0000', tz='UTC', freq='D'),
col1 col2 col3 col4
sensor
2020-02-04 00:00:00+00:00 2.586569 0.015321 0.000149 0.884470
2020-02-04 00:00:00+00:00 4.429571 4.049798 1.820845 2.882445
2020-02-04 00:00:00+00:00 12.883314 6.900607 1.002138 3.613021
... ... ... ... ...
2020-02-04 23:45:00+00:00 3.798017 1.605979 0.176515 2.400820
2020-02-04 23:45:00+00:00 5.546771 2.232437 0.233292 3.750547
2020-02-04 23:45:00+00:00 4.910360 3.730932 0.985459 1.238469
[48945 rows x 4 columns]),
(Timestamp('2020-02-05 00:00:00+0000', tz='UTC', freq='D'),
Empty DataFrame
Columns: [col1, col2, col3, col4]
Index: []),
(Timestamp('2020-02-06 00:00:00+0000', tz='UTC', freq='D'),
Empty DataFrame
Columns: [col1, col2, col3, col4]
Index: []),
(Timestamp('2020-02-07 00:00:00+0000', tz='UTC', freq='D'),
col1 col2 col3 col4
sensor
2020-02-07 00:00:00+00:00 17.065174 3.065422 0.171053 9.048574
2020-02-07 00:00:00+00:00 30.181997 20.651204 4.413567 15.200674
2020-02-07 00:00:00+00:00 1.864378 1.726365 0.819459 1.441588
... ... ... ... ...
2020-02-07 23:45:00+00:00 39.644320 0.234830 0.002289 13.642480
2020-02-07 23:45:00+00:00 30.778517 10.540318 0.944788 13.165241
2020-02-07 23:45:00+00:00 34.610439 25.342142 6.184292 22.725937
[50112 rows x 4 columns]),]
Size of each group, from df.size():
sensor
2020-02-02 00:00:00+00:00 47574
2020-02-03 00:00:00+00:00 49353
2020-02-04 00:00:00+00:00 48945
2020-02-05 00:00:00+00:00 0
2020-02-06 00:00:00+00:00 0
...
2020-09-26 00:00:00+00:00 83680
2020-09-27 00:00:00+00:00 84293
2020-09-28 00:00:00+00:00 84873
2020-09-29 00:00:00+00:00 84306
2020-09-30 00:00:00+00:00 84875
Freq: D, Length: 242, dtype: int64
I need to remove the empty DataFrames before applying std = df.apply(gstd). I don't know the location of the empty DataFrames. https://stackoverflow.com/a/51052536/14338086 and https://stackoverflow.com/a/16916611/14338086 return errors. Also, using df.filter(lambda x: x.size() != 0) returns TypeError: 'numpy.int64' object is not callable. dropna() is not available.
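A note on the TypeError: inside filter, x is a plain DataFrame, and DataFrame.size is an integer attribute (rows times columns), not a method, so x.size() ends up calling an integer. The sketch below uses a small made-up frame (the timestamps and values are assumptions, not the real sensor data) to show one possible way to locate the empty daily groups and skip them, by iterating over the groupby and checking .empty.

```python
import pandas as pd

# Toy data: two days with rows, two days in between without any (assumed values)
idx = pd.to_datetime(
    ["2020-02-04 00:00", "2020-02-04 23:45", "2020-02-07 00:00"], utc=True
)
frame = pd.DataFrame({"col1": [2.0, 4.0, 8.0]}, index=idx)
frame.index.name = "sensor"

# pd.Grouper(freq="D") creates one bin per calendar day in the range,
# including days that contain no rows
grouped = frame.groupby(pd.Grouper(freq="D"))

# Group sizes; a size of 0 marks an empty day
sizes = grouped.size()
empty_days = sizes[sizes == 0].index

# Keep only the non-empty groups, keyed by day
non_empty = {day: group for day, group in grouped if not group.empty}
```

Here non_empty keeps only 2020-02-04 and 2020-02-07, and empty_days lists the two days in between.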
I solved the problem with the following code; maybe it helps someone.
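import numpy as np
import pandas as pd
from scipy.stats import gmean, gstd  # assumed source of gmean/gstd; the question does not say

cols = ["col1", "col2", "col3", "col4"]
joined = pd.concat(df.reset_index() for df in collectData)
# Replace NaN, zero and negative readings with 1
joined = joined.replace({np.nan: 1, 0: 1})
joined[cols] = joined[cols].mask(joined[cols] < 0, 1)
df = joined.set_index('sensor').groupby(pd.Grouper(freq='D'))
# Concatenate the daily groups back into one frame; empty groups contribute no rows
dff = pd.concat(map(lambda x: x[1], df))
# Group by the day each row falls on, so only days that actually have rows appear
means = dff.groupby(dff.index.floor('d')).agg(gmean)
std = dff.groupby(dff.index.floor('d')).agg(gstd)
df_result = pd.merge(left=means, right=std, how='left', on='sensor')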
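Why this avoids the empty groups: grouping on dff.index.floor('d') only creates a group for each day value that is actually present in the index, whereas pd.Grouper(freq='D') creates a bin for every calendar day in the range. Below is a minimal, self-contained sketch of that idea on made-up data. The helpers geo_mean and geo_std are hypothetical NumPy stand-ins for gmean and gstd (which I assume come from scipy.stats), and join with suffixes is swapped in for merge only to get readable column names.

```python
import numpy as np
import pandas as pd


def geo_mean(values):
    # Geometric mean: exponential of the mean of the logs (stand-in for gmean)
    return float(np.exp(np.log(values).mean()))


def geo_std(values):
    # Geometric standard deviation with ddof=1 (stand-in for gstd)
    return float(np.exp(np.log(values).std(ddof=1)))


# Toy data with a two-day gap between the populated days (assumed values)
idx = pd.to_datetime(
    ["2020-02-04 00:00", "2020-02-04 23:45", "2020-02-07 00:00", "2020-02-07 23:45"],
    utc=True,
)
frame = pd.DataFrame({"col1": [1.0, 4.0, 2.0, 8.0]}, index=idx)
frame.index.name = "sensor"

# Flooring the index to the day yields groups only for days that have rows
by_day = frame.groupby(frame.index.floor("D"))
means = by_day.agg(geo_mean)
stds = by_day.agg(geo_std)

# Combine both statistics side by side on the shared daily index
result = means.join(stds, lsuffix="_gmean", rsuffix="_gstd")
```

The result has one row per populated day (two rows here), with no entries for 2020-02-05 or 2020-02-06.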