I have about 40,000 groups after running this code:
groups=data.groupby('A')
I need to subdivide them into sub-groups of 10,000, of course without overlapping and while keeping the groupby structure. Something like group1=groups[0:10000], group2=groups[10000:20000], ... so I can reuse them in other scripts. How can I do that?
Thank you!
In that case you can simply slice the rows of the original DataFrame using iloc (a GroupBy object has no iloc):
group1=data.iloc[0:10000,:]
group2=data.iloc[10000:20000,:]
# ...
group4=data.iloc[30000:40000,:]
This works when you want to slice by position, i.e. by a fixed number of rows. Note that a single group can then end up split across two slices.
If you want to do it category-wise, then after performing the groupby you can simply do this:
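A minimal sketch of row-based slicing with data.iloc, assuming a small hypothetical DataFrame where column 'A' holds the group key and chunk_size stands in for 10000. It shows that fixed row counts do not respect group boundaries: group 'y' lands in two chunks.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: column 'A' is the group key.
data = pd.DataFrame({"A": ["x", "x", "y", "y", "y", "z"], "B": range(6)})

chunk_size = 2  # stands in for 10000 rows
row_chunks = [
    data.iloc[start:start + chunk_size]  # positional slice of rows
    for start in range(0, len(data), chunk_size)
]

for chunk in row_chunks:
    print(chunk["A"].tolist())
# ['x', 'x']
# ['y', 'y']
# ['y', 'z']
```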
groups=data.groupby('A').agg('sum')  # 'sum' is only an example; agg() needs an aggregation function
group1=groups.loc['category 1']
The code in the question does not apply any aggregation, so groups is only a lazy GroupBy object, not a table you can slice directly. Refer to the pandas groupby documentation to learn how groupby works.
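If you want to keep the GroupBy object instead of aggregating, a possible sketch (assuming hypothetical data, and a chunk_size of 2 standing in for 10000) uses get_group to pull out one category and ngroup to number the groups, so that whole groups can be bucketed into chunks. This ngroup approach is a swapped-in technique, not one from the answers above.

```python
import pandas as pd

# Hypothetical data: five distinct keys in column 'A'.
data = pd.DataFrame({"A": ["a", "b", "c", "a", "d", "b", "e"], "B": range(7)})
groups = data.groupby("A")

# One category without aggregating: get_group returns that group's rows.
category_a = groups.get_group("a")

# ngroup numbers each row's group 0..n-1 (sorted key order by default);
# integer division by the chunk size gives a chunk id, so no group is split.
chunk_size = 2  # stands in for 10000 groups
chunk_id = groups.ngroup() // chunk_size

# Each chunk is a DataFrame of whole groups, regrouped by 'A' as before.
sub_groups = [part.groupby("A") for _, part in data.groupby(chunk_id)]

print([g.ngroups for g in sub_groups])
# [2, 2, 1]
```

Each element of sub_groups behaves like the original groups object, just restricted to its share of the keys.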
Unless you're aggregating right afterwards, groupby might be overkill for this task.
data = data.set_index('A')
group_idx = data.index.drop_duplicates()
sub_group_1 = data.loc[group_idx[:10000]]
will get you the first 10,000 groups (all rows belonging to them).
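Extending the same set_index idea to every chunk could look like the sketch below; the sample data and chunk_size (standing in for 10000) are hypothetical. Because drop_duplicates keeps first-seen order, every key appears in exactly one chunk, and .loc with a list of labels returns all rows for each label even though the index is not unique.

```python
import pandas as pd

# Hypothetical data: keys 'a' and 'c' appear more than once.
data = pd.DataFrame({"A": ["a", "b", "a", "c", "d", "e", "c"], "B": range(7)})

indexed = data.set_index("A")
group_idx = indexed.index.drop_duplicates()  # distinct keys, first-seen order

chunk_size = 2  # stands in for 10000 groups
sub_groups = [
    indexed.loc[group_idx[start:start + chunk_size]]  # all rows of these keys
    for start in range(0, len(group_idx), chunk_size)
]
```

If a chunk needs the groupby structure again, it can be regrouped on the index with chunk.groupby(level="A").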