Fastest way to split a pandas dataframe into a list of subdataframes

I have a large dataframe df, and a list indices containing every unique element of df.index. I now want to create a list of all the subdataframes indexed by the elements of indices; specifically:

list_df = [df.loc[x] for x in indices]

Running this command takes ages, though (df has about 3e6 rows and 3e3 unique index values). Is this a reasonable way to perform this operation? I would be happy to receive any comments or suggestions that could improve the performance of this and related problems.

Thanks in advance!

You can use a list comprehension over a groupby object grouped by the index (level=0). Passing sort=False turns off the default sorting of the group keys, which makes the solution faster:

L = [x for i, x in df.groupby(level=0, sort=False)]

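import numpy as np
import pandas as pd

# Sample data for the timings below
np.random.seed(123)
N = 1000
letters = list('abcdefghijklmno')  # sample values for column A
df = pd.DataFrame({'A': np.random.choice(letters, N),
                   'B': np.random.randint(10, size=N)},
                  index=np.random.randint(100, size=N))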

In [273]: %timeit [x for i, x in df.groupby(level=0, sort=False)]
100 loops, best of 3: 9.91 ms per loop

In [274]: %timeit [df.loc[x] for x in df.index]
1 loop, best of 3: 417 ms per loop
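
If the pieces need to come back in the same order as an existing indices list, one option is to collect the groups into a dict first and look each label up afterwards. The following is only a minimal sketch of that idea: the helper name split_by_index and the demo frame are hypothetical, not part of the original question or answer. Note that a groupby piece is always a DataFrame, whereas df.loc[x] returns a Series when the label x occurs only once.

```python
import numpy as np
import pandas as pd


def split_by_index(df):
    """Hypothetical helper: map each unique index label to its sub-dataframe."""
    # sort=False skips sorting the group keys; groups come out in order
    # of first appearance in the index.
    return {label: sub for label, sub in df.groupby(level=0, sort=False)}


# Small demo frame (made-up data, similar in shape to the timing example).
rng = np.random.default_rng(123)
n_rows = 1000
demo = pd.DataFrame(
    {"A": rng.choice(list("abcdefghijklmno"), n_rows),
     "B": rng.integers(10, size=n_rows)},
    index=rng.integers(100, size=n_rows),
)

parts = split_by_index(demo)

# Rebuild the list in the order given by the unique index labels.
indices = demo.index.unique()
list_df = [parts[x] for x in indices]
```

The dict costs one extra lookup per label but decouples the order of list_df from the order in which groupby happens to yield the groups.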
