Fastest way to split a Pandas dataframe into a list of sub-dataframes

Question

I have a large dataframe df, and a full list, indices, of the unique elements in df.index. I now want to create a list of all the sub-dataframes indexed by the elements of indices; specifically:

list_df = [df.loc[x] for x in indices]

Running this command takes ages, though (df has about 3e6 rows and 3e3 unique indices). Is this a reasonable way to perform this operation? I would be very happy to receive any comments or suggestions that could improve the performance of this and related problems.

Thanks in advance!

Answer 1

You can use a list comprehension over a groupby object grouped by the index (level=0). Passing sort=False turns off the default sorting of the group keys, which makes the solution faster:

L = [x for i, x in df.groupby(level=0, sort=False)]

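import numpy as np
import pandas as pd

# Sample data for the timings: 1000 rows, index labels drawn from 0..99 (so labels repeat)
np.random.seed(123)
N = 1000
letters = list('abcdefghijklmno')
df = pd.DataFrame({'A': np.random.choice(letters, N),
                   'B': np.random.randint(10, size=N)},
                  index=np.random.randint(100, size=N))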

In [273]: %timeit [x for i, x in df.groupby(level=0, sort=False)]
100 loops, best of 3: 9.91 ms per loop

In [274]: %timeit [df.loc[x] for x in df.index]
1 loop, best of 3: 417 ms per loop
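
If you also need to look up a sub-dataframe by its index label afterwards, the same groupby can feed a dict instead of a list. The following is only a minimal sketch on made-up data; the names sub_frames, n_rows and rng are mine, not from the question. It also checks that each group matches a plain boolean-mask selection of the same label.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 1000 rows, index labels drawn from 0..99 (labels repeat).
rng = np.random.default_rng(123)
n_rows = 1000
df = pd.DataFrame(
    {"A": rng.choice(list("abcdefghijklmno"), n_rows),
     "B": rng.integers(10, size=n_rows)},
    index=rng.integers(100, size=n_rows),
)

# One pass over the frame: each group is the sub-dataframe for one index label.
sub_frames = {label: group for label, group in df.groupby(level=0, sort=False)}

# The list form from the answer above; with sort=False the groups come out
# in the order in which each label first appears in the index.
list_df = list(sub_frames.values())

# Sanity check: every group equals the rows selected by a boolean mask on the index.
for label in df.index.unique():
    assert sub_frames[label].equals(df[df.index == label])
```

One thing to keep in mind: every group from groupby is a DataFrame, whereas df.loc[x] with a scalar x returns a Series when that label occurs only once, so the two approaches can differ in type for such labels.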


Answer 1: 4, accepted, 2017-10-10 13:27:49
