Fastest way to split a pandas dataframe into a list of subdataframes
I have a large dataframe df for which I have a full list, indices, of the unique elements in df.index. I now want to create a list of all the subdataframes indexed by the elements in indices; specifically:
list_df = [df.loc[x] for x in indices]
Running this command is taking ages, though (df has about 3e6 rows and 3e3 unique indices). Is this a reasonable way to perform this operation? I would be very happy to receive any comments or suggestions that could improve the performance of this and related problems.
Thanks in advance!
You can use a list comprehension over a groupby object that groups by the index (level=0). Passing sort=False turns off the default sorting of group keys, which makes the solution faster:
L = [x for i, x in df.groupby(level=0, sort=False)]
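If the index label of each piece is needed later, the same groupby iteration can fill a dictionary instead of a list. The following is a minimal sketch under stated assumptions: the toy frame and the name `pieces` are illustrative and do not come from the question.

```python
import pandas as pd

# Hypothetical toy frame with a non-unique index (illustrative data only).
df = pd.DataFrame({"A": list("xyzxy"), "B": [1, 2, 3, 4, 5]},
                  index=[10, 20, 10, 30, 20])

# Iterating over a groupby object yields (label, sub-dataframe) pairs.
# With sort=False the groups come out in order of first appearance.
pieces = {label: sub for label, sub in df.groupby(level=0, sort=False)}

print(list(pieces))   # the group labels
print(pieces[10])     # all rows whose index label is 10
```

Each value in `pieces` is a DataFrame, even for a label that occurs only once.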
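import numpy as np
import pandas as pd
# Sample data for the timings: 1000 rows, index labels drawn at random from 0..99
np.random.seed(123)
N = 1000
L = list('abcdefghijklmno')
df = pd.DataFrame({'A': np.random.choice(L, N),
                   'B': np.random.randint(10, size=N)},
                  index=np.random.randint(100, size=N))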
In [273]: %timeit [x for i, x in df.groupby(level=0, sort=False)]
100 loops, best of 3: 9.91 ms per loop
In [274]: %timeit [df.loc[x] for x in df.index]
1 loop, best of 3: 417 ms per loop
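Two details are worth checking before swapping one approach for the other. The timed df.loc loop iterates over every entry of df.index rather than only the unique labels, and df.loc[x] returns a Series when label x occurs exactly once, whereas groupby always yields DataFrames. The sketch below assumes random data of the same shape as the timing setup and compares the groupby pieces against df.loc[[x]] (a list indexer, which always returns a DataFrame) for each unique label. The names `letters`, `by_groupby`, `by_loc` and `same` are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
N = 1000
letters = list("abcdefghijklmno")
df = pd.DataFrame({"A": np.random.choice(letters, N),
                   "B": np.random.randint(10, size=N)},
                  index=np.random.randint(100, size=N))

# Unique labels in order of first appearance, matching sort=False below.
indices = df.index.unique()

by_groupby = [sub for _, sub in df.groupby(level=0, sort=False)]
by_loc = [df.loc[[x]] for x in indices]  # list indexer keeps a DataFrame

# True only if both lists hold the same sub-dataframes in the same order.
same = len(by_groupby) == len(by_loc) and all(
    g.equals(l) for g, l in zip(by_groupby, by_loc))
print(same)
```

If the two lists agree, the groupby version can stand in for the loop over unique labels without changing downstream code that expects DataFrames.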