I have a large DataFrame df for which I have a full list, indices, of the unique elements in df.index. I now want to create a list of all the sub-DataFrames indexed by the elements in indices; specifically:
list_df = [df.loc[x] for x in indices]
Running this command is taking ages, though (df has about 3e6 rows and 3e3 unique indices). Is this a reasonable way to perform this operation? I would be very happy to receive any comments or suggestions that could improve the performance of this and related problems.
Thanks in advance!
You can use a list comprehension over a groupby object grouped by the index (level=0). Passing sort=False turns off the default sorting of the group keys, which makes the solution faster:
L = [x for i, x in df.groupby(level=0, sort=False)]
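If you also need to look up a sub-DataFrame by its index label, the same groupby pass can fill a dict instead of a list. This is a minimal sketch on a small made-up frame; the name groups and the sample data are illustrative assumptions, not part of the question. It also shows one behavioural difference: df.loc[x] returns a Series when the label x occurs only once, whereas groupby always yields a DataFrame.

```python
import pandas as pd

# Small illustrative frame with a repeated, non-unique index (made-up data).
df = pd.DataFrame(
    {"A": list("abcdef"), "B": [1, 2, 3, 4, 5, 6]},
    index=[10, 20, 10, 30, 20, 10],
)

# One pass over the data: map each index label to its sub-DataFrame.
# sort=False keeps the groups in order of first appearance.
groups = {key: sub for key, sub in df.groupby(level=0, sort=False)}

# All rows labelled 10, in their original order.
print(groups[10])

# Label 30 occurs once: df.loc[30] is a Series, groups[30] stays a DataFrame.
print(type(df.loc[30]).__name__, type(groups[30]).__name__)
```

If you want the df.loc approach to return a DataFrame in every case, df.loc[[x]] (a list containing the label) does that.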
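import numpy as np
import pandas as pd

# Sample data for the timings below: 1000 rows, integer index with repeats
np.random.seed(123)
N = 1000
letters = list('abcdefghijklmno')
df = pd.DataFrame({'A': np.random.choice(letters, N),
                   'B': np.random.randint(10, size=N)},
                  index=np.random.randint(100, size=N))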
In [273]: %timeit [x for i, x in df.groupby(level=0, sort=False)]
100 loops, best of 3: 9.91 ms per loop
In [274]: %timeit [df.loc[x] for x in df.index]
1 loop, best of 3: 417 ms per loop
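Note that the second timing loops over every entry of df.index, repeats included, rather than over the unique labels as in the question. A like-for-like comparison could be sketched as follows, using the standard timeit module instead of the IPython magic. The helper names with_loc and with_groupby are hypothetical, and no timings are quoted here because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same kind of sample data as in the answer.
np.random.seed(123)
N = 1000
df = pd.DataFrame(
    {"A": np.random.choice(list("abcdefghijklmno"), N),
     "B": np.random.randint(10, size=N)},
    index=np.random.randint(100, size=N),
)

# Unique labels in order of first appearance, as in the question.
indices = df.index.unique()

def with_loc():
    # df.loc[[x]] always returns a DataFrame, even for a label seen once.
    return [df.loc[[x]] for x in indices]

def with_groupby():
    # Single pass; sort=False keeps groups in order of first appearance.
    return [sub for _, sub in df.groupby(level=0, sort=False)]

# Both approaches should produce the same sub-DataFrames in the same order.
same = all(a.equals(b) for a, b in zip(with_loc(), with_groupby()))
print("identical results:", same)

# Time each approach over a few repetitions (seconds for 10 runs each).
print("loc:    ", timeit.timeit(with_loc, number=10))
print("groupby:", timeit.timeit(with_groupby, number=10))
```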