I need to split a dataframe into 3 separate dataframes based on a header row that recurs in the dataframe.
My dataframe looks like:
0 1 2 .... 14
0 Alert Type Response Cost
1 w1 x1 y1 z1
2 w2 x2 y2 z3
. . . . .
. . . . .
144 Alert Type Response Cost
145 a1 b1 c1 d1
146 a2 b2 c2 d2
I was trying to use loc to get the index numbers of the rows containing the word "Alert", so I could slice the dataframe into sub-dataframes.
indexes = df.index[df.loc[df[0] == "Alert"]].tolist()
But this returns:
IndexError: arrays used as indices must be of integer (or boolean) type
Any hints on that error? Or is there a better way I'm not seeing (e.g. something like groupby)?
Thanks for your help.
np.split
dfs = np.split(df, np.flatnonzero(df[0] == 'Alert')[1:])
Find where df[0] is equal to 'Alert'
np.flatnonzero(df[0] == 'Alert')
Ignore the first position, because splitting at position 0 would produce an empty leading DataFrame
np.flatnonzero(df[0] == 'Alert')[1:]
Use np.split to get the list
np.split(df, np.flatnonzero(df[0] == 'Alert')[1:])
print(*dfs, sep='\n\n')
0 1 2 14
0 Alert Type Response Cost
1 w1 x1 y1 z1
2 w2 x2 y2 z3

0 1 2 14
144 Alert Type Response Cost
145 a1 b1 c1 d1
146 a2 b2 c2 d2
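A self-contained sketch of the same idea, built on a small hypothetical frame that mimics the question's layout (six rows instead of 147). It finds the header positions with np.flatnonzero as above, but slices with iloc instead of calling np.split on the DataFrame; in recent pandas versions np.split on a DataFrame may emit a FutureWarning, because NumPy calls DataFrame.swapaxes internally. The names starts, cuts and dfs are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column 0 holds "Alert" wherever the header repeats.
df = pd.DataFrame([
    ["Alert", "Type", "Response", "Cost"],
    ["w1", "x1", "y1", "z1"],
    ["w2", "x2", "y2", "z3"],
    ["Alert", "Type", "Response", "Cost"],
    ["a1", "b1", "c1", "d1"],
    ["a2", "b2", "c2", "d2"],
])

# Positions (not labels) of every header row.
starts = np.flatnonzero(df[0] == "Alert")

# Cut points: every header position plus both ends of the frame.
# The set drops the duplicate 0 when the first row is itself a header,
# so no empty chunk is produced.
cuts = sorted({0, *starts.tolist(), len(df)})

# Positional slicing with iloc works regardless of the index labels.
dfs = [df.iloc[a:b] for a, b in zip(cuts, cuts[1:])]

for part in dfs:
    print(part, end="\n\n")
```

Unlike the np.split version, any rows before the first "Alert" end up in their own leading chunk instead of being merged into the first block.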
@piRSquared's answer works great, so let me just explain your error.
This is how you can get the indexes of the rows where the first column is "Alert":
indexes = list(df.loc[df[0] == "Alert"].index)
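Your error arises because df.loc[df[0] == "Alert"] returns a DataFrame (the matching rows themselves, full of strings), not a list of positions. A pandas Index can only be indexed with integers or a boolean array, which is why you get the IndexError. Passing the boolean mask directly, df.index[df[0] == "Alert"].tolist(), works as well.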
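A minimal sketch of the difference on a hypothetical four-row frame (the data is made up; the column labels are the integers 0 to 3, as in the question): the boolean mask itself can index df.index, whereas df.loc[mask] hands back rows rather than positions.

```python
import pandas as pd

# Hypothetical four-row frame; column labels are the integers 0..3.
df = pd.DataFrame([
    ["Alert", "Type", "Response", "Cost"],
    ["w1", "x1", "y1", "z1"],
    ["Alert", "Type", "Response", "Cost"],
    ["a1", "b1", "c1", "d1"],
])

mask = df[0] == "Alert"       # boolean Series: True, False, True, False
print(type(df.loc[mask]))     # a DataFrame of strings, not usable as positions

# Indexing the index with the boolean mask itself is allowed.
indexes = df.index[mask].tolist()
print(indexes)                # [0, 2]

# Equivalent: take the labels of the filtered rows.
same = df.loc[mask].index.tolist()
```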
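Then you can split your dataframe using a list comprehension like this. Note that iloc slices by position, so this relies on the default RangeIndex, where each label equals its position: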
listdf = [df.iloc[i:j] for i, j in zip(indexes, indexes[1:] + [len(df)])]
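Since the question asks about something like groupby: one alternative (not taken from either answer) is to number the blocks with a running count of header rows and group on that counter. A sketch on hypothetical sample data; group_id and dfs are illustrative names.

```python
import pandas as pd

# Hypothetical sample data mimicking the question's layout.
df = pd.DataFrame([
    ["Alert", "Type", "Response", "Cost"],
    ["w1", "x1", "y1", "z1"],
    ["w2", "x2", "y2", "z3"],
    ["Alert", "Type", "Response", "Cost"],
    ["a1", "b1", "c1", "d1"],
    ["a2", "b2", "c2", "d2"],
])

# Every "Alert" row bumps a running counter, so each header row and the
# rows after it share one key: 1, 1, 1, 2, 2, 2 here.
group_id = df[0].eq("Alert").cumsum()

# groupby yields (key, sub-frame) pairs; keep only the sub-frames.
dfs = [part for _, part in df.groupby(group_id)]
```

Rows before the first "Alert" would get the key 0 and form their own group. Because the counter only increases, groupby's default key sorting keeps the blocks in their original order.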