How to split dataframe into multiple dataframes based on header rows

I need to split a dataframe into 3 separate dataframes based on a header row that recurs in the dataframe.

My dataframe looks like:

        0         1             2     ....   14
0   Alert     Type      Response           Cost
1     w1        x1            y1            z1
2     w2        x2            y2            z3
.      .         .             .             .
.      .         .             .             .
144 Alert     Type      Response           Cost
145   a1        b1            c1             d1
146   a2        b2            c2             d2

I was trying to get the index numbers of the rows containing the word "Alert" with loc, so that I could slice the dataframe into sub-dataframes.

indexes = df.index[df.loc[df[0] == "Alert"]].tolist()

But this returns:

IndexError: arrays used as indices must be of integer (or boolean) type

Any hint on that error? Or is there a way I'm not seeing (e.g. something like group by)?

Thanks for your help.

np.split

dfs = np.split(df, np.flatnonzero(df[0] == 'Alert')[1:])

Explanation

  • Find the positions where df[0] is equal to 'Alert'

     np.flatnonzero(df[0] == 'Alert')
  • Ignore the first position, because splitting at position 0 would produce an empty first element

     np.flatnonzero(df[0] == 'Alert')[1:]
  • Use np.split to get the list of dataframes

     np.split(df, np.flatnonzero(df[0] == 'Alert')[1:])

Show results

print(*dfs, sep='\n\n')

      0     1         2     14
0  Alert  Type  Response  Cost
1     w1    x1        y1    z1
2     w2    x2        y2    z3

        0     1         2     14
144  Alert  Type  Response  Cost
145     a1    b1        c1    d1
146     a2    b2        c2    d2
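The same steps can be sketched end to end on a small made-up frame. This is only an illustration: the data is invented, and the pieces are cut with iloc slices instead of np.split, because some recent pandas versions emit a deprecation warning when a DataFrame is passed to np.split (worth checking against your installed version).

```python
import numpy as np
import pandas as pd

# Invented sample data: three blocks, each starting with a repeated header row.
df = pd.DataFrame([
    ["Alert", "Type", "Response", "Cost"],
    ["w1", "x1", "y1", "z1"],
    ["w2", "x2", "y2", "z2"],
    ["Alert", "Type", "Response", "Cost"],
    ["a1", "b1", "c1", "d1"],
    ["Alert", "Type", "Response", "Cost"],
    ["p1", "q1", "r1", "s1"],
    ["p2", "q2", "r2", "s2"],
])

# Positions of the header rows.
starts = np.flatnonzero((df[0] == "Alert").to_numpy())

# Each block ends where the next one starts; the last block ends at len(df).
stops = np.append(starts[1:], len(df))

# Slice by position, one dataframe per block.
dfs = [df.iloc[a:b] for a, b in zip(starts, stops)]

print(*dfs, sep="\n\n")
```

Each element of dfs keeps its original row labels, as in the output shown above.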

@piRSquared's answer works great, so let me just explain your error.

This is how you can get the indexes of the rows whose first element is Alert:

indexes = list(df.loc[df[0] == "Alert"].index)

Your error arises because df.loc[df[0] == "Alert"] returns a DataFrame (the matching rows), not a boolean mask or a list of integer positions, so it cannot be used to index df.index.
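To make this concrete, here is a minimal sketch on a made-up frame (integer column labels are assumed, as in the question). It shows that df.loc[mask] yields a DataFrame, and two ways to collect the header-row labels as a list.

```python
import pandas as pd

# Invented sample data with a repeated header row.
df = pd.DataFrame([
    ["Alert", "Type"],
    ["w1", "x1"],
    ["Alert", "Type"],
    ["a1", "b1"],
])

mask = df[0] == "Alert"          # boolean Series, one entry per row
rows = df.loc[mask]              # a DataFrame holding the matching rows

indexes = list(rows.index)       # labels taken from the selected rows
same = df.index[mask].tolist()   # labels taken by passing the mask to df.index

print(type(rows).__name__, indexes, same)
```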

Then you can split your dataframe using a list comprehension like this:

listdf = [df.iloc[i:j] for i, j in zip(indexes, indexes[1:] + [len(df)])]
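The question also asks whether something like group by could work. One possible sketch, which is not taken from either answer: label each block with a running count of the header rows, then group on that label. The sample data and the name block_id are invented for illustration.

```python
import pandas as pd

# Invented sample data: three blocks, each starting with a repeated header row.
df = pd.DataFrame([
    ["Alert", "Type", "Response", "Cost"],
    ["w1", "x1", "y1", "z1"],
    ["w2", "x2", "y2", "z2"],
    ["Alert", "Type", "Response", "Cost"],
    ["a1", "b1", "c1", "d1"],
    ["Alert", "Type", "Response", "Cost"],
    ["p1", "q1", "r1", "s1"],
])

# The running count goes up by one at every header row,
# so all rows of the same block share the same number.
block_id = (df[0] == "Alert").cumsum()

# One dataframe per block, in order of appearance.
listdf = [part for _, part in df.groupby(block_id)]

print(*listdf, sep="\n\n")
```

This relies on the first row being a header row; any rows before the first "Alert" would end up in their own group labelled 0.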
