How to filter a dataframe to the rows between two rows that contain specific strings in a column?

Question

I am trying to understand how to select only those rows in my dataframe that are between two specific rows. These rows contain two specific strings in one of the columns. I will explain further with this example.

I have the following dataframe:

       String      Value
-------------------------
 0       Blue         45     
 1        Red         35   
 2      Green         75    
 3      Start         65   
 4     Orange         33   
 5     Purple         65   
 6       Teal         34
 7     Indigo         44
 8        End         32
 9     Yellow         22 
10        Red         14
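
To make the example reproducible, the dataframe can be rebuilt as sketched below. The values are taken from the table above; the construction itself and the default integer index are assumptions.

```python
import pandas as pd

# Rebuild the example dataframe (default integer index 0..10).
df = pd.DataFrame({
    "String": ["Blue", "Red", "Green", "Start", "Orange", "Purple",
               "Teal", "Indigo", "End", "Yellow", "Red"],
    "Value": [45, 35, 75, 65, 33, 65, 34, 44, 32, 22, 14],
})
print(df)
```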

There is only one instance of "Start" and only one instance of "End" in the "String" column. I only want the rows of this dataframe that lie between the rows containing "Start" and "End" in the "String" column, inclusive, so I want to produce this output dataframe:

       String      Value
-------------------------  
 3      Start         65   
 4     Orange         33   
 5     Purple         65   
 6       Teal         34
 7     Indigo         44
 8        End         32

Also, I want to preserve the order of the rows I keep, that is, the order "Start", "Orange", "Purple", "Teal", "Indigo", "End".

I know I can get the index of these specific rows by doing:

index_start = df.index[df['String'] == 'Start']
index_end = df.index[df['String'] == 'End']

But I am not sure how to actually filter out all rows that are not between these two strings. How can I accomplish this in Python?

Answer 1

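This should be enough. iloc[] selects rows by integer position and slices the same way a list does, so the end bound is exclusive and needs +1 to include the "End" row. Note that df.index holds labels; they coincide with positions here only because the dataframe uses the default 0..n-1 index.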

index_start = df.index[df['String'] == 'Start']
index_end = df.index[df['String'] == 'End']  
df.iloc[index_start[0]:index_end[0]+1]

More information: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html
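
If the dataframe does not use the default integer index, the labels in df.index are no longer valid positions for iloc[]. A minimal sketch of one way around that, assuming a small hypothetical frame with a custom index and using NumPy to look up positions instead of labels:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are NOT 0..n-1.
df = pd.DataFrame(
    {"String": ["Blue", "Start", "Orange", "End", "Red"],
     "Value": [45, 65, 33, 32, 14]},
    index=[10, 20, 30, 40, 50],
)

# np.flatnonzero returns the integer positions of the True entries,
# which is what iloc expects (labels would be 20 and 40 here).
start_pos = np.flatnonzero(df["String"].eq("Start").to_numpy())[0]
end_pos = np.flatnonzero(df["String"].eq("End").to_numpy())[0]

# +1 because the end of a positional slice is exclusive.
result = df.iloc[start_pos:end_pos + 1]
print(result)
```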

Answer 2

If both values are present, you can temporarily set "String" as the index and slice by label:

df.set_index('String').loc['Start':'End'].reset_index()

Output:

   String  Value
0   Start     65
1  Orange     33
2  Purple     65
3    Teal     34
4  Indigo     44
5     End     32
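
As a self-contained sketch of the label slice above (the dataframe construction is an assumption based on the question's table): unlike iloc[], a .loc[] label slice includes both endpoints, so no +1 is needed. It relies on "Start" and "End" each appearing once, and reset_index() replaces the original row labels 3..8 with 0..5.

```python
import pandas as pd

df = pd.DataFrame({
    "String": ["Blue", "Red", "Green", "Start", "Orange", "Purple",
               "Teal", "Indigo", "End", "Yellow", "Red"],
    "Value": [45, 35, 75, 65, 33, 65, 34, 44, 32, 22, 14],
})

# Make "String" the index, slice by label (both ends inclusive),
# then turn "String" back into a regular column.
between = df.set_index("String").loc["Start":"End"].reset_index()
print(between)
```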

Alternatively, using isin (then the order of Start/End doesn't matter):

m = df['String'].isin(['Start', 'End']).cumsum().eq(1)
df[m|m.shift()]

Output:

   String  Value
3   Start     65
4  Orange     33
5  Purple     65
6    Teal     34
7  Indigo     44
8     End     32
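
To see why the isin/cumsum mask works, it may help to break it into steps. The sketch below assumes the question's dataframe; the intermediate names are hypothetical, and shift(fill_value=False) is used here so the shifted mask stays boolean instead of gaining a NaN in its first row.

```python
import pandas as pd

df = pd.DataFrame({
    "String": ["Blue", "Red", "Green", "Start", "Orange", "Purple",
               "Teal", "Indigo", "End", "Yellow", "Red"],
    "Value": [45, 35, 75, 65, 33, 65, 34, 44, 32, 22, 14],
})

# True only on the two marker rows.
is_marker = df["String"].isin(["Start", "End"])

# Running count of markers seen so far: 0 before the first marker,
# 1 from the first marker up to the row before the second, 2 afterwards.
count = is_marker.cumsum()

# Rows where exactly one marker has been seen: first marker through
# the row just before the second marker.
m = count.eq(1)

# Shifting m down by one row extends the selection to the second marker.
mask = m | m.shift(fill_value=False)
result = df[mask]
print(result)
```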

Answer 3

You can build a boolean mask using eq + cummax and filter:

out = df[df['String'].eq('Start').cummax() & df.loc[::-1, 'String'].eq('End').cummax()]

Output:

   String  Value
3   Start     65
4  Orange     33
5  Purple     65
6    Teal     34
7  Indigo     44
8     End     32
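
The one-liner combines two masks: "at or after Start" and "at or before End". A step-by-step sketch, assuming the question's dataframe and using hypothetical variable names; here the second mask is reversed back explicitly rather than relying on index alignment:

```python
import pandas as pd

df = pd.DataFrame({
    "String": ["Blue", "Red", "Green", "Start", "Orange", "Purple",
               "Teal", "Indigo", "End", "Yellow", "Red"],
    "Value": [45, 35, 75, 65, 33, 65, 34, 44, 32, 22, 14],
})

# cummax turns every value after the first True into True:
# this marks "Start" and all rows below it.
at_or_after_start = df["String"].eq("Start").cummax()

# Same idea from the bottom up: reverse, cummax, reverse back.
# This marks "End" and all rows above it.
at_or_before_end = df["String"].eq("End").iloc[::-1].cummax().iloc[::-1]

# A row is kept only if it satisfies both conditions.
out = df[at_or_after_start & at_or_before_end]
print(out)
```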

Answer 4

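Since you have already computed the index values, you can slice with them directly. Add 1 to the end bound, because an iloc[] slice excludes its end and the "End" row would otherwise be dropped:

df.iloc[index_start.item():index_end.item() + 1]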
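
A runnable sketch of the .item() approach on a small hypothetical frame with the default integer index. .item() returns the single element of the Index as a scalar and raises ValueError when the Index does not hold exactly one element, which doubles as a check that each marker occurs once.

```python
import pandas as pd

# Hypothetical shortened frame with the default 0..n-1 index.
df = pd.DataFrame({
    "String": ["Blue", "Start", "Orange", "End", "Red"],
    "Value": [45, 65, 33, 32, 14],
})

index_start = df.index[df["String"] == "Start"]
index_end = df.index[df["String"] == "End"]

# +1 so that the "End" row is part of the positional slice.
result = df.iloc[index_start.item():index_end.item() + 1]
print(result)

# A marker that is missing (or duplicated) makes .item() raise ValueError.
missing_raises = False
try:
    df.index[df["String"] == "Missing"].item()
except ValueError:
    missing_raises = True
print("missing marker raises ValueError:", missing_raises)
```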

4 answers

Answer 1 (3, 2022-04-21 18:53:55)
Answer 2 (3, accepted, 2022-04-21 18:57:37)
Answer 3 (2)
Answer 4 (2, 2022-04-21 18:50:51)
