
How to filter dataframe rows between two rows that contain specific strings in a column?

I am trying to understand how to select only those rows in my dataframe that lie between two specific rows. These two rows each contain a specific string in one of the columns. I will explain further with this example.

I have the following dataframe:

       String      Value
-------------------------
 0       Blue         45     
 1        Red         35   
 2      Green         75    
 3      Start         65   
 4     Orange         33   
 5     Purple         65   
 6       Teal         34
 7     Indigo         44
 8        End         32
 9     Yellow         22 
10        Red         14
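
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame above. It assumes pandas is imported as pd and that the frame uses the default integer index shown in the table.

```python
import pandas as pd

# Rebuild the example frame from the table above (default RangeIndex 0..10).
df = pd.DataFrame({
    "String": ["Blue", "Red", "Green", "Start", "Orange", "Purple",
               "Teal", "Indigo", "End", "Yellow", "Red"],
    "Value": [45, 35, 75, 65, 33, 65, 34, 44, 32, 22, 14],
})
print(df)
```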

There is only one instance of "Start" and only one instance of "End" in the "String" column. I only want the rows of this dataframe that are between the rows that contain "Start" and "End" in the "String" column (inclusive), so I want to produce this output dataframe:

       String      Value
-------------------------  
 3      Start         65   
 4     Orange         33   
 5     Purple         65   
 6       Teal         34
 7     Indigo         44
 8        End         32

Also, I want to preserve the order of the rows I keep, i.e. "Start", "Orange", "Purple", "Teal", "Indigo", "End".

I know I can get the index of these specific rows by doing:

index_start = df.index[df['String'] == 'Start']
index_end = df.index[df['String'] == 'End']    

But I am not sure how to actually filter out all rows that are not between these two strings. How can I accomplish this in Python?

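This should be enough. iloc[] selects rows by integer position and slices the same way Python lists do, so the stop position is excluded and you need + 1 to keep the "End" row. Note that index_start[0] is an index label, so this relies on the default integer index, where labels and positions coincide.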

index_start = df.index[df['String'] == 'Start']
index_end = df.index[df['String'] == 'End']  
df.iloc[index_start[0]:index_end[0]+1]

More information: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html
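
If the frame does not have the default integer index, labels and positions differ and the snippet above would slice the wrong rows. A minimal sketch of one way around that is to look up positions directly; the shortened frame and its index labels here are hypothetical, chosen only to illustrate the point.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"String": ["Blue", "Start", "Orange", "End", "Red"],
     "Value": [45, 65, 33, 32, 14]},
    index=[10, 20, 30, 40, 50],  # hypothetical non-default labels
)

# Integer positions (not labels) of the marker rows.
start_pos = np.flatnonzero(df["String"].eq("Start").to_numpy())[0]
end_pos = np.flatnonzero(df["String"].eq("End").to_numpy())[0]

# iloc slices exclude the stop position, hence the + 1.
result = df.iloc[start_pos:end_pos + 1]
print(result)
```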

If both values are present, you can temporarily set "String" as the index and slice by label (label slices with loc include both endpoints):

df.set_index('String').loc['Start':'End'].reset_index()

Output:

   String  Value
0   Start     65
1  Orange     33
2  Purple     65
3    Teal     34
4  Indigo     44
5     End     32
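
Since this approach depends on both markers being present, a minimal sketch with an explicit check might look like the following. Returning an empty frame when a marker is missing is an assumption made for illustration, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "String": ["Blue", "Red", "Green", "Start", "Orange", "Purple",
               "Teal", "Indigo", "End", "Yellow", "Red"],
    "Value": [45, 35, 75, 65, 33, 65, 34, 44, 32, 22, 14],
})

# Only slice by label when both markers actually occur in the column.
has_both = {"Start", "End"} <= set(df["String"])
if has_both:
    result = df.set_index("String").loc["Start":"End"].reset_index()
else:
    result = df.iloc[0:0]  # assumed fallback: empty frame, same columns
print(result)
```

Note that reset_index() produces a fresh 0..5 index, so the original row labels (3..8) are not kept with this approach.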

Alternatively, using isin (then the order of Start/End doesn't matter):

m = df['String'].isin(['Start', 'End']).cumsum().eq(1)
df[m | m.shift(fill_value=False)]

Output:

   String  Value
3   Start     65
4  Orange     33
5  Purple     65
6    Teal     34
7  Indigo     44
8     End     32
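
To illustrate the claim that the marker order doesn't matter, here is a small sketch wrapping the same mask in a helper. The function name between_markers and the shortened frame are hypothetical, and fill_value=False is used in shift so the mask stays boolean.

```python
import pandas as pd

def between_markers(frame, column, markers):
    """Keep rows from the first marker row through the second, inclusive."""
    # cumsum is 1 from the first marker up to the row before the second.
    m = frame[column].isin(markers).cumsum().eq(1)
    # Shifting the mask down by one row pulls in the second marker row.
    return frame[m | m.shift(fill_value=False)]

# "End" appears before "Start" here; the same rows are still selected.
df = pd.DataFrame({"String": ["A", "End", "B", "Start", "C"],
                   "Value": [1, 2, 3, 4, 5]})
result = between_markers(df, "String", ["Start", "End"])
print(result)
```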

You can build a boolean mask using eq + cummax and filter:

out = df[df['String'].eq('Start').cummax() & df.loc[::-1, 'String'].eq('End').cummax()]

Output:

   String  Value
3   Start     65
4  Orange     33
5  Purple     65
6    Teal     34
7  Indigo     44
8     End     32
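
The one-liner above can be unpacked into steps. This is a sketch of the same idea rather than the exact expression: it reverses the second mask back into the original order explicitly instead of relying on index alignment in the & operation.

```python
import pandas as pd

df = pd.DataFrame({
    "String": ["Blue", "Red", "Green", "Start", "Orange", "Purple",
               "Teal", "Indigo", "End", "Yellow", "Red"],
    "Value": [45, 35, 75, 65, 33, 65, 34, 44, 32, 22, 14],
})

# True from the "Start" row onward (cummax keeps True once it is seen).
after_start = df["String"].eq("Start").cummax()

# Same trick on the reversed column, then flipped back to original order:
# True from the top of the frame down to and including the "End" row.
before_end = df["String"][::-1].eq("End").cummax()[::-1]

# Rows that satisfy both conditions lie between the markers, inclusive.
out = df[after_start & before_end]
print(out)
```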

Since you already have the index values from your own code, you can pass them to iloc (add 1 to the stop so the "End" row is included):

df.iloc[index_start.item():index_end.item() + 1]
