

In a Pandas dataframe, how to filter a set of rows based on a start row and end row both satisfying different conditions?

If one of my string columns contains a particular substring, that row is a start row. If a later row's string column contains another substring, that row is an end row. I need a way to keep only the rows between these two.

I tried to find the start row using:

start_row = df_page['StringCol'].str.contains('SubStrForStartRow')

This gives me a boolean Series that is True for my start row, but I am not sure how to go from there to what I described above.

For example, consider the following dataframe:

import pandas

data = [['UnwantedRow', ''],['TransactionStart', ''],['Date1', 200],['Date2', 300],['TransactionEnd', ''],['UnwantedRow','']]
df = pandas.DataFrame(data, columns=['Transaction', 'Value'])

Using the 'Start' and 'End' substrings, I want to select all rows between the 'TransactionStart' row and the 'TransactionEnd' row, that is, only the two rows containing ['Date1', 200] and ['Date2', 300].

Get the index of the start and end rows with .index[0], then select those rows with iloc. The upper bound of iloc is exclusive, which is why I use end_row+1. Note that .index[0] returns an index label, so passing it to iloc assumes the default integer index, where labels and positions coincide:

import pandas as pd

data = [['UnwantedRow', ''],['TransactionStart', ''],['Date1', 200],['Date2', 300],['TransactionEnd', ''],['UnwantedRow','']]
df = pd.DataFrame(data, columns=['Transaction', 'Value'])
# Index label of the first row that contains each marker substring
start_row = df[df['Transaction'].str.contains('TransactionStart')].index[0]
end_row = df[df['Transaction'].str.contains('TransactionEnd')].index[0]
# iloc excludes the upper bound, so add 1 to keep the end row
df = df.iloc[start_row:end_row+1]
df
Out[1]: 
        Transaction Value
1  TransactionStart      
2             Date1   200
3             Date2   300
4    TransactionEnd      
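
The output above still includes the two marker rows, while the question asks only for the rows between them. A minimal sketch of one way to handle that, assuming the first start marker and the first end marker after it define the range: the helper name `rows_between` and its `inclusive` parameter are hypothetical, not part of pandas. It looks up row positions rather than index labels, so it should not depend on the dataframe having the default integer index.

```python
import numpy as np
import pandas as pd


def rows_between(df, column, start_substr, end_substr, inclusive=False):
    """Rows between the first start marker and the first end marker after it.

    Hypothetical helper for illustration; substrings are matched literally.
    """
    values = df[column].astype(str)
    # Positions (not labels) of rows whose text contains the start substring
    start_mask = values.str.contains(start_substr, regex=False).to_numpy()
    start_hits = np.flatnonzero(start_mask)
    if start_hits.size == 0:
        return df.iloc[0:0]  # no start marker: return an empty frame
    start = start_hits[0]

    # Positions of end markers, keeping only those after the start row
    end_mask = values.str.contains(end_substr, regex=False).to_numpy()
    end_hits = np.flatnonzero(end_mask)
    end_hits = end_hits[end_hits > start]
    if end_hits.size == 0:
        return df.iloc[0:0]  # no end marker after the start: empty frame
    end = end_hits[0]

    if inclusive:
        return df.iloc[start:end + 1]  # keep both marker rows
    return df.iloc[start + 1:end]      # drop both marker rows


data = [['UnwantedRow', ''], ['TransactionStart', ''], ['Date1', 200],
        ['Date2', 300], ['TransactionEnd', ''], ['UnwantedRow', '']]
df = pd.DataFrame(data, columns=['Transaction', 'Value'])

between = rows_between(df, 'Transaction', 'TransactionStart', 'TransactionEnd')
print(between)
```

With the sample data this should leave only the Date1 and Date2 rows; passing inclusive=True keeps the TransactionStart and TransactionEnd rows as in the answer above.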


