Get the rows of a dataframe based on consecutive values of one column
Is there a way to get consecutive rows according to the values of a specific column? For example:
| | column1 | column2 | View |
---|---|---|---|
| row1 | 1 | 2 | c |
| row2 | 3 | 4 | a |
| row3 | 5 | 6 | p |
| row4 | 7 | 8 | p |
| row5 | 9 | 10 | n |
I need to get the consecutive rows whose View values spell out the word 'app', so in this example I need to save row2, row3 and row4 in a list.
Here is a generalizable approach. I use index_slice_by_substring() to generate a tuple of integers representing the beginning and ending rows. The function rows_by_consecutive_letters() takes your dataframe, the column name to check, and the string you want to look for; for the return value it uses .iloc to grab a slice of the table by integer position.
The key to getting the slice indices is joining the "View" column values together into a single string using ''.join(df[column]) and checking substrings of the same length as the condition string from left to right until there's a match. This relies on every value in the column being a single character, so that positions in the joined string line up with row positions.
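import pandas as pd

def index_slice_by_substring(full_string, substring) -> tuple:
    # Slide a window of len(substring) across full_string, left to right.
    len_substring = len(substring)
    len_full_string = len(full_string)
    # + 1 so that a match ending on the last character is also checked
    for x0, x1 in enumerate(range(len_substring, len_full_string + 1)):
        if full_string[x0:x1] == substring:
            return (x0, x1)
    # No match: return an empty slice rather than None, which cannot be unpacked
    return (0, 0)

def rows_by_consecutive_letters(df, column, condition) -> pd.DataFrame:
    row_begin, row_end = index_slice_by_substring(''.join(df[column]), condition)
    return df.iloc[row_begin:row_end, :]

print(rows_by_consecutive_letters(your_df, "View", "app"))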
Returns:
   column1  column2 View
1        3        4    a
2        5        6    p
3        7        8    p
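If the column can hold values longer than one character, joining them into a string no longer maps string positions to row positions. A minimal sketch of the same sliding-window idea applied to the list of values instead, assuming the example frame from the question (the function name rows_matching_sequence is hypothetical, not part of the answer above):

```python
import pandas as pd

# Example frame rebuilt from the question's table.
df = pd.DataFrame(
    {
        "column1": [1, 3, 5, 7, 9],
        "column2": [2, 4, 6, 8, 10],
        "View": ["c", "a", "p", "p", "n"],
    },
    index=["row1", "row2", "row3", "row4", "row5"],
)

def rows_matching_sequence(df, column, sequence):
    """Return the first run of consecutive rows whose `column` values equal `sequence`.

    Hypothetical helper: compares windows of the value list, so it does not
    require single-character values. Returns an empty frame when nothing matches.
    """
    values = df[column].tolist()
    target = list(sequence)
    n = len(target)
    for start in range(len(values) - n + 1):
        if values[start:start + n] == target:
            return df.iloc[start:start + n]
    return df.iloc[0:0]  # empty slice with the same columns

result = rows_matching_sequence(df, "View", "app")
print(result.index.tolist())  # ['row2', 'row3', 'row4']
```

Passing a string such as "app" works because list("app") splits it into single letters; for multi-character values you would pass a list such as ["ab", "cd"] instead.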
You can use str.find, but it only finds the first occurrence of your search term.
search = 'app'
i = ''.join(df.View).find(search)
if i > -1:
    print(df.iloc[i: i+len(search)])
Output:
      column1  column2 View
row2        3        4    a
row3        5        6    p
row4        7        8    p
To handle zero, one, or many occurrences (without any error checking), you can use re.finditer. The result is a list of dataframe slices, which is empty when there is no match.
import re
search='p' # searched for 'p' to find more than one
[df.iloc[x.start():x.end()] for x in re.finditer(search, ''.join(df.View))]
Output:
[      column1  column2 View
row3        5        6    p,
      column1  column2 View
row4        7        8    p]
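Two caveats about the re.finditer version are worth noting: the search term is interpreted as a regular expression, and matches are non-overlapping. A hedged sketch that escapes the term with re.escape and optionally uses a lookahead to report overlapping runs; the helper name all_runs and the five-row frame are made up for illustration:

```python
import re
import pandas as pd

# Hypothetical frame with three consecutive 'p' rows.
df = pd.DataFrame(
    {"View": list("cappp")},
    index=["row1", "row2", "row3", "row4", "row5"],
)

def all_runs(df, column, search, overlapping=True):
    """Return a list of slices, one per occurrence of `search` in the joined column.

    Assumes every value in `column` is a single character.
    """
    joined = "".join(df[column])
    literal = re.escape(search)  # treat the search term as plain text
    # A zero-width lookahead matches at every start position, so runs may overlap.
    pattern = f"(?={literal})" if overlapping else literal
    n = len(search)
    return [df.iloc[m.start():m.start() + n] for m in re.finditer(pattern, joined)]

runs = all_runs(df, "View", "pp")
print([r.index.tolist() for r in runs])  # [['row3', 'row4'], ['row4', 'row5']]
```

With overlapping=False the same call reports only the first of the two runs, which matches the behaviour of the plain re.finditer list comprehension.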