How to find shift between three consecutive rows in pandas dataframe?

I have a dataframe that looks like this:

import pandas as pd

df = pd.DataFrame({"event": ["Search Executed", "Search Results Returned", "Result List Clicked", "Result List Clicked", "Document Action", "Result List Clicked", "Returned Results", "Search Results Returned", "Result List Clicked", "Preview", "Open", "Search Executed", "Returned Results", "Document Action"]})
print(df)

                      event
0           Search Executed
1   Search Results Returned
2       Result List Clicked
3       Result List Clicked
4           Document Action
5       Result List Clicked
6          Returned Results
7   Search Results Returned
8       Result List Clicked
9                   Preview
10                     Open
11          Search Executed
12         Returned Results
13          Document Action

I want to check whether either of these two patterns exists in the dataframe.

Pattern 1:

event

Search Executed
Search Results Returned
Result List Clicked

Pattern 2:

event

Search Executed
Returned Results
Document Action

If either of these patterns exists, I want to extract only the matching rows. So in this case I want two outputs.

Output 1:

                      event
0           Search Executed
1   Search Results Returned
2       Result List Clicked

Output 2:

                      event
11          Search Executed
12         Returned Results
13          Document Action

Is there an elegant way to do it?

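If the events are encoded as single letters, you can convert them to their character codes using ord, then compute a diff to build a group key (consecutive letters give a diff of 1, so any other value starts a new group). Finally, use groupby to split the dataframe into sub-dataframes: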

df = pd.DataFrame({"event": ["A", "B", "C", "B", "C", "E"]})
group = df['event'].map(ord).diff().ne(1).cumsum()

dfs = [g for _, g in df.groupby(group)]

output:

[  event
 0     A
 1     B
 2     C,

   event
 3     B
 4     C,

   event
 5     E]
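
To make the grouping step easier to follow, here is a small sketch that prints each intermediate value. It assumes the input column is A, B, C, B, C, E, which is what the output above implies; the variable names codes, steps and breaks are introduced here only for illustration.

```python
import pandas as pd

# Assumed input, chosen to match the output shown above.
df = pd.DataFrame({"event": ["A", "B", "C", "B", "C", "E"]})

codes = df["event"].map(ord)   # character codes: 65, 66, 67, 66, 67, 69
steps = codes.diff()           # difference to the previous row: NaN, 1, 1, -1, 1, 2
breaks = steps.ne(1)           # True wherever the step is not exactly 1 (NaN counts as a break)
group = breaks.cumsum()        # running count of breaks: 1, 1, 1, 2, 2, 3

print(pd.DataFrame({"event": df["event"], "code": codes, "diff": steps, "group": group}))
```

Each True in breaks starts a new run, so the cumulative sum gives every run of consecutive letters its own label, and groupby can then split on that label.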

If you instead want to start a new group only when the letters go backwards (so that C->E stays in the same group), build the group with lt(0):

group = df['event'].map(ord).diff().lt(0).cumsum()

output:

[  event
 0     A
 1     B
 2     C,

   event
 3     B
 4     C
 5     E]
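
The ord approach only works for single-character events. For the full event names in the question, one possible alternative (not part of the answer above) is to compare each row with the rows below it using shift. The sketch below assumes the patterns are given as plain lists; find_pattern is a hypothetical helper written for this example, not a pandas function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"event": [
    "Search Executed", "Search Results Returned", "Result List Clicked",
    "Result List Clicked", "Document Action", "Result List Clicked",
    "Returned Results", "Search Results Returned", "Result List Clicked",
    "Preview", "Open", "Search Executed", "Returned Results", "Document Action",
]})

patterns = [
    ["Search Executed", "Search Results Returned", "Result List Clicked"],
    ["Search Executed", "Returned Results", "Document Action"],
]

def find_pattern(frame, pattern, column="event"):
    """Return one sub-DataFrame for every run of consecutive rows equal to `pattern`."""
    col = frame[column]
    mask = pd.Series(True, index=frame.index)
    for offset, value in enumerate(pattern):
        # shift(-offset) lines up the row `offset` positions below with the current row;
        # rows shifted in past the end are NaN and therefore never match.
        mask &= col.shift(-offset).eq(value)
    # Positions (not labels) of the rows where a full match starts.
    starts = np.flatnonzero(mask.to_numpy())
    return [frame.iloc[s:s + len(pattern)] for s in starts]

for pattern in patterns:
    for match in find_pattern(df, pattern):
        print(match, end="\n\n")
```

With the data from the question, this should yield rows 0 to 2 for the first pattern and rows 11 to 13 for the second. Because the helper slices by position, it keeps the original index labels in each extracted piece.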
