I have a dataframe that looks like this:
import pandas as pd
df = pd.DataFrame({"event": ["Search Executed", "Search Results Returned", "Result List Clicked", "Result List Clicked", "Document Action", "Result List Clicked", "Returned Results", "Search Results Returned", "Result List Clicked", "Preview", "Open", "Search Executed", "Returned Results", "Document Action"]})
print(df)
event
0 Search Executed
1 Search Results Returned
2 Result List Clicked
3 Result List Clicked
4 Document Action
5 Result List Clicked
6 Returned Results
7 Search Results Returned
8 Result List Clicked
9 Preview
10 Open
11 Search Executed
12 Returned Results
13 Document Action
I want to check whether either of these two patterns exists in the dataframe.
Pattern 1:
event
Search Executed
Search Results Returned
Result List Clicked
Pattern 2:
event
Search Executed
Returned Results
Document Action
If either of these patterns exists, I want to extract only that part. So in this case I want two outputs. Output 1:
event
0 Search Executed
1 Search Results Returned
2 Result List Clicked
Output 2:
event
11 Search Executed
12 Returned Results
13 Document Action
Is there an elegant way to do it?
You can convert the letters to their ASCII codes using ord, then compute a diff to build a group key (consecutive letters give a diff of 1). Finally, use groupby to split the result into separate dataframes:
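import pandas as pd
df = pd.DataFrame({"event": ["A", "B", "C", "B", "C", "E"]})
# Start a new group wherever the letter code does not increase by exactly 1
group = df['event'].map(ord).diff().ne(1).cumsum()
# One dataframe per run of consecutive letters
dfs = [g for _, g in df.groupby(group)]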
output:
[ event
0 A
1 B
2 C,
event
3 B
4 C,
event
5 E]
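To see why this grouping works, it can help to look at the intermediate values. The following is a minimal sketch, assuming the input letters are A, B, C, B, C, E (which is what the printed output above implies); the names codes, step, new_group and steps are only illustrative.

```python
import pandas as pd

# Assumed input, inferred from the printed groups above
df = pd.DataFrame({"event": ["A", "B", "C", "B", "C", "E"]})

codes = df["event"].map(ord)   # 65, 66, 67, 66, 67, 69
step = codes.diff()            # NaN, 1, 1, -1, 1, 2
new_group = step.ne(1)         # True wherever the run of consecutive letters breaks
group = new_group.cumsum()     # 1, 1, 1, 2, 2, 3

# Put everything side by side for inspection
steps = pd.DataFrame({
    "event": df["event"],
    "code": codes,
    "diff": step,
    "new_group": new_group,
    "group": group,
})
print(steps)
```

The first row always starts a group because its diff is NaN, and NaN is not equal to 1.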
If you want to start a new group only when the letters go back to an earlier one (so that C->E remains grouped), change the group to use lt(0):
group = df['event'].map(ord).diff().lt(0).cumsum()
output:
[ event
0 A
1 B
2 C,
event
3 B
4 C
5 E]
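For the event names in the question, which have no natural ordering like letters do, a plain sliding-window comparison may be easier. This is a different technique from the ord/diff approach above, shown only as a rough sketch; find_pattern is a hypothetical helper, not a pandas function, and it assumes a pattern must appear as consecutive rows in exactly the given order.

```python
import pandas as pd

def find_pattern(df, pattern, column="event"):
    """Return one sub-dataframe per place where `pattern` occurs as consecutive rows.

    Hypothetical helper for illustration; not part of pandas.
    """
    values = df[column].tolist()
    pattern = list(pattern)
    n = len(pattern)
    matches = []
    # Slide a window of len(pattern) rows over the column and compare
    for start in range(len(values) - n + 1):
        if values[start:start + n] == pattern:
            matches.append(df.iloc[start:start + n])
    return matches

df = pd.DataFrame({"event": [
    "Search Executed", "Search Results Returned", "Result List Clicked",
    "Result List Clicked", "Document Action", "Result List Clicked",
    "Returned Results", "Search Results Returned", "Result List Clicked",
    "Preview", "Open", "Search Executed", "Returned Results", "Document Action",
]})

pattern1 = ["Search Executed", "Search Results Returned", "Result List Clicked"]
pattern2 = ["Search Executed", "Returned Results", "Document Action"]

for pattern in (pattern1, pattern2):
    for match in find_pattern(df, pattern):
        print(match)
```

With the sample data, this finds pattern 1 at rows 0 to 2 and pattern 2 at rows 11 to 13, keeping the original index labels. The loop is linear in the number of rows per pattern, which should be fine for moderate sizes; a vectorized variant using shift could be considered for very large frames.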