How to insert rows in a dataframe based on a specific condition?
I have the following dataframe:
Index | Time | User | Description
---|---|---|---
1 | 27.10.2021 15:58:00 | UserA@gmail.com | Tab Alpha of type PARTSTUDIO opened by User A
2 | 27.10.2021 15:59:00 | UserA@gmail.com | Start edit of part studio feature
3 | 27.10.2021 15:59:00 | UserA@gmail.com | Cancel Operation
4 | 27.10.2021 15:59:00 | UserB@gmail.com | Tab Alpha of type PARTSTUDIO opened by User B
5 | 27.10.2021 15:59:00 | UserB@gmail.com | Start edit of part studio feature
6 | 27.10.2021 16:03:00 | UserB@gmail.com | Cancel Operation
7 | 27.10.2021 16:03:00 | UserA@gmail.com | Add assembly feature
9 | 27.10.2021 16:03:00 | UserA@gmail.com | Tab Beta of type PARTSTUDIO opened by User A
10 | 27.10.2021 16:15:00 | UserA@gmail.com | Start edit of part studio feature
11 | 27.10.2021 16:15:00 | UserB@gmail.com | Start edit of part studio feature
12 | 27.10.2021 16:15:00 | UserB@gmail.com | Tab Alpha of type PARTSTUDIO closed by User B
14 | 27.10.2021 16:54:00 | UserB@gmail.com | Add assembly feature
15 | 27.10.2021 16:55:00 | UserA@gmail.com | Tab Beta of type PARTSTUDIO closed by User A
16 | 27.10.2021 16:55:00 | UserB@gmail.com | Start edit of part studio feature
17 | 27.10.2021 16:55:00 | UserB@gmail.com | Tab Delta of type PARTSTUDIO closed by User B
Expected output:
Index | Time | User | Description
---|---|---|---
1 | 27.10.2021 15:58:00 | UserA@gmail.com | Tab Alpha of type PARTSTUDIO opened by User A
2 | 27.10.2021 15:59:00 | UserA@gmail.com | Start edit of part studio feature
3 | 27.10.2021 15:59:00 | UserA@gmail.com | Cancel Operation
4 | 27.10.2021 15:59:00 | UserB@gmail.com | Tab Alpha of type PARTSTUDIO opened by User B
5 | 27.10.2021 15:59:00 | UserB@gmail.com | Start edit of part studio feature
6 | 27.10.2021 16:03:00 | UserB@gmail.com | Cancel Operation
7 | 27.10.2021 16:03:00 | UserA@gmail.com | Add assembly feature
8 | 27.10.2021 16:03:00 | UserA@gmail.com | Tab Alpha of type PARTSTUDIO closed by User A
9 | 27.10.2021 16:03:00 | UserA@gmail.com | Tab Beta of type PARTSTUDIO opened by User A
10 | 27.10.2021 16:15:00 | UserA@gmail.com | Start edit of part studio feature
11 | 27.10.2021 16:15:00 | UserB@gmail.com | Start edit of part studio feature
12 | 27.10.2021 16:15:00 | UserB@gmail.com | Tab Alpha of type PARTSTUDIO closed by User B
13 | 27.10.2021 16:15:00 | UserB@gmail.com | Tab Delta of type PARTSTUDIO opened by User B
14 | 27.10.2021 16:54:00 | UserB@gmail.com | Add assembly feature
15 | 27.10.2021 16:55:00 | UserA@gmail.com | Tab Beta of type PARTSTUDIO closed by User A
16 | 27.10.2021 16:55:00 | UserB@gmail.com | Start edit of part studio feature
17 | 27.10.2021 16:55:00 | UserB@gmail.com | Tab Delta of type PARTSTUDIO closed by User B
How can I go through the dataframe and check whether each "Tab x opened by User y" value in the Description column is followed, somewhere later in the dataframe, by a matching "Tab x closed by User y"? If it is, everything is OK. If it is not, and the same user opens another tab instead ("Tab zz opened by User A"), then "Tab x closed by User y" is missing and should be inserted as a row just before the "Tab zz opened by User A" row (example: index 8). The same applies vice versa for a missing "opened" row (example: index 13). Is there a way to do this without df.iterrows? Thanks in advance.
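The condition described above can be checked without iterrows by comparing each tab event with the same user's previous tab event. The following is only a minimal sketch on a hypothetical five-row log (the data, the simplified regular expression, and the column name `missing_before` are assumptions made for illustration): a repeated "opened" or "closed" for one user signals that the opposite event is missing in between.

```python
import pandas as pd

# Hypothetical miniature log; only the User and Description columns matter here.
df = pd.DataFrame({
    "User": ["UserA@gmail.com"] * 3 + ["UserB@gmail.com"] * 2,
    "Description": [
        "Tab Alpha of type PARTSTUDIO opened by User A",
        "Add assembly feature",
        "Tab Beta of type PARTSTUDIO opened by User A",
        "Tab Alpha of type PARTSTUDIO opened by User B",
        "Tab Alpha of type PARTSTUDIO closed by User B",
    ],
})

# Pull the action ("opened"/"closed") out of tab events; other rows become NaN.
action = df["Description"].str.extract(r"^Tab .+ (opened|closed) by .+$")[0]

# Keep only tab events, then look up the same user's previous tab event.
events = df.assign(action=action).dropna(subset=["action"])
prev_action = events.groupby("User")["action"].shift(1)

# A repeated action means the opposite event is missing between the two rows.
events = events.assign(missing_before=events["action"].eq(prev_action))
print(events[["User", "action", "missing_before"]])
```

Here only the "Tab Beta ... opened by User A" row is flagged, because User A opened a second tab without closing the first one. This only detects the gaps; it does not insert the missing rows.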
Sorry, I forgot to answer this.
Here is one solution. It is not particularly concise or elegant, but it should be faster than using iterrows to both modify rows and look ahead at future rows.
Time User Description
0 27.10.2021 15:58:00 UserA@gmail.com Tab Alpha of type PARTSTUDIO opened by User A
1 27.10.2021 15:59:00 UserA@gmail.com Start edit of part studio feature
2 27.10.2021 15:59:00 UserA@gmail.com Cancel Operation
3 27.10.2021 15:59:00 UserB@gmail.com Tab Alpha of type PARTSTUDIO opened by User B
4 27.10.2021 15:59:00 UserB@gmail.com Start edit of part studio feature
5 27.10.2021 16:03:00 UserB@gmail.com Cancel Operation
6 27.10.2021 16:03:00 UserA@gmail.com Add assembly feature
7 27.10.2021 16:03:00 UserA@gmail.com Tab Beta of type PARTSTUDIO opened by User A
8 27.10.2021 16:03:00 UserA@gmail.com Tab Gamma of type PARTSTUDIO opened by User A
9 27.10.2021 16:14:00 UserA@gmail.com Tab Beta of type PARTSTUDIO opened by User A
10 27.10.2021 16:15:00 UserA@gmail.com Start edit of part studio feature
11 27.10.2021 16:15:00 UserB@gmail.com Start edit of part studio feature
12 27.10.2021 16:15:00 UserB@gmail.com Tab Alpha of type PARTSTUDIO closed by User B
13 27.10.2021 16:54:00 UserB@gmail.com Add assembly feature
14 27.10.2021 16:55:00 UserA@gmail.com Tab Beta of type PARTSTUDIO closed by User A
15 27.10.2021 16:55:00 UserB@gmail.com Start edit of part studio feature
16 27.10.2021 16:55:00 UserB@gmail.com Tab Delta of type PARTSTUDIO closed by User B
17 27.10.2021 16:56:00 UserB@gmail.com Tab Alpha of type PARTSTUDIO closed by User B
18 27.10.2021 16:57:00 UserB@gmail.com Tab Beta of type PARTSTUDIO closed by User B
I added a few more consecutive open/close events for additional testing.
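To make the code that follows reproducible, the test table above can be rebuilt as a DataFrame named `df`. This is a sketch that assumes the Time column is kept as plain strings; the helper names `A`, `B` and `DAY` are only shorthand introduced here.

```python
import pandas as pd

# Shorthand for the repeated values (names introduced only for this sketch).
A, B = "UserA@gmail.com", "UserB@gmail.com"
DAY = "27.10.2021 "

# Rows transcribed from the test table above: (time of day, user, description).
rows = [
    ("15:58:00", A, "Tab Alpha of type PARTSTUDIO opened by User A"),
    ("15:59:00", A, "Start edit of part studio feature"),
    ("15:59:00", A, "Cancel Operation"),
    ("15:59:00", B, "Tab Alpha of type PARTSTUDIO opened by User B"),
    ("15:59:00", B, "Start edit of part studio feature"),
    ("16:03:00", B, "Cancel Operation"),
    ("16:03:00", A, "Add assembly feature"),
    ("16:03:00", A, "Tab Beta of type PARTSTUDIO opened by User A"),
    ("16:03:00", A, "Tab Gamma of type PARTSTUDIO opened by User A"),
    ("16:14:00", A, "Tab Beta of type PARTSTUDIO opened by User A"),
    ("16:15:00", A, "Start edit of part studio feature"),
    ("16:15:00", B, "Start edit of part studio feature"),
    ("16:15:00", B, "Tab Alpha of type PARTSTUDIO closed by User B"),
    ("16:54:00", B, "Add assembly feature"),
    ("16:55:00", A, "Tab Beta of type PARTSTUDIO closed by User A"),
    ("16:55:00", B, "Start edit of part studio feature"),
    ("16:55:00", B, "Tab Delta of type PARTSTUDIO closed by User B"),
    ("16:56:00", B, "Tab Alpha of type PARTSTUDIO closed by User B"),
    ("16:57:00", B, "Tab Beta of type PARTSTUDIO closed by User B"),
]

df = pd.DataFrame(
    [(DAY + time, user, description) for time, user, description in rows],
    columns=["Time", "User", "Description"],
)
```

If real timestamps are needed, `pd.to_datetime(df["Time"], format="%d.%m.%Y %H:%M:%S")` should parse this layout.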
import pandas as pd

# Pattern to extract action info from descriptions such as
# "Tab Alpha of type PARTSTUDIO opened by User A".
pattern = r'^Tab (?P<tab_name>.+) of type (?P<tab_type>.+) (?P<tab_action>\bclosed\b|\bopened\b) by (?P<user_id>.+)$'

# Add utility columns (NaN for rows that are not tab open/close events).
df = pd.concat([df, df['Description'].str.extract(pattern)], axis=1)

# Get the missing rows for one user, with a tweaked (fractional) index
# so that they sort into the correct position later.
def get_new_rows(df):
    all_values = []
    for action in ['opened', 'closed']:
        action_mask = df['tab_action'].eq(action)
        # Events whose action is repeated by the next / by the previous tab event.
        first_tabs = df[df['tab_action'].eq(df['tab_action'].shift(-1)) & action_mask]
        second_tabs = df[df['tab_action'].eq(df['tab_action'].shift(1)) & action_mask]
        if len(first_tabs) == 0:
            continue
        if action == 'opened':
            # Missing "closed": copy the first tab and place it just before the second.
            values_tab, index_tab, offset, new_action = first_tabs, second_tabs, -0.5, 'closed'
        elif action == 'closed':
            # Missing "opened": copy the second tab and place it just after the first.
            values_tab, index_tab, offset, new_action = second_tabs, first_tabs, 0.5, 'opened'
        values_tab = values_tab.copy()  # work on a copy, not on a slice of df
        values_tab.index = index_tab.index + offset
        values_tab['Time'] = index_tab['Time'].to_numpy()
        values_tab['tab_action'] = new_action
        all_values.append(values_tab)
    # If the user's last tab event is "opened", close that tab right after it.
    last_action = df.tail(1).copy()
    if last_action['tab_action'].iat[0] == 'opened':
        last_action.index += 0.5
        last_action['tab_action'] = 'closed'
        all_values.append(last_action)
    return pd.concat(all_values)

# Add new rows at the correct positions.
new_rows = (
    df.dropna(subset=['tab_action'])
      .groupby(['user_id'], as_index=False)
      .apply(get_new_rows)
      .droplevel(0)
)
complete_df = pd.concat([df, new_rows]).sort_index().reset_index(drop=True)

# Fix the description
fix_m = complete_df['tab_name'].notna()
complete_df.loc[fix_m, 'Description'] = ('Tab ' + complete_df.loc[fix_m, 'tab_name'] +
                                         ' of type ' + complete_df.loc[fix_m, 'tab_type'] +
                                         ' ' + complete_df.loc[fix_m, 'tab_action'] + ' by ' +
                                         complete_df.loc[fix_m, 'user_id'])

# Drop utility columns.
complete_df = complete_df.drop(columns=['tab_name', 'tab_type', 'tab_action', 'user_id'])
print(complete_df)
Time User Description
0 27.10.2021 15:58:00 UserA@gmail.com Tab Alpha of type PARTSTUDIO opened by User A
1 27.10.2021 15:59:00 UserA@gmail.com Start edit of part studio feature
2 27.10.2021 15:59:00 UserA@gmail.com Cancel Operation
3 27.10.2021 15:59:00 UserB@gmail.com Tab Alpha of type PARTSTUDIO opened by User B
4 27.10.2021 15:59:00 UserB@gmail.com Start edit of part studio feature
5 27.10.2021 16:03:00 UserB@gmail.com Cancel Operation
6 27.10.2021 16:03:00 UserA@gmail.com Add assembly feature
7 27.10.2021 16:03:00 UserA@gmail.com Tab Alpha of type PARTSTUDIO closed by User A
8 27.10.2021 16:03:00 UserA@gmail.com Tab Beta of type PARTSTUDIO opened by User A
9 27.10.2021 16:03:00 UserA@gmail.com Tab Beta of type PARTSTUDIO closed by User A
10 27.10.2021 16:03:00 UserA@gmail.com Tab Gamma of type PARTSTUDIO opened by User A
11 27.10.2021 16:14:00 UserA@gmail.com Tab Gamma of type PARTSTUDIO closed by User A
12 27.10.2021 16:14:00 UserA@gmail.com Tab Beta of type PARTSTUDIO opened by User A
13 27.10.2021 16:15:00 UserA@gmail.com Start edit of part studio feature
14 27.10.2021 16:15:00 UserB@gmail.com Start edit of part studio feature
15 27.10.2021 16:15:00 UserB@gmail.com Tab Alpha of type PARTSTUDIO closed by User B
16 27.10.2021 16:15:00 UserB@gmail.com Tab Delta of type PARTSTUDIO opened by User B
17 27.10.2021 16:54:00 UserB@gmail.com Add assembly feature
18 27.10.2021 16:55:00 UserA@gmail.com Tab Beta of type PARTSTUDIO closed by User A
19 27.10.2021 16:55:00 UserB@gmail.com Start edit of part studio feature
20 27.10.2021 16:55:00 UserB@gmail.com Tab Delta of type PARTSTUDIO closed by User B
21 27.10.2021 16:55:00 UserB@gmail.com Tab Alpha of type PARTSTUDIO opened by User B
22 27.10.2021 16:56:00 UserB@gmail.com Tab Alpha of type PARTSTUDIO closed by User B
23 27.10.2021 16:56:00 UserB@gmail.com Tab Beta of type PARTSTUDIO opened by User B
24 27.10.2021 16:57:00 UserB@gmail.com Tab Beta of type PARTSTUDIO closed by User B
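The insertion step in the solution above relies on giving each new row a fractional index label (for example 7.5) so that sort_index places it between its neighbours. Isolated on a hypothetical three-row frame, the idea looks roughly like this:

```python
import pandas as pd

# Hypothetical three-row frame with the default integer index 0, 1, 2.
df = pd.DataFrame({"Description": ["first", "second", "third"]})

# Label the new row 0.5 so that it sorts between index 0 and index 1.
new_row = pd.DataFrame({"Description": ["inserted"]}, index=[0.5])

# Concatenate, sort by label, then renumber to a clean 0..n-1 index.
result = pd.concat([df, new_row]).sort_index().reset_index(drop=True)
print(result["Description"].tolist())
# ['first', 'inserted', 'second', 'third']
```

Building all new rows first and sorting once avoids inserting rows one at a time, which is presumably why this tends to be faster than a row-by-row loop.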