
How to insert rows in dataframe based on specific condition?

I have the following dataframe:
Index  Time                 User             Description
1      27.10.2021 15:58:00  UserA@gmail.com  Tab Alpha of type PARTSTUDIO opened by User A
2      27.10.2021 15:59:00  UserA@gmail.com  Start edit of part studio feature
3      27.10.2021 15:59:00  UserA@gmail.com  Cancel Operation
4      27.10.2021 15:59:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO opened by User B
5      27.10.2021 15:59:00  UserB@gmail.com  Start edit of part studio feature
6      27.10.2021 16:03:00  UserB@gmail.com  Cancel Operation
7      27.10.2021 16:03:00  UserA@gmail.com  Add assembly feature
9      27.10.2021 16:03:00  UserA@gmail.com  Tab Beta of type PARTSTUDIO opened by User A
10     27.10.2021 16:15:00  UserA@gmail.com  Start edit of part studio feature
11     27.10.2021 16:15:00  UserB@gmail.com  Start edit of part studio feature
12     27.10.2021 16:15:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO closed by User B
14     27.10.2021 16:54:00  UserB@gmail.com  Add assembly feature
15     27.10.2021 16:55:00  UserA@gmail.com  Tab Beta of type PARTSTUDIO closed by User A
16     27.10.2021 16:55:00  UserB@gmail.com  Start edit of part studio feature
17     27.10.2021 16:55:00  UserB@gmail.com  Tab Delta of type PARTSTUDIO closed by User B

Expected output:

Index  Time                 User             Description
1      27.10.2021 15:58:00  UserA@gmail.com  Tab Alpha of type PARTSTUDIO opened by User A
2      27.10.2021 15:59:00  UserA@gmail.com  Start edit of part studio feature
3      27.10.2021 15:59:00  UserA@gmail.com  Cancel Operation
4      27.10.2021 15:59:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO opened by User B
5      27.10.2021 15:59:00  UserB@gmail.com  Start edit of part studio feature
6      27.10.2021 16:03:00  UserB@gmail.com  Cancel Operation
7      27.10.2021 16:03:00  UserA@gmail.com  Add assembly feature
8      27.10.2021 16:03:00  UserA@gmail.com  Tab Alpha of type PARTSTUDIO closed by User A
9      27.10.2021 16:03:00  UserA@gmail.com  Tab Beta of type PARTSTUDIO opened by User A
10     27.10.2021 16:15:00  UserA@gmail.com  Start edit of part studio feature
11     27.10.2021 16:15:00  UserB@gmail.com  Start edit of part studio feature
12     27.10.2021 16:15:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO closed by User B
13     27.10.2021 16:15:00  UserB@gmail.com  Tab Delta of type PARTSTUDIO opened by User B
14     27.10.2021 16:54:00  UserB@gmail.com  Add assembly feature
15     27.10.2021 16:55:00  UserA@gmail.com  Tab Beta of type PARTSTUDIO closed by User A
16     27.10.2021 16:55:00  UserB@gmail.com  Start edit of part studio feature
17     27.10.2021 16:55:00  UserB@gmail.com  Tab Delta of type PARTSTUDIO closed by User B

How can I go through the dataframe and check whether each "Tab x opened by User y" value in the Description column is followed, somewhere later in the dataframe, by "Tab x closed by User y"? If yes, fine. If not, and a "Tab zz opened by User A" follows instead, then "Tab x closed by User y" is missing and a row should be inserted just before the "Tab zz opened by User A" row (for example, index 8). The same applies in reverse (index 13). Is there a way to do this without df.iterrows? Thanks in advance.
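The check described above can be sketched without iterrows. This is only a minimal illustration, assuming each user has at most one tab open at a time, so that an "opened" event followed directly by another "opened" event from the same user signals a missing "closed" row. The frame `log` and the names `parsed`, `tabs`, `next_action` and `missing_close` are hypothetical and not part of the original data.

```python
import pandas as pd

# Hypothetical miniature of the log; only the Description column matters here.
log = pd.DataFrame({
    "Description": [
        "Tab Alpha of type PARTSTUDIO opened by User A",
        "Start edit of part studio feature",
        "Tab Beta of type PARTSTUDIO opened by User A",
        "Tab Beta of type PARTSTUDIO closed by User A",
    ]
})

# Named groups become columns; rows that are not tab events become NaN.
pattern = (r"^Tab (?P<tab_name>.+) of type (?P<tab_type>.+) "
           r"(?P<tab_action>opened|closed) by (?P<user_id>.+)$")
parsed = log["Description"].str.extract(pattern)

# Keep only tab events, then look at the same user's next tab event.
tabs = parsed.dropna(subset=["tab_action"])
next_action = tabs.groupby("user_id")["tab_action"].shift(-1)

# "opened" directly followed by "opened" means a "closed" row is missing.
missing_close = tabs[(tabs["tab_action"] == "opened") & (next_action == "opened")]
print(missing_close[["tab_name", "user_id"]])
```

The mirrored comparison (`"closed"` followed by `"closed"`) would flag missing "opened" rows in the same way.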

Sorry, I forgot to answer this.

Here is one solution. It is neither concise nor particularly elegant, but it should be faster than using iterrows for both checking later rows and modifying the dataframe.
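The solution places new rows by giving them fractional index labels (an offset of 0.5 from a neighbouring row), then sorting by index and renumbering. A minimal sketch of that idea on its own, using a hypothetical three-row frame with shortened descriptions:

```python
import pandas as pd

# Hypothetical three-row frame standing in for the real log.
df = pd.DataFrame({"Description": ["opened Alpha", "opened Beta", "closed Beta"]})

# The row to insert gets a label halfway between its neighbours (0 and 1).
new_row = pd.DataFrame({"Description": ["closed Alpha"]}, index=[0.5])

# Concatenate, sort by label so 0.5 lands between 0 and 1, then renumber.
result = pd.concat([df, new_row]).sort_index().reset_index(drop=True)
print(result)
```

Because all new rows are collected first and concatenated once, the original frame is never modified row by row.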

Data:

                   Time             User                                    Description
0   27.10.2021 15:58:00  UserA@gmail.com  Tab Alpha of type PARTSTUDIO opened by User A
1   27.10.2021 15:59:00  UserA@gmail.com              Start edit of part studio feature
2   27.10.2021 15:59:00  UserA@gmail.com                               Cancel Operation
3   27.10.2021 15:59:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO opened by User B
4   27.10.2021 15:59:00  UserB@gmail.com              Start edit of part studio feature
5   27.10.2021 16:03:00  UserB@gmail.com                               Cancel Operation
6   27.10.2021 16:03:00  UserA@gmail.com                           Add assembly feature
7   27.10.2021 16:03:00  UserA@gmail.com   Tab Beta of type PARTSTUDIO opened by User A
8   27.10.2021 16:03:00  UserA@gmail.com  Tab Gamma of type PARTSTUDIO opened by User A
9   27.10.2021 16:14:00  UserA@gmail.com   Tab Beta of type PARTSTUDIO opened by User A
10  27.10.2021 16:15:00  UserA@gmail.com              Start edit of part studio feature
11  27.10.2021 16:15:00  UserB@gmail.com              Start edit of part studio feature
12  27.10.2021 16:15:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO closed by User B
13  27.10.2021 16:54:00  UserB@gmail.com                           Add assembly feature
14  27.10.2021 16:55:00  UserA@gmail.com   Tab Beta of type PARTSTUDIO closed by User A
15  27.10.2021 16:55:00  UserB@gmail.com              Start edit of part studio feature
16  27.10.2021 16:55:00  UserB@gmail.com  Tab Delta of type PARTSTUDIO closed by User B
17  27.10.2021 16:56:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO closed by User B
18  27.10.2021 16:57:00  UserB@gmail.com   Tab Beta of type PARTSTUDIO closed by User B

I added a few more consecutive open/close events for additional testing.

Code:

import pandas as pd

# Pattern to extract the tab name, tab type, action and user from a tab event.
pattern = r'^Tab (?P<tab_name>.+) of type (?P<tab_type>.+) (?P<tab_action>\bclosed\b|\bopened\b) by (?P<user_id>.+)$'

# Add utility columns (NaN for rows that are not tab events).
df = pd.concat([df, df['Description'].str.extract(pattern)], axis=1)

# Build the missing rows for one user, with fractional index labels that position them.
# Note: pd.concat raises on an empty list, so this assumes every user has at least one missing row.
def get_new_rows(df):
    all_values = []
    for action in ['opened', 'closed']:
        action_mask = df['tab_action'].eq(action)
        # Pairs of consecutive identical actions: first_tabs is the earlier row, second_tabs the later one.
        # Copies avoid modifying views of the group (SettingWithCopyWarning).
        first_tabs = df[df['tab_action'].eq(df['tab_action'].shift(-1)) & action_mask].copy()
        second_tabs = df[df['tab_action'].eq(df['tab_action'].shift(1)) & action_mask].copy()

        if len(first_tabs) == 0:
            continue

        if action == 'opened':
            # Missing "closed": copy the earlier tab, place it just before the later "opened".
            values_tab, index_tab, offset, new_action = first_tabs, second_tabs, -0.5, 'closed'
        elif action == 'closed':
            # Missing "opened": copy the later tab, place it just after the earlier "closed".
            values_tab, index_tab, offset, new_action = second_tabs, first_tabs, 0.5, 'opened'

        values_tab.index = index_tab.index + offset
        values_tab['Time'] = index_tab['Time'].to_numpy()
        values_tab['tab_action'] = new_action
        all_values.append(values_tab)

    # A trailing "opened" with no later event gets a "closed" row right after it.
    last_action = df.tail(1).copy()
    if last_action['tab_action'].iat[0] == 'opened':
        last_action.index += 0.5
        last_action['tab_action'] = 'closed'
        all_values.append(last_action)

    return pd.concat(all_values)


# Add new rows at the correct positions.
new_rows = df.dropna(subset=['tab_action']).groupby(['user_id'], as_index=False).apply(get_new_rows).droplevel(0)
complete_df = pd.concat([df, new_rows]).sort_index().reset_index(drop=True)

# Fix the description of the tab rows (the inserted rows still carry the copied text).
fix_m = complete_df['tab_name'].notna()
complete_df.loc[fix_m, 'Description'] = ('Tab ' + complete_df.loc[fix_m, 'tab_name'] +
                                        ' of type ' + complete_df.loc[fix_m, 'tab_type'] +
                                        ' ' + complete_df.loc[fix_m, 'tab_action'] + ' by ' +
                                        complete_df.loc[fix_m, 'user_id'])
# Drop utility columns.
complete_df = complete_df.drop(columns=['tab_name', 'tab_type', 'tab_action', 'user_id'])

Result:

                   Time             User                                    Description
0   27.10.2021 15:58:00  UserA@gmail.com  Tab Alpha of type PARTSTUDIO opened by User A
1   27.10.2021 15:59:00  UserA@gmail.com              Start edit of part studio feature
2   27.10.2021 15:59:00  UserA@gmail.com                               Cancel Operation
3   27.10.2021 15:59:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO opened by User B
4   27.10.2021 15:59:00  UserB@gmail.com              Start edit of part studio feature
5   27.10.2021 16:03:00  UserB@gmail.com                               Cancel Operation
6   27.10.2021 16:03:00  UserA@gmail.com                           Add assembly feature
7   27.10.2021 16:03:00  UserA@gmail.com  Tab Alpha of type PARTSTUDIO closed by User A
8   27.10.2021 16:03:00  UserA@gmail.com   Tab Beta of type PARTSTUDIO opened by User A
9   27.10.2021 16:03:00  UserA@gmail.com   Tab Beta of type PARTSTUDIO closed by User A
10  27.10.2021 16:03:00  UserA@gmail.com  Tab Gamma of type PARTSTUDIO opened by User A
11  27.10.2021 16:14:00  UserA@gmail.com  Tab Gamma of type PARTSTUDIO closed by User A
12  27.10.2021 16:14:00  UserA@gmail.com   Tab Beta of type PARTSTUDIO opened by User A
13  27.10.2021 16:15:00  UserA@gmail.com              Start edit of part studio feature
14  27.10.2021 16:15:00  UserB@gmail.com              Start edit of part studio feature
15  27.10.2021 16:15:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO closed by User B
16  27.10.2021 16:15:00  UserB@gmail.com  Tab Delta of type PARTSTUDIO opened by User B
17  27.10.2021 16:54:00  UserB@gmail.com                           Add assembly feature
18  27.10.2021 16:55:00  UserA@gmail.com   Tab Beta of type PARTSTUDIO closed by User A
19  27.10.2021 16:55:00  UserB@gmail.com              Start edit of part studio feature
20  27.10.2021 16:55:00  UserB@gmail.com  Tab Delta of type PARTSTUDIO closed by User B
21  27.10.2021 16:55:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO opened by User B
22  27.10.2021 16:56:00  UserB@gmail.com  Tab Alpha of type PARTSTUDIO closed by User B
23  27.10.2021 16:56:00  UserB@gmail.com   Tab Beta of type PARTSTUDIO opened by User B
24  27.10.2021 16:57:00  UserB@gmail.com   Tab Beta of type PARTSTUDIO closed by User B
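
A result like the one above can be sanity-checked by confirming that, for each user, tab actions alternate between "opened" and "closed". The helper below is a hypothetical sketch (the name `actions_alternate` and the sample series are illustrative only); it relies on the same description format as the pattern used in the code.

```python
import pandas as pd

# Only the action and the user are needed for this check.
PATTERN = r"^Tab .+ of type .+ (?P<tab_action>opened|closed) by (?P<user_id>.+)$"

def actions_alternate(descriptions: pd.Series) -> bool:
    """Return True if no user has two identical tab actions in a row."""
    # Rows that are not tab events yield NaN and are dropped.
    events = descriptions.str.extract(PATTERN).dropna()
    # Previous tab action of the same user (NaN for a user's first event).
    previous = events.groupby("user_id")["tab_action"].shift(1)
    return not (events["tab_action"] == previous).any()

fixed = pd.Series([
    "Tab Alpha of type PARTSTUDIO opened by User A",
    "Tab Alpha of type PARTSTUDIO opened by User B",
    "Tab Alpha of type PARTSTUDIO closed by User A",
    "Cancel Operation",
    "Tab Beta of type PARTSTUDIO opened by User A",
])
broken = pd.Series([
    "Tab Alpha of type PARTSTUDIO opened by User A",
    "Tab Beta of type PARTSTUDIO opened by User A",
])

print(actions_alternate(fixed), actions_alternate(broken))
```

Calling `actions_alternate(complete_df['Description'])` on the output of the code above would apply the same check to the full result.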
