Transposing columns into rows to create event log data set
Could you please help me transpose columns into rows to create an event log series?
I want to create an event log data set out of the following columns.
My table looks like the following:
ID1 ID2 Event1 Event1_activity Event2 Event2_activity Event3 Event3_activity
10001A 6456 05.09.2019 12:32 Event1_Description 09.09.2019 12:40 Event2_Description 10.09.2019 12:40 Event3_Description
10001A 6456 05.09.2019 12:32 Event1_Description 09.09.2019 12:40 Event2_Description 10.09.2019 12:40 Event3_Description
20001B 8793 03.09.2019 09:45 Event1_Description 10.09.2019 12:25 Event2_Description 11.09.2019 12:25 Event3_Description
20001B 9017 03.09.2019 09:49 Event1_Description 10.09.2019 12:25 Event2_Description 11.09.2019 12:25 Event3_Description
20001B 5454 04.09.2019 12:42 Event1_Description 10.09.2019 12:25 Event2_Description 11.09.2019 12:25 Event3_Description
Based on ID1 and ID2, I want to create a series of event logs from the respective event and activity columns.
Basically, my event log table should look like the following:
ID Event Activity
6456-10001A 05.09.2019 12:32 Event1_Description
6456-10001A 09.09.2019 12:40 Event2_Description
6456-10001A 10.09.2019 12:40 Event3_Description
6456-10001A 05.09.2019 12:32 Event1_Description
6456-10001A 09.09.2019 12:40 Event2_Description
6456-10001A 10.09.2019 12:40 Event3_Description
8793-20001B 03.09.2019 09:45 Event1_Description
8793-20001B 10.09.2019 12:25 Event2_Description
8793-20001B 04.09.2019 09:45 Event3_Description
9017-20001B 03.09.2019 09:49 Event1_Description
9017-20001B 10.09.2019 12:25 Event2_Description
9017-20001B 04.09.2019 09:49 Event3_Description
5454-20001B 04.09.2019 12:42 Event1_Description
5454-20001B 10.09.2019 12:25 Event2_Description
5454-20001B 05.09.2019 12:42 Event3_Description
Any suggestions would be highly appreciated!
You can create the new ID, then concatenate the dataframe subsets and sort by ID:
df['ID'] = df['ID2'].astype(str) + '-' + df['ID1']
n_events = 3
pd.concat([df[['ID', f'Event{i}', f'Event{i}_activity']]
             .rename(columns={f'Event{i}': 'Event', f'Event{i}_activity': 'Activity'})
           for i in range(1, n_events + 1)]
          ).sort_values(by='ID').reset_index(drop=True)
ID Event Activity
0 5454-20001B 04.09.2019 12:42 Event1_Description
1 5454-20001B 10.09.2019 12:25 Event2_Description
2 5454-20001B 11.09.2019 12:25 Event3_Description
3 6456-10001A 05.09.2019 12:32 Event1_Description
4 6456-10001A 05.09.2019 12:32 Event1_Description
5 6456-10001A 09.09.2019 12:40 Event2_Description
6 6456-10001A 09.09.2019 12:40 Event2_Description
7 6456-10001A 10.09.2019 12:40 Event3_Description
8 6456-10001A 10.09.2019 12:40 Event3_Description
9 8793-20001B 03.09.2019 09:45 Event1_Description
10 8793-20001B 10.09.2019 12:25 Event2_Description
11 8793-20001B 11.09.2019 12:25 Event3_Description
12 9017-20001B 03.09.2019 09:49 Event1_Description
13 9017-20001B 10.09.2019 12:25 Event2_Description
14 9017-20001B 11.09.2019 12:25 Event3_Description
If you have to retain the original order of ID, then you have to do it differently.
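One way to keep that original order (a sketch, not part of the answer above, shown with a reduced two-event sample): tag each long row with its source row's position before concatenating, then sort on that position instead of on ID.

```python
import pandas as pd

# reduced sample with the same column layout as the question
df = pd.DataFrame({
    "ID1": ["10001A", "20001B"],
    "ID2": [6456, 8793],
    "Event1": ["05.09.2019 12:32", "03.09.2019 09:45"],
    "Event1_activity": ["Event1_Description", "Event1_Description"],
    "Event2": ["09.09.2019 12:40", "10.09.2019 12:25"],
    "Event2_activity": ["Event2_Description", "Event2_Description"],
})

df["ID"] = df["ID2"].astype(str) + "-" + df["ID1"]
n_events = 2

# keep the original row position and the event number so that each source
# row's events stay together, in event order, after the concat
long = pd.concat(
    [df[["ID", f"Event{i}", f"Event{i}_activity"]]
       .rename(columns={f"Event{i}": "Event", f"Event{i}_activity": "Activity"})
       .assign(order=df.index, event_no=i)
     for i in range(1, n_events + 1)]
)
long = (long.sort_values(["order", "event_no"])
            .drop(columns=["order", "event_no"])
            .reset_index(drop=True))
print(long)
```

Sorting on the saved position first and the event number second reproduces the question's desired row order regardless of how the IDs compare as strings.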
Using melt. Dynamic: more columns (>3) will still work.
df = pd.read_csv(io.StringIO("""ID1 ID2 Event1 Event1_activity Event2 Event2_activity Event3 Event3_activity
10001A 6456 05.09.2019 12:32 Event1_Description 09.09.2019 12:40 Event2_Description 10.09.2019 12:40 Event3_Description
10001A 6456 05.09.2019 12:32 Event1_Description 09.09.2019 12:40 Event2_Description 10.09.2019 12:40 Event3_Description
20001B 8793 03.09.2019 09:45 Event1_Description 10.09.2019 12:25 Event2_Description 11.09.2019 12:25 Event3_Description
20001B 9017 03.09.2019 09:49 Event1_Description 10.09.2019 12:25 Event2_Description 11.09.2019 12:25 Event3_Description
20001B 5454 04.09.2019 12:42 Event1_Description 10.09.2019 12:25 Event2_Description 11.09.2019 12:25 Event3_Description"""
), sep=r"\s\s+", engine="python")
# prepare ID column as concatenation
df = df.assign(ID=lambda dfa: dfa["ID1"].astype(str)+"-"+dfa["ID2"].astype(str)).drop(columns=["ID1","ID2"])
# melt out both sets of columns for Event and Activity then merge
# NB reset_index() to ensure merge key works. Plus only want ID on LHS dataframe
df2 = pd.merge(
pd.melt(df, id_vars=["ID"],
value_vars=[c for c in df.columns if "Event" in c and "activity" not in c],
value_name="Event").drop(columns="variable").reset_index(),
pd.melt(df, id_vars=["ID"],
value_vars=[c for c in df.columns if "activity" in c],
value_name="Activity").drop(columns=["variable","ID"]).reset_index(),
on="index"
).drop(columns="index").sort_values(["ID","Event"])
ID Event Activity
10001A-6456 05.09.2019 12:32 Event1_Description
10001A-6456 05.09.2019 12:32 Event1_Description
10001A-6456 09.09.2019 12:40 Event2_Description
10001A-6456 09.09.2019 12:40 Event2_Description
10001A-6456 10.09.2019 12:40 Event3_Description
10001A-6456 10.09.2019 12:40 Event3_Description
20001B-5454 04.09.2019 12:42 Event1_Description
20001B-5454 10.09.2019 12:25 Event2_Description
20001B-5454 11.09.2019 12:25 Event3_Description
20001B-8793 03.09.2019 09:45 Event1_Description
20001B-8793 10.09.2019 12:25 Event2_Description
20001B-8793 11.09.2019 12:25 Event3_Description
20001B-9017 03.09.2019 09:49 Event1_Description
20001B-9017 10.09.2019 12:25 Event2_Description
20001B-9017 11.09.2019 12:25 Event3_Description
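Note that the final `sort_values(["ID","Event"])` sorts Event as a string. With day.month.year dates that is only correct by coincidence when all dates fall in the same month; a safer sketch (format string assumed from the sample data) parses the column with pd.to_datetime first:

```python
import pandas as pd

# day.month.year strings: lexicographic sort puts "01.10" before "03.09"
events = pd.Series(["10.09.2019 12:25", "03.09.2019 09:45", "01.10.2019 08:00"])

as_strings = events.sort_values().tolist()
print(as_strings)  # the October date sorts first even though it is latest

# parsing fixes the ordering; the format is an assumption from the sample
parsed = pd.to_datetime(events, format="%d.%m.%Y %H:%M")
print(parsed.sort_values().tolist())
```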
Use wide_to_long after creating the ID column and swapping column names like Event1_activity to activity_Event1:
df['ID'] = df.pop("ID1").astype(str) + "-" + df.pop("ID2").astype(str)
df.columns = [f'{x[1]}_{x[0]}' if len(x) == 2 else f'{"".join(x)}'
for x in df.columns.str.split('_')]
df = (pd.wide_to_long(df.reset_index(),
stubnames=['Event','activity_Event'],
i=['index','ID'],
j='tmp')
.reset_index(level=1).reset_index(drop=True))
print(df)
ID Event activity_Event
0 10001A-6456 05.09.2019 12:32 Event1_Description
1 10001A-6456 09.09.2019 12:40 Event2_Description
2 10001A-6456 10.09.2019 12:40 Event3_Description
3 10001A-6456 05.09.2019 12:32 Event1_Description
4 10001A-6456 09.09.2019 12:40 Event2_Description
5 10001A-6456 10.09.2019 12:40 Event3_Description
6 20001B-8793 03.09.2019 09:45 Event1_Description
7 20001B-8793 10.09.2019 12:25 Event2_Description
8 20001B-8793 11.09.2019 12:25 Event3_Description
9 20001B-9017 03.09.2019 09:49 Event1_Description
10 20001B-9017 10.09.2019 12:25 Event2_Description
11 20001B-9017 11.09.2019 12:25 Event3_Description
12 20001B-5454 04.09.2019 12:42 Event1_Description
13 20001B-5454 10.09.2019 12:25 Event2_Description
14 20001B-5454 11.09.2019 12:25 Event3_Description
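To match the column headers requested in the question, the leftover activity_Event label can be renamed afterwards; a minimal sketch with a one-row stand-in for the wide_to_long result:

```python
import pandas as pd

# stand-in for the wide_to_long output above
df = pd.DataFrame({
    "ID": ["10001A-6456"],
    "Event": ["05.09.2019 12:32"],
    "activity_Event": ["Event1_Description"],
})

# restore the requested header
df = df.rename(columns={"activity_Event": "Activity"})
print(df.columns.tolist())  # ['ID', 'Event', 'Activity']
```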
The reshaping process can be abstracted with the pivot_longer function from pyjanitor; at the moment you have to install the latest development version from GitHub:
Your columns have a pattern: some end with numbers, while the rest end with activity. We can use a regular expression inside the pivot_longer function to get your results:
# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor
( # combine `ID1` and `ID2` into a single column
df.assign(ID=df.ID2.astype(str).str.cat(df.ID1, sep="-"))
.drop(columns=["ID1", "ID2"])
.pivot_longer(
index="ID",
names_to=("Event", "Activity"),
names_pattern=(r"\d$", "activity$"),
sort_by_appearance=True,
)
)
ID Event Activity
0 6456-10001A 05.09.2019 12:32 Event1_Description
1 6456-10001A 09.09.2019 12:40 Event2_Description
2 6456-10001A 10.09.2019 12:40 Event3_Description
3 6456-10001A 05.09.2019 12:32 Event1_Description
4 6456-10001A 09.09.2019 12:40 Event2_Description
5 6456-10001A 10.09.2019 12:40 Event3_Description
6 8793-20001B 03.09.2019 09:45 Event1_Description
7 8793-20001B 10.09.2019 12:25 Event2_Description
8 8793-20001B 11.09.2019 12:25 Event3_Description
9 9017-20001B 03.09.2019 09:49 Event1_Description
10 9017-20001B 10.09.2019 12:25 Event2_Description
11 9017-20001B 11.09.2019 12:25 Event3_Description
12 5454-20001B 04.09.2019 12:42 Event1_Description
13 5454-20001B 10.09.2019 12:25 Event2_Description
14 5454-20001B 11.09.2019 12:25 Event3_Description
The names_pattern ("\d$", "activity$") looks for the columns that end with a number and with activity, and assigns them to the respective column names in names_to ("Event", "Activity").
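If installing a development version is not an option, plain pandas offers pd.lreshape, which folds named groups of wide columns into long ones (a sketch with the same column layout, reduced to two events; note it does not emit a within-row event-order key):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["6456-10001A", "8793-20001B"],
    "Event1": ["05.09.2019 12:32", "03.09.2019 09:45"],
    "Event1_activity": ["Event1_Description"] * 2,
    "Event2": ["09.09.2019 12:40", "10.09.2019 12:25"],
    "Event2_activity": ["Event2_Description"] * 2,
})

# collect the two groups of wide columns by their naming pattern
event_cols = sorted(c for c in df.columns
                    if c.startswith("Event") and not c.endswith("activity"))
activity_cols = sorted(c for c in df.columns if c.endswith("activity"))

# each group's columns are stacked into one long column; ID is carried along
long = pd.lreshape(df, {"Event": event_cols, "Activity": activity_cols})
print(long)
```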