
Transposing columns into rows to create event log data set

Could you please help me to transpose columns into rows to create an event log series?

I want to create an event log data set out of the following columns.

My table looks like the following:

ID1     ID2      Event1             Event1_activity     Event2              Event2_activity     Event3              Event3_activity
10001A  6456    05.09.2019 12:32    Event1_Description  09.09.2019 12:40    Event2_Description  10.09.2019 12:40    Event3_Description
10001A  6456    05.09.2019 12:32    Event1_Description  09.09.2019 12:40    Event2_Description  10.09.2019 12:40    Event3_Description
20001B  8793    03.09.2019 09:45    Event1_Description  10.09.2019 12:25    Event2_Description  11.09.2019 12:25    Event3_Description
20001B  9017    03.09.2019 09:49    Event1_Description  10.09.2019 12:25    Event2_Description  11.09.2019 12:25    Event3_Description
20001B  5454    04.09.2019 12:42    Event1_Description  10.09.2019 12:25    Event2_Description  11.09.2019 12:25    Event3_Description

Based on ID1 and ID2, I want to create a series of event logs from the columns with their respective events and activities.

Basically, my event log table should look like the following:

ID          Event               Activity
6456-10001A 05.09.2019 12:32    Event1_Description
6456-10001A 09.09.2019 12:40    Event2_Description
6456-10001A 10.09.2019 12:40    Event3_Description
6456-10001A 05.09.2019 12:32    Event1_Description
6456-10001A 09.09.2019 12:40    Event2_Description
6456-10001A 10.09.2019 12:40    Event3_Description
8793-20001B 03.09.2019 09:45    Event1_Description
8793-20001B 10.09.2019 12:25    Event2_Description
8793-20001B 04.09.2019 09:45    Event3_Description
9017-20001B 03.09.2019 09:49    Event1_Description
9017-20001B 10.09.2019 12:25    Event2_Description
9017-20001B 04.09.2019 09:49    Event3_Description
5454-20001B 04.09.2019 12:42    Event1_Description
5454-20001B 10.09.2019 12:25    Event2_Description
5454-20001B 05.09.2019 12:42    Event3_Description

Any suggestions would be highly appreciated!


You can create the new ID, then concatenate the dataframe subsets and sort by ID:

df['ID'] = df['ID2'].astype(str) + '-' + df['ID1']
n_events = 3
pd.concat([df[['ID', f'Event{i}', f'Event{i}_activity']].rename(columns={f'Event{i}': 'Event', f'Event{i}_activity': 'Activity'}) 
           for i in range(1, n_events+1)]
         ).sort_values(by='ID').reset_index(drop=True)

        ID           Event             Activity
0   5454-20001B 04.09.2019 12:42    Event1_Description
1   5454-20001B 10.09.2019 12:25    Event2_Description
2   5454-20001B 11.09.2019 12:25    Event3_Description
3   6456-10001A 05.09.2019 12:32    Event1_Description
4   6456-10001A 05.09.2019 12:32    Event1_Description
5   6456-10001A 09.09.2019 12:40    Event2_Description
6   6456-10001A 09.09.2019 12:40    Event2_Description
7   6456-10001A 10.09.2019 12:40    Event3_Description
8   6456-10001A 10.09.2019 12:40    Event3_Description
9   8793-20001B 03.09.2019 09:45    Event1_Description
10  8793-20001B 10.09.2019 12:25    Event2_Description
11  8793-20001B 11.09.2019 12:25    Event3_Description
12  9017-20001B 03.09.2019 09:49    Event1_Description
13  9017-20001B 10.09.2019 12:25    Event2_Description
14  9017-20001B 11.09.2019 12:25    Event3_Description

If you have to retain the original order of ID, then you have to do it differently.
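One way to preserve the input order (a sketch, not part of the original answer, using a hypothetical two-event reconstruction of the sample data) is to tag each concatenated block with its event number via `keys=` and sort on the original row index first:

```python
import pandas as pd

# A small reconstruction of the question's table (two events for brevity)
df = pd.DataFrame({
    "ID1": ["10001A", "20001B"],
    "ID2": [6456, 8793],
    "Event1": ["05.09.2019 12:32", "03.09.2019 09:45"],
    "Event1_activity": ["Event1_Description", "Event1_Description"],
    "Event2": ["09.09.2019 12:40", "10.09.2019 12:25"],
    "Event2_activity": ["Event2_Description", "Event2_Description"],
})

df["ID"] = df["ID2"].astype(str) + "-" + df["ID1"]
n_events = 2
out = (
    pd.concat(
        [df[["ID", f"Event{i}", f"Event{i}_activity"]]
           .rename(columns={f"Event{i}": "Event", f"Event{i}_activity": "Activity"})
         for i in range(1, n_events + 1)],
        keys=range(1, n_events + 1),   # tag each block with its event number
        names=["event_no", "row"],     # level names for the resulting MultiIndex
    )
    .reset_index()
    # original row first, event number second -> input order is preserved
    .sort_values(["row", "event_no"])
    .reset_index(drop=True)[["ID", "Event", "Activity"]]
)
print(out)
```

Sorting by the original row index before the event number keeps the rows grouped in their original input order rather than alphabetically by ID.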

Using melt. This is dynamic: more columns (>3) will still work.

import io
import pandas as pd

df = pd.read_csv(io.StringIO("""ID1     ID2      Event1             Event1_activity     Event2              Event2_activity     Event3              Event3_activity
10001A  6456    05.09.2019 12:32    Event1_Description  09.09.2019 12:40    Event2_Description  10.09.2019 12:40    Event3_Description
10001A  6456    05.09.2019 12:32    Event1_Description  09.09.2019 12:40    Event2_Description  10.09.2019 12:40    Event3_Description
20001B  8793    03.09.2019 09:45    Event1_Description  10.09.2019 12:25    Event2_Description  11.09.2019 12:25    Event3_Description
20001B  9017    03.09.2019 09:49    Event1_Description  10.09.2019 12:25    Event2_Description  11.09.2019 12:25    Event3_Description
20001B  5454    04.09.2019 12:42    Event1_Description  10.09.2019 12:25    Event2_Description  11.09.2019 12:25    Event3_Description"""
                            ), sep=r"\s\s+", engine="python")

# prepare the ID column as a concatenation of ID1 and ID2
df = df.assign(ID=lambda dfa: dfa["ID1"].astype(str)+"-"+dfa["ID2"].astype(str)).drop(columns=["ID1","ID2"])
# melt out both sets of columns for Event and Activity then merge
# NB reset_index() to ensure merge key works.  Plus only want ID on LHS dataframe
df2 = pd.merge(
    pd.melt(df, id_vars=["ID"], 
            value_vars=[c for c in df.columns if "Event" in c and "activity" not in c], 
            value_name="Event").drop(columns="variable").reset_index(),
    pd.melt(df, id_vars=["ID"], 
            value_vars=[c for c in df.columns if "activity" in c], 
            value_name="Activity").drop(columns=["variable","ID"]).reset_index(),

    on="index"
).drop(columns="index").sort_values(["ID","Event"])

Output:

         ID            Event           Activity
10001A-6456 05.09.2019 12:32 Event1_Description
10001A-6456 05.09.2019 12:32 Event1_Description
10001A-6456 09.09.2019 12:40 Event2_Description
10001A-6456 09.09.2019 12:40 Event2_Description
10001A-6456 10.09.2019 12:40 Event3_Description
10001A-6456 10.09.2019 12:40 Event3_Description
20001B-5454 04.09.2019 12:42 Event1_Description
20001B-5454 10.09.2019 12:25 Event2_Description
20001B-5454 11.09.2019 12:25 Event3_Description
20001B-8793 03.09.2019 09:45 Event1_Description
20001B-8793 10.09.2019 12:25 Event2_Description
20001B-8793 11.09.2019 12:25 Event3_Description
20001B-9017 03.09.2019 09:49 Event1_Description
20001B-9017 10.09.2019 12:25 Event2_Description
20001B-9017 11.09.2019 12:25 Event3_Description
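As an aside, the same wide-to-long reshape can be written more compactly with pandas' lesser-known `pd.lreshape`, which maps lists of source columns onto single target columns. This is a sketch, not part of the original answer, using a hypothetical two-event reconstruction of the sample data:

```python
import pandas as pd

# Minimal reconstruction of the question's table (two events for brevity)
df = pd.DataFrame({
    "ID1": ["10001A", "20001B"],
    "ID2": [6456, 8793],
    "Event1": ["05.09.2019 12:32", "03.09.2019 09:45"],
    "Event1_activity": ["Event1_Description", "Event1_Description"],
    "Event2": ["09.09.2019 12:40", "10.09.2019 12:25"],
    "Event2_activity": ["Event2_Description", "Event2_Description"],
})
df["ID"] = df["ID1"] + "-" + df["ID2"].astype(str)

# each dict key becomes one long-format column, filled from the listed columns
long = pd.lreshape(df, {
    "Event":    ["Event1", "Event2"],
    "Activity": ["Event1_activity", "Event2_activity"],
})[["ID", "Event", "Activity"]]
print(long)
```

`pd.lreshape` avoids the two-melt-plus-merge dance, at the cost of listing the column groups explicitly (they could also be built with a comprehension, as in the melt answer).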

Use wide_to_long after creating the ID column and swapping column names like Event1_activity to activity_Event1:

df['ID'] = df.pop("ID1").astype(str) + "-" + df.pop("ID2").astype(str)

df.columns = [f'{x[1]}_{x[0]}' if len(x) == 2 else f'{"".join(x)}' 
                for x in df.columns.str.split('_')]

df = (pd.wide_to_long(df.reset_index(),
                      stubnames=['Event','activity_Event'],
                      i=['index','ID'],
                      j='tmp')
        .reset_index(level=1).reset_index(drop=True))
print(df)
            ID             Event      activity_Event
0   10001A-6456  05.09.2019 12:32  Event1_Description
1   10001A-6456  09.09.2019 12:40  Event2_Description
2   10001A-6456  10.09.2019 12:40  Event3_Description
3   10001A-6456  05.09.2019 12:32  Event1_Description
4   10001A-6456  09.09.2019 12:40  Event2_Description
5   10001A-6456  10.09.2019 12:40  Event3_Description
6   20001B-8793  03.09.2019 09:45  Event1_Description
7   20001B-8793  10.09.2019 12:25  Event2_Description
8   20001B-8793  11.09.2019 12:25  Event3_Description
9   20001B-9017  03.09.2019 09:49  Event1_Description
10  20001B-9017  10.09.2019 12:25  Event2_Description
11  20001B-9017  11.09.2019 12:25  Event3_Description
12  20001B-5454  04.09.2019 12:42  Event1_Description
13  20001B-5454  10.09.2019 12:25  Event2_Description
14  20001B-5454  11.09.2019 12:25  Event3_Description

The reshaping process can be abstracted with the pivot_longer function from pyjanitor; at the moment you have to install the latest development version from GitHub:

Your columns follow a pattern: some end with numbers, while the rest end with activity. We can use regular expressions inside the pivot_longer function to get your results:

# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor

(   # combine `ID1` and `ID2` into a single column
    df.assign(ID=df.ID2.astype(str).str.cat(df.ID1, sep="-"))
    .drop(columns=["ID1", "ID2"])
    .pivot_longer(
        index="ID",
        names_to=("Event", "Activity"),
        names_pattern=(r"\d$", "activity$"),
        sort_by_appearance=True,
    )
)

         ID               Event            Activity
0   6456-10001A     05.09.2019 12:32    Event1_Description
1   6456-10001A     09.09.2019 12:40    Event2_Description
2   6456-10001A     10.09.2019 12:40    Event3_Description
3   6456-10001A     05.09.2019 12:32    Event1_Description
4   6456-10001A     09.09.2019 12:40    Event2_Description
5   6456-10001A     10.09.2019 12:40    Event3_Description
6   8793-20001B     03.09.2019 09:45    Event1_Description
7   8793-20001B     10.09.2019 12:25    Event2_Description
8   8793-20001B     11.09.2019 12:25    Event3_Description
9   9017-20001B     03.09.2019 09:49    Event1_Description
10  9017-20001B     10.09.2019 12:25    Event2_Description
11  9017-20001B     11.09.2019 12:25    Event3_Description
12  5454-20001B     04.09.2019 12:42    Event1_Description
13  5454-20001B     10.09.2019 12:25    Event2_Description
14  5454-20001B     11.09.2019 12:25    Event3_Description

The names_pattern ("\d$", "activity$") looks for columns that end with a number or with activity, and assigns them to the respective column names in names_to ("Event", "Activity").
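To see which columns each of those regexes would capture, you can test them directly against the column names with the standard re module (a quick sketch using the sample column names from the question):

```python
import re

cols = ["ID", "Event1", "Event1_activity", "Event2", "Event2_activity",
        "Event3", "Event3_activity"]

event_cols = [c for c in cols if re.search(r"\d$", c)]           # names ending in a digit
activity_cols = [c for c in cols if re.search(r"activity$", c)]  # names ending in "activity"

print(event_cols)     # ['Event1', 'Event2', 'Event3']
print(activity_cols)  # ['Event1_activity', 'Event2_activity', 'Event3_activity']
```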
