Convert row values to columns to assign date field to each new column using python pandas

I am trying to convert rows that carry two different EventType values into columns, with the event dates captured under each column.

My DataFrame looks like this. EventType 1.0 marks the start date for a particular NetworkNode, and the next EventType 5.0 for that node marks its end date. I would therefore like to turn the EventType values into columns so I can read off the start date and end date.

EventID NetworkNode EventTime   EventType
1140085 606.0   2018-09-12 14:11:00 1.0
1140416 606.0   2018-09-12 16:39:00 5.0
1141105 606.0   2018-09-12 22:16:00 1.0
1141109 606.0   2018-09-12 22:19:00 5.0
1141288 421.0   2018-09-12 23:21:00 5.0
1141295 508.0   2018-09-12 23:23:00 5.0
1141568 647.0   2018-10-12 01:09:00 1.0
1141578 647.0   2018-10-12 01:12:00 5.0
1142463 461.0   2018-10-12 05:52:00 1.0
1142467 460.0   2018-10-12 05:53:00 1.0
1142468 502.0   2018-10-12 05:54:00 1.0
1142476 502.0   2018-10-12 05:57:00 5.0
1142493 461.0   2018-10-12 06:00:00 5.0
1142516 460.0   2018-10-12 06:01:00 5.0
1145299 629.0   2018-10-12 21:13:00 1.0
1145411 629.0   2018-10-12 22:16:00 5.0
1145414 629.0   2018-10-12 22:23:00 1.0
1145437 629.0   2018-10-12 22:26:00 5.0
1145437 421.0   2018-10-12 22:26:00 5.0


df = df[['EventID','NetworkNode', 'EventTime', 'EventType']].sort_values(by=['EventID'])

df = df.set_index(['NetworkNode','EventType'])['EventTime'].unstack()

I tried this code, but it raises an error:

"ValueError: Index contains duplicate entries, cannot reshape", because NetworkNode contains duplicates.
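To see why unstack complains, it can help to list the rows that share the same (NetworkNode, EventType) pair. The snippet below is a minimal sketch on a small sample copied from the data above; the names dup_mask and duplicates are only illustrative.

```python
import pandas as pd

# Small sample taken from the data in the question
df = pd.DataFrame({
    "EventID": [1140085, 1140416, 1141105, 1141109, 1141295],
    "NetworkNode": [606.0, 606.0, 606.0, 606.0, 508.0],
    "EventTime": pd.to_datetime([
        "2018-09-12 14:11:00", "2018-09-12 16:39:00",
        "2018-09-12 22:16:00", "2018-09-12 22:19:00",
        "2018-09-12 23:23:00",
    ]),
    "EventType": [1.0, 5.0, 1.0, 5.0, 5.0],
})

# Mark every row whose (NetworkNode, EventType) pair occurs more than once;
# these are the index entries that unstack cannot place in a single cell.
dup_mask = df.duplicated(subset=["NetworkNode", "EventType"], keep=False)
duplicates = df[dup_mask]
print(duplicates)
```

Node 606.0 has two start events and two end events, so the pair (606.0, 1.0) is not unique and the reshape fails.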

My desired Output should be something like this.

The value "1.0" in the EventType column represents the start date and time of an event for that NetworkNode, and the next "5.0" for the same NetworkNode is its end time. Therefore I would like to merge these two rows into a single row holding the start and end time.

NetworkNode  1.0                  5.0
606.0        2018-09-12 14:11:00  2018-09-12 16:39:00
606.0        2018-09-12 22:16:00  2018-09-12 22:19:00
421.0        2018-09-12 23:21:00  2018-10-12 23:26:00
508.0                             2018-09-12 23:23:00
647.0        2018-10-12 01:09:00  2018-10-12 01:12:00
461.0        2018-10-12 05:52:00  2018-10-12 06:00:00
460.0        2018-10-12 05:53:00  2018-10-12 06:01:00
502.0        2018-10-12 05:54:00  2018-10-12 05:57:00
629.0        2018-10-12 21:13:00  2018-10-12 22:16:00
629.0        2018-10-12 22:23:00  2018-10-12 22:26:00

Please advise....

Here is what I can offer so far.

The main problem is that a pivot table like this requires a unique index: the index/column combination cannot contain duplicates. There are two options I can share with you.

1) Concatenate NetworkNode and EventID into a single unique key, and build the pivot table on that key

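import pandas as pd

data = pd.read_csv(path, encoding="ISO-8859-1")  # path: location of your CSV file
# Build a unique key from NetworkNode and EventID (vectorized, avoids chained assignment in a loop)
data["Node_ID"] = data["NetworkNode"].astype(str) + "_" + data["EventID"].astype(str)
# One row per Node_ID, one column per EventType, EventTime as the cell value
data.pivot(index='Node_ID', columns='EventType', values='EventTime')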

When run, the result is indexed by Node_ID.

2) Simply group by the two keys, NetworkNode and EventType (there is no need to make them an index)

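data_cp = data.copy()
# EventID is not needed once the rows are grouped
data_cp.drop(columns=["EventID"], inplace=True)
view = data_cp.groupby(by=['NetworkNode','EventType'])["EventTime"]
# first() keeps only the first EventTime in each (NetworkNode, EventType) group
view.first()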

The result is grouped by the two columns.
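Neither option puts a start and its matching end on the same row when a node has several events: option 1 gives every event its own row, and first() in option 2 keeps only one time per group. One possible way to get closer to the desired output is to number the repeated events per node with cumcount and include that counter in the index before unstacking. This is a minimal sketch, assuming the rows are ordered by EventID and that the k-th start of a node belongs with its k-th end; the Occurrence, Start and End names are made up for illustration.

```python
import pandas as pd

# Small sample taken from the data in the question
df = pd.DataFrame({
    "EventID": [1140085, 1140416, 1141105, 1141109, 1141295, 1141568, 1141578],
    "NetworkNode": [606.0, 606.0, 606.0, 606.0, 508.0, 647.0, 647.0],
    "EventTime": pd.to_datetime([
        "2018-09-12 14:11:00", "2018-09-12 16:39:00",
        "2018-09-12 22:16:00", "2018-09-12 22:19:00",
        "2018-09-12 23:23:00",
        "2018-10-12 01:09:00", "2018-10-12 01:12:00",
    ]),
    "EventType": [1.0, 5.0, 1.0, 5.0, 5.0, 1.0, 5.0],
})

df = df.sort_values("EventID")

# Hypothetical helper column: 0 for the first start (or end) of a node,
# 1 for the second, and so on. This makes the index unique.
df["Occurrence"] = df.groupby(["NetworkNode", "EventType"]).cumcount()

paired = (
    df.set_index(["NetworkNode", "Occurrence", "EventType"])["EventTime"]
      .unstack("EventType")                       # EventType values become columns
      .rename(columns={1.0: "Start", 5.0: "End"})
      .reset_index()
)
print(paired)
```

Nodes that only have an end event, such as 508.0, get NaT in the Start column. Because the pairing is done purely by count, a node with a missing start or end in the middle of its history would shift the later pairs, so the result is worth checking against the raw data.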
