I am trying to convert rows with two different values into columns, with the dates captured under each column.
My DataFrame looks like this. EventType 1.0 is the start date for a particular NetworkNode, and the next EventType 5.0 for the same node is its end date. I would like to convert the EventType values into columns so I can see the start date and end date side by side.
EventID NetworkNode EventTime EventType
1140085 606.0 2018-09-12 14:11:00 1.0
1140416 606.0 2018-09-12 16:39:00 5.0
1141105 606.0 2018-09-12 22:16:00 1.0
1141109 606.0 2018-09-12 22:19:00 5.0
1141288 421.0 2018-09-12 23:21:00 5.0
1141295 508.0 2018-09-12 23:23:00 5.0
1141568 647.0 2018-10-12 01:09:00 1.0
1141578 647.0 2018-10-12 01:12:00 5.0
1142463 461.0 2018-10-12 05:52:00 1.0
1142467 460.0 2018-10-12 05:53:00 1.0
1142468 502.0 2018-10-12 05:54:00 1.0
1142476 502.0 2018-10-12 05:57:00 5.0
1142493 461.0 2018-10-12 06:00:00 5.0
1142516 460.0 2018-10-12 06:01:00 5.0
1145299 629.0 2018-10-12 21:13:00 1.0
1145411 629.0 2018-10-12 22:16:00 5.0
1145414 629.0 2018-10-12 22:23:00 1.0
1145437 629.0 2018-10-12 22:26:00 5.0
1145437 421.0 2018-10-12 22:26:00 5.0
df = df[['EventID','NetworkNode', 'EventTime', 'EventType']].sort_values(by=['EventID'])
df = df.set_index(['NetworkNode','EventType'])['EventTime'].unstack()
I tried this code, but it raises the error
"ValueError: Index contains duplicate entries, cannot reshape", because the same NetworkNode and EventType combination appears more than once.
My desired Output should be something like this.
The value "1.0" in the EventType column represents the start date and time of an event for that NetworkNode, and the next "5.0" for the same NetworkNode is the end time. Therefore I would like to combine these two rows into a single row with a start time and an end time.
NetworkNode 1.0 5.0
606.0 2018-09-12 14:11:00 2018-09-12 16:39:00
606.0 2018-09-12 22:16:00 2018-09-12 22:19:00
421.0 2018-09-12 23:21:00 2018-10-12 23:26:00
508.0 2018-09-12 23:23:00
647.0 2018-10-12 01:09:00 2018-10-12 01:12:00
461.0 2018-10-12 05:52:00 2018-10-12 06:00:00
460.0 2018-10-12 05:53:00 2018-10-12 06:01:00
502.0 2018-10-12 05:54:00 2018-10-12 05:57:00
629.0 2018-10-12 21:13:00 2018-10-12 22:16:00
629.0 2018-10-12 22:23:00 2018-10-12 22:26:00
Please advise....
Here is what I can answer so far.
The main problem is that a pivot like this requires a unique index: the combination of index and column values cannot repeat. There are two options I can share with you.
1) Concatenate the EventID and NetworkNode together to make it a unique Index, and form a pivot table
import pandas as pd

data = pd.read_csv(path, encoding="ISO-8859-1")  # path: location of your CSV file
data_cp = data.copy()
# Build a unique key per row by joining NetworkNode and EventID
data["Node_ID"] = data["NetworkNode"].astype(str) + "_" + data["EventID"].astype(str)
data.pivot(index='Node_ID', columns='EventType', values='EventTime')
2) Simply group by the two keys, NetworkNode and EventType (there is no need to make them an index)
data_cp = data.copy()
data_cp.drop(columns=["EventID"], inplace=True)
view = data_cp.groupby(by=['NetworkNode','EventType'])["EventTime"]
view.first()
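Note that `first()` in option 2 keeps only the earliest time per NetworkNode and EventType pair, so a node with several start/end cycles collapses to one row. A possible alternative, sketched here and not part of the original answer, is to number each repeated start and end within a node using `groupby(...).cumcount()` and include that counter in the index, which makes it unique. The sample rows are a subset of the question's data, and the `occurrence`, `Start` and `End` names are made up for illustration. It assumes the n-th start and the n-th end of a node belong together, which will not hold if a start or end is missing in the middle of the sequence.

```python
import pandas as pd

# A few rows copied from the question, for illustration only.
df = pd.DataFrame({
    "EventID": [1140085, 1140416, 1141105, 1141109, 1141295, 1141568, 1141578],
    "NetworkNode": [606.0, 606.0, 606.0, 606.0, 508.0, 647.0, 647.0],
    "EventTime": pd.to_datetime([
        "2018-09-12 14:11:00", "2018-09-12 16:39:00",
        "2018-09-12 22:16:00", "2018-09-12 22:19:00",
        "2018-09-12 23:23:00",
        "2018-10-12 01:09:00", "2018-10-12 01:12:00",
    ]),
    "EventType": [1.0, 5.0, 1.0, 5.0, 5.0, 1.0, 5.0],
})

df = df.sort_values("EventID")

# Number repeated (NetworkNode, EventType) pairs: the first start of a node
# gets 0, its second start gets 1, and likewise for the ends.
# "occurrence" is a hypothetical helper column.
df["occurrence"] = df.groupby(["NetworkNode", "EventType"]).cumcount()

# (NetworkNode, occurrence, EventType) is now unique, so unstack can reshape.
result = (
    df.set_index(["NetworkNode", "occurrence", "EventType"])["EventTime"]
      .unstack("EventType")
      .rename(columns={1.0: "Start", 5.0: "End"})  # illustrative column names
      .reset_index()
)
print(result)
```

Nodes that only have an end event, such as 508.0 here, get a missing value (NaT) in the `Start` column.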