How to apply an ascending sort by datetime within every group in pandas
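I have a dataset that I need to group by 'user_id' and 'contest_id'. Within each contest, I have to sort every user who entered that contest by date and time, in ascending order.
I first tried grouping the data by contest_id and user handle. Then I converted the datetime columns with to_datetime and tried to sort the dates in ascending order with sort_values.
When I try to save the result to Excel, I get this error:
'''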
Excel doesn't support timezones in datetimes. Set the tzinfo in the
datetime/time object to None or use the 'remove_timezone' Workbook()
option
'''
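Before fixing the write, it helps to see where the timezone comes from. The sketch below is minimal and assumes two things: the datetime strings carry an offset such as `+00:00` (as in the data below), and they are parsed with `utc=True`. The helper `tz_aware_columns` is hypothetical, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({
    "contest_id": [1234, 4489],
    "registration": ["2012-01-09 21:51:00+00:00", "2013-01-09 21:51:00+00:00"],
})
# Parsing offset-carrying strings (here forced to UTC) yields a timezone-aware column.
df["registration"] = pd.to_datetime(df["registration"], utc=True)

def tz_aware_columns(frame):
    """Hypothetical helper: names of columns whose dtype carries a timezone."""
    return [c for c in frame.columns if isinstance(frame[c].dtype, pd.DatetimeTZDtype)]

print(tz_aware_columns(df))                           # ['registration']
print(df["registration"].iloc[0].tzinfo is not None)  # True: each value has a tzinfo set
```

Every value in such a column has a non-None `tzinfo`, and that is exactly what the error message asks to be set to None.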
# Note: the result of groupby is not assigned, so this line changes nothing.
dftotal.groupby(["contestID", "userHandle"])
dftotal["registerDateTime"]=pd.to_datetime(dftotal.registerDateTime)
dftotal["RegistrationDateTime"] = dftotal["registerDateTime"]
dftotal["submitDateTime"] = pd.to_datetime(dftotal.submitDateTime)
dftotal["SubmissionDateTime"] = dftotal["submitDateTime"]
# Sorts the whole frame by registration time, without regard to contests.
dftotal = dftotal.sort_values(by=['RegistrationDateTime'])
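Calling `groupby` on its own does not reorder anything. It only builds a lazy `DataFrameGroupBy` object. The sketch below uses made-up rows, with a hypothetical column `t` standing in for a registration time. It shows that the frame is untouched after `groupby`, and what "sort each group" looks like when done explicitly, one group at a time.

```python
import pandas as pd

# Made-up rows: two contests, registrations in arbitrary order.
df = pd.DataFrame({"contest_id": [2, 1, 2, 1], "t": [4, 3, 1, 2]})
before = df.copy()

grouped = df.groupby("contest_id")
print(type(grouped).__name__)   # DataFrameGroupBy
print(df.equals(before))        # True: calling groupby did not change df

# Sort each group separately, then stitch the groups back together
# (groups are iterated in ascending key order by default).
per_group = pd.concat([g.sort_values("t") for _, g in grouped])
print(per_group["t"].tolist())  # [2, 3, 1, 4]
```

The result is the same as a single `sort_values(["contest_id", "t"])`, which is usually the simpler way to write it.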
The data is:
contest_id  user_id  registration               submission                 score
1234        abc      2012-01-09 21:51:00+00:00  2012-01-09 22:51:00+00:00  90
4489        pabc     2013-01-09 21:51:00+00:00  2013-01-09 22:55:00+00:00  39
1234        tiop     2012-01-09 23:51:00+00:00  2012-01-09 23:55:00+00:00  100
4489        pabceu   2013-01-09 23:20:00+00:00  2013-01-09 23:55:00+00:00  39
The expected result is:
contest_id  user_id  registration               submission                 score
1234        abc      2012-01-09 21:51:00+00:00  2012-01-09 22:51:00+00:00  90
1234        tiop     2012-01-09 23:51:00+00:00  2012-01-09 23:55:00+00:00  100
4489        pabc     2013-01-09 21:51:00+00:00  2013-01-09 22:55:00+00:00  39
4489        pabceu   2013-01-09 23:20:00+00:00  2013-01-09 23:55:00+00:00  39
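Sorting by `registration` alone gives this order only because the two contests fall in different years. The sketch below sorts on both keys instead. That way rows are grouped by `contest_id` and ordered by registration time inside each contest, whatever the dates. It builds the question's rows inline (without the `submission` column) instead of reading a file. The `entry_order` column is a hypothetical extra showing each user's 1-based position within their contest.

```python
import pandas as pd

# The question's sample rows, typed in directly (submission column omitted).
dftotal = pd.DataFrame({
    "contest_id": [1234, 4489, 1234, 4489],
    "user_id": ["abc", "pabc", "tiop", "pabceu"],
    "registration": pd.to_datetime([
        "2012-01-09 21:51:00+00:00", "2013-01-09 21:51:00+00:00",
        "2012-01-09 23:51:00+00:00", "2013-01-09 23:20:00+00:00",
    ], utc=True),
    "score": [90, 39, 100, 39],
})

# Contest first, then registration time inside each contest.
result = dftotal.sort_values(["contest_id", "registration"])
# Optional: cumcount numbers rows 0, 1, ... within each contest in the sorted order.
result["entry_order"] = result.groupby("contest_id").cumcount() + 1
print(result[["contest_id", "user_id", "entry_order"]].to_string(index=False))
```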
I was finally able to reproduce the problem and fix it.
import io
import pandas as pd

# Columns are separated by two or more spaces, so the single space
# inside each datetime stays part of the value.
t = '''contest_id  user_id  registration  submission  score
1234  abc  2012-01-09 21:51:00+00:00  2012-01-09 22:51:00+00:00  90
4489  pabc  2013-01-09 21:51:00+00:00  2013-01-09 22:55:00+00:00  39
1234  tiop  2012-01-09 23:51:00+00:00  2012-01-09 23:55:00+00:00  100
4489  pabceu  2013-01-09 23:20:00+00:00  2013-01-09 23:55:00+00:00  39'''
dftotal = pd.read_csv(io.StringIO(t), sep=r'\s\s+', engine='python')
print(dftotal.to_string())
dftotal['registration'] = pd.to_datetime(dftotal.registration, utc=True)
dftotal['submission'] = pd.to_datetime(dftotal.submission, utc=True)
dftotal = dftotal.sort_values(by=['registration'])
print(dftotal.to_string())
dftotal.to_excel('contest_new.xlsx')  # raises the timezone error with xlsxwriter
This prints:
contest_id user_id registration submission score
0 1234 abc 2012-01-09 21:51:00+00:00 2012-01-09 22:51:00+00:00 90
1 4489 pabc 2013-01-09 21:51:00+00:00 2013-01-09 22:55:00+00:00 39
2 1234 tiop 2012-01-09 23:51:00+00:00 2012-01-09 23:55:00+00:00 100
3 4489 pabceu 2013-01-09 23:20:00+00:00 2013-01-09 23:55:00+00:00 39
contest_id user_id registration submission score
0 1234 abc 2012-01-09 21:51:00+00:00 2012-01-09 22:51:00+00:00 90
2 1234 tiop 2012-01-09 23:51:00+00:00 2012-01-09 23:55:00+00:00 100
1 4489 pabc 2013-01-09 21:51:00+00:00 2013-01-09 22:55:00+00:00 39
3 4489 pabceu 2013-01-09 23:20:00+00:00 2013-01-09 23:55:00+00:00 39
and raises:
TypeError: Excel doesn't support timezones in datetimes. Set the tzinfo in the datetime/time object to None or use the 'remove_timezone' Workbook() option
Using openpyxl:
This error is raised by the xlsxwriter backend. If openpyxl is installed, requesting that engine is enough:
dftotal.to_excel('contest_new.xlsx', engine='openpyxl')
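It removes the timezone information automatically and writes the Excel file correctly (at least with the pandas and openpyxl versions used here).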
Removing the tz information explicitly:
You can also strip the timezone information yourself with tz_localize(None):
# tz_localize(None) drops the timezone and keeps the wall-clock time
dftotal['registration'] = pd.to_datetime(dftotal.registration).dt.tz_localize(None)
dftotal['submission'] = pd.to_datetime(dftotal.submission).dt.tz_localize(None)
dftotal = dftotal.sort_values(by=['registration'])
print(dftotal.to_string())
dftotal.to_excel('contest_new.xlsx')
The dataframe is now displayed as:
contest_id user_id registration submission score
0 1234 abc 2012-01-09 21:51:00 2012-01-09 22:51:00 90
2 1234 tiop 2012-01-09 23:51:00 2012-01-09 23:55:00 100
1 4489 pabc 2013-01-09 21:51:00 2013-01-09 22:55:00 39
3 4489 pabceu 2013-01-09 23:20:00 2013-01-09 23:55:00 39
and it is written correctly by the default xlsxwriter engine.
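Writer behaviour around timezones varies between pandas versions, and newer pandas releases may reject timezone-aware values with a ValueError whatever the engine. Stripping the timezone before writing avoids depending on that. The sketch below uses a hypothetical `strip_timezones` helper that applies the same fix to every timezone-aware column. Its `to_utc` switch, also an assumption of this sketch, picks between two behaviours:

- keep each column's local wall-clock time (plain `tz_localize(None)`);
- convert to UTC first, so that values from different zones stay comparable when sorted.

```python
import pandas as pd

def strip_timezones(frame, to_utc=True):
    """Hypothetical helper: return a copy where every tz-aware column is tz-naive.

    to_utc=True converts to UTC before dropping the zone (one common clock);
    to_utc=False keeps each column's own local wall-clock time.
    """
    out = frame.copy()
    for col in out.select_dtypes(include=["datetimetz"]).columns:
        values = out[col].dt.tz_convert("UTC") if to_utc else out[col]
        out[col] = values.dt.tz_localize(None)
    return out

# Hypothetical frame: one column in UTC, one with a +02:00 offset.
df = pd.DataFrame({
    "registration": pd.to_datetime(["2012-01-09 21:51:00+00:00"], utc=True),
    "submission": pd.to_datetime(["2012-01-09 22:51:00+02:00"]),
})
print(strip_timezones(df)["submission"].iloc[0])                # 2012-01-09 20:51:00
print(strip_timezones(df, to_utc=False)["submission"].iloc[0])  # 2012-01-09 22:51:00
# strip_timezones(df).to_excel('contest_new.xlsx') should then no longer hit the tz error.
```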