How to sort by datetime in ascending order within each group in pandas

Question

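I have a dataset that I need to group by 'user_id' and 'contest_id'. Within each contest, I have to sort the users who entered it by date and time, in ascending order.

I tried grouping the data by contest_id and user handle first. Then, after converting the datetime columns with to_datetime, I tried to sort the dates in ascending order with sort_values.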

When I try to save the result, I get this error:

       Excel doesn't support timezones in datetimes. Set the tzinfo in the
       datetime/time object to None or use the 'remove_timezone' Workbook()
       option


       dftotal.groupby(["contestID", "userHandle"])
       dftotal["registerDateTime"] = pd.to_datetime(dftotal.registerDateTime)
       dftotal["RegistrationDateTime"] = dftotal["registerDateTime"]
       dftotal["submitDateTime"] = pd.to_datetime(dftotal.submitDateTime)
       dftotal["SubmissionDateTime"] = dftotal["submitDateTime"]
       dftotal = dftotal.sort_values(by=['RegistrationDateTime'])

The data is:

        contest_id  user_id  registration               submission                 score
        1234        abc      2012-01-09 21:51:00+00:00  2012-01-09 22:51:00+00:00  90
        4489        pabc     2013-01-09 21:51:00+00:00  2013-01-09 22:55:00+00:00  39
        1234        tiop     2012-01-09 23:51:00+00:00  2012-01-09 23:55:00+00:00  100
        4489        pabceu   2013-01-09 23:20:00+00:00  2013-01-09 23:55:00+00:00  39

The expected result is:

        contest_id  user_id  registration               submission                 score
        1234        abc      2012-01-09 21:51:00+00:00  2012-01-09 22:51:00+00:00  90
        1234        tiop     2012-01-09 23:51:00+00:00  2012-01-09 23:55:00+00:00  100
        4489        pabc     2013-01-09 21:51:00+00:00  2013-01-09 22:55:00+00:00  39
        4489        pabceu   2013-01-09 23:20:00+00:00  2013-01-09 23:55:00+00:00  39

Answer 1

I was finally able to reproduce the problem and fix it.

Steps to reproduce

import pandas as pd
import io

t = '''contest_id  user_id  registration    submission          score 
        1234       abc     2012-01-09 21:51:00+00:00       2012-01-09 22:51:00+00:00           90
        4489      pabc     2013-01-09 21:51:00+00:00     2013-01-09 22:55:00+00:00             39
        1234     tiop      2012-01-09 23:51:00+00:00      2012-01-09 23:55:00+00:00            100
        4489    pabceu    2013-01-09 23:20:00+00:00      2013-01-09 23:55:00+00:00              39'''

dftotal=pd.read_csv(io.StringIO(t), sep=r'\s\s+', engine='python')

print(dftotal.to_string())

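dftotal['registration'] = pd.to_datetime(dftotal.registration, utc=True)
dftotal['submission'] = pd.to_datetime(dftotal.submission, utc=True)
# sort so that the second printout matches the order shown below
dftotal = dftotal.sort_values(by=['registration'])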

print(dftotal.to_string())
dftotal.to_excel('contest_new.xlsx')

This prints:

   contest_id user_id               registration                 submission  score
0        1234     abc  2012-01-09 21:51:00+00:00  2012-01-09 22:51:00+00:00     90
1        4489    pabc  2013-01-09 21:51:00+00:00  2013-01-09 22:55:00+00:00     39
2        1234    tiop  2012-01-09 23:51:00+00:00  2012-01-09 23:55:00+00:00    100
3        4489  pabceu  2013-01-09 23:20:00+00:00  2013-01-09 23:55:00+00:00     39
   contest_id user_id              registration                submission  score
0        1234     abc 2012-01-09 21:51:00+00:00 2012-01-09 22:51:00+00:00     90
2        1234    tiop 2012-01-09 23:51:00+00:00 2012-01-09 23:55:00+00:00    100
1        4489    pabc 2013-01-09 21:51:00+00:00 2013-01-09 22:55:00+00:00     39
3        4489  pabceu 2013-01-09 23:20:00+00:00 2013-01-09 23:55:00+00:00     39

and raises:

TypeError: Excel doesn't support timezones in datetimes. Set the tzinfo in the datetime/time object to None or use the 'remove_timezone' Workbook() option
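
The exception is caused by columns that hold timezone-aware datetimes. As a rough sketch (not part of the original answer), the helper below lists such columns before writing. The name `tz_aware_columns` and the small sample frame are hypothetical and only serve as illustration.

```python
import pandas as pd

# Hypothetical sample frame; the column names follow the question's data.
df = pd.DataFrame({
    "contest_id": [1234, 4489],
    "registration": pd.to_datetime(
        ["2012-01-09 21:51:00+00:00", "2013-01-09 21:51:00+00:00"], utc=True
    ),
    "score": [90, 39],
})


def tz_aware_columns(frame):
    """Return the names of columns whose dtype is a timezone-aware datetime."""
    return [
        name
        for name, dtype in frame.dtypes.items()
        if isinstance(dtype, pd.DatetimeTZDtype)
    ]


print(tz_aware_columns(df))
```

Any column reported here would need its timezone removed, or a different handling at write time, before the file can be saved.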

Possible fixes:

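Use openpyxl:
The error is raised by the xlsxwriter backend. If openpyxl is installed, it is enough to request that engine:
```
...
dftotal.to_excel('contest_new.xlsx', engine='openpyxl')
```
With the versions used for this answer, it drops the tz information automatically and writes the Excel file correctly. Newer pandas releases may reject timezone-aware datetimes regardless of the engine, in which case the timezone has to be removed explicitly.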

Explicitly remove the tz information:

The timezone information can be removed explicitly with tz_localize(None):

...
dftotal['registration'] = pd.to_datetime(dftotal.registration).dt.tz_localize(None)
dftotal['submission'] = pd.to_datetime(dftotal.submission).dt.tz_localize(None)
dftotal = dftotal.sort_values(by=['registration'])
print(dftotal.to_string())
dftotal.to_excel('contest_new.xlsx')

The DataFrame is displayed as:

   contest_id user_id        registration          submission  score
0        1234     abc 2012-01-09 21:51:00 2012-01-09 22:51:00     90
2        1234    tiop 2012-01-09 23:51:00 2012-01-09 23:55:00    100
1        4489    pabc 2013-01-09 21:51:00 2013-01-09 22:55:00     39
3        4489  pabceu 2013-01-09 23:20:00 2013-01-09 23:55:00     39

and it is written correctly by the default xlsxwriter engine.
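
Sorting by registration alone keeps the contests together in this sample only because every registration for contest 1234 precedes those for contest 4489. For data where contests overlap in time, one option is to sort by the group key first and the datetime second. The sketch below assumes the question's column names and rebuilds the sample data inline. It leaves out the Excel write so that it runs without an Excel engine installed.

```python
import pandas as pd

# Sample data from the question, rebuilt inline for a self-contained example.
df = pd.DataFrame({
    "contest_id": [1234, 4489, 1234, 4489],
    "user_id": ["abc", "pabc", "tiop", "pabceu"],
    "registration": [
        "2012-01-09 21:51:00+00:00",
        "2013-01-09 21:51:00+00:00",
        "2012-01-09 23:51:00+00:00",
        "2013-01-09 23:20:00+00:00",
    ],
    "score": [90, 39, 100, 39],
})

# Parse as UTC, then drop the timezone so the column is tz-naive.
df["registration"] = (
    pd.to_datetime(df["registration"], utc=True).dt.tz_localize(None)
)

# Sort by the group key first, then by datetime within each group.
result = df.sort_values(by=["contest_id", "registration"])

print(result.to_string())
```

No groupby call is needed for this: a multi-key sort_values already orders the rows within each contest, and the original index labels are preserved.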

Solution 1
0 2019-06-27 12:33:20