How to apply sort by ascending order for datetime in every group in pandas
I have a data set to be grouped by 'user_id' and 'contest_id'. Within each contest, I have to sort the users who entered it by date and time in ascending order.
I first tried grouping the data by contest_id and user handle. Then, after converting the datetime columns with to_datetime, I tried sorting the dates in ascending order using sort_values.
When I try to save the result, it gives an error:
'''
Excel doesn't support timezones in datetimes. Set the tzinfo in the
datetime/time object to None or use the 'remove_timezone' Workbook()
option
'''
dftotal.groupby(["contestID", "userHandle"])
dftotal["registerDateTime"]=pd.to_datetime(dftotal.registerDateTime)
dftotal["RegistrationDateTime"] = dftotal["registerDateTime"]
dftotal["submitDateTime"] = pd.to_datetime(dftotal.submitDateTime)
dftotal["SubmissionDateTime"] = dftotal["submitDateTime"]
dftotal = dftotal.sort_values(by=['RegistrationDateTime'])
The data is:
contest_id  user_id  registration               submission                 score
1234        abc      2012-01-09 21:51:00+00:00  2012-01-09 22:51:00+00:00  90
4489        pabc     2013-01-09 21:51:00+00:00  2013-01-09 22:55:00+00:00  39
1234        tiop     2012-01-09 23:51:00+00:00  2012-01-09 23:55:00+00:00  100
4489        pabceu   2013-01-09 23:20:00+00:00  2013-01-09 23:55:00+00:00  39
The expected output is:
contest_id  user_id  registration               submission                 score
1234        abc      2012-01-09 21:51:00+00:00  2012-01-09 22:51:00+00:00  90
1234        tiop     2012-01-09 23:51:00+00:00  2012-01-09 23:55:00+00:00  100
4489        pabc     2013-01-09 21:51:00+00:00  2013-01-09 22:55:00+00:00  39
4489        pabceu   2013-01-09 23:20:00+00:00  2013-01-09 23:55:00+00:00  39
I was finally able to reproduce and fix the problem.
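import io

import pandas as pd

# Columns are separated by two or more spaces, so the single space
# inside each timestamp is not treated as a separator.
t = '''contest_id  user_id  registration  submission  score
1234  abc  2012-01-09 21:51:00+00:00  2012-01-09 22:51:00+00:00  90
4489  pabc  2013-01-09 21:51:00+00:00  2013-01-09 22:55:00+00:00  39
1234  tiop  2012-01-09 23:51:00+00:00  2012-01-09 23:55:00+00:00  100
4489  pabceu  2013-01-09 23:20:00+00:00  2013-01-09 23:55:00+00:00  39'''
dftotal = pd.read_csv(io.StringIO(t), sep=r'\s\s+', engine='python')
print(dftotal.to_string())
# Parse both columns as timezone-aware (UTC) datetimes
dftotal['registration'] = pd.to_datetime(dftotal.registration, utc=True)
dftotal['submission'] = pd.to_datetime(dftotal.submission, utc=True)
# Sort by registration time, which produces the second output shown below
dftotal = dftotal.sort_values(by=['registration'])
print(dftotal.to_string())
# This write is what fails: the datetimes are still timezone-aware
dftotal.to_excel('contest_new.xlsx')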
This displays:
contest_id user_id registration submission score
0 1234 abc 2012-01-09 21:51:00+00:00 2012-01-09 22:51:00+00:00 90
1 4489 pabc 2013-01-09 21:51:00+00:00 2013-01-09 22:55:00+00:00 39
2 1234 tiop 2012-01-09 23:51:00+00:00 2012-01-09 23:55:00+00:00 100
3 4489 pabceu 2013-01-09 23:20:00+00:00 2013-01-09 23:55:00+00:00 39
contest_id user_id registration submission score
0 1234 abc 2012-01-09 21:51:00+00:00 2012-01-09 22:51:00+00:00 90
2 1234 tiop 2012-01-09 23:51:00+00:00 2012-01-09 23:55:00+00:00 100
1 4489 pabc 2013-01-09 21:51:00+00:00 2013-01-09 22:55:00+00:00 39
3 4489 pabceu 2013-01-09 23:20:00+00:00 2013-01-09 23:55:00+00:00 39
and raises:
TypeError: Excel doesn't support timezones in datetimes. Set the tzinfo in the datetime/time object to None or use the 'remove_timezone' Workbook() option
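Before writing to Excel, it can help to check which columns actually carry timezone information. The following is only a minimal sketch, assuming a small made-up frame named df (not the dftotal from the question); it relies on the fact that pandas stores timezone-aware columns with a DatetimeTZDtype.

```python
import pandas as pd

# Illustrative frame: only 'registration' is timezone-aware
df = pd.DataFrame({
    "contest_id": [1234, 4489],
    "registration": pd.to_datetime(
        ["2012-01-09 21:51:00+00:00", "2013-01-09 21:51:00+00:00"], utc=True
    ),
    "score": [90, 39],
})

# Collect the names of all columns whose dtype is a tz-aware datetime
tz_aware_columns = [
    name for name, dtype in df.dtypes.items()
    if isinstance(dtype, pd.DatetimeTZDtype)
]
print(tz_aware_columns)
```

Any column listed here is a candidate for the error above.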
Use openpyxl:
This error is raised by the xlsxwriter backend. If openpyxl is installed, it is enough to ask for that engine:
...
dftotal.to_excel('contest_new.xlsx', engine='openpyxl')
It automatically removes the tz information and writes the Excel file correctly.
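Whether openpyxl is installed can be checked from Python before choosing the engine. This is a rough sketch using only the standard library; pick_excel_engine is a hypothetical helper name introduced for illustration, not part of pandas.

```python
import importlib.util


def pick_excel_engine(candidates=("openpyxl", "xlsxwriter")):
    """Return the first importable engine name, or None if none is installed.

    Hypothetical helper: it only checks that the module can be found,
    not how that engine handles timezone-aware datetimes.
    """
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


engine = pick_excel_engine()
print(engine)
```

The returned name could then be passed as the engine argument of to_excel. How each engine treats timezone-aware values may depend on the pandas version, so removing the timezone explicitly is the more predictable route.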
Explicitly remove the tz information:
The timezone information can be explicitly removed with tz_localize(None):
...
dftotal['registration'] = pd.to_datetime(dftotal.registration).dt.tz_localize(None)
dftotal['submission'] = pd.to_datetime(dftotal.submission).dt.tz_localize(None)
dftotal = dftotal.sort_values(by=['registration'])
print(dftotal.to_string())
dftotal.to_excel('contest_new.xlsx')
The dataframe displays as:
   contest_id user_id        registration          submission  score
0        1234     abc 2012-01-09 21:51:00 2012-01-09 22:51:00     90
2        1234    tiop 2012-01-09 23:51:00 2012-01-09 23:55:00    100
1        4489    pabc 2013-01-09 21:51:00 2013-01-09 22:55:00     39
3        4489  pabceu 2013-01-09 23:20:00 2013-01-09 23:55:00     39
and is written without error by the default xlsxwriter engine.
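Sorting by registration alone gives the expected grouping here only because the two contests took place in different years. If contests could overlap in time, one possible approach is to sort on both keys, so that rows stay together per contest and are ordered by time within it. The sketch below reuses the sample data in comma-separated form (an assumption made to keep parsing simple) and uses df as an illustrative name.

```python
import io

import pandas as pd

t = '''contest_id,user_id,registration,submission,score
1234,abc,2012-01-09 21:51:00+00:00,2012-01-09 22:51:00+00:00,90
4489,pabc,2013-01-09 21:51:00+00:00,2013-01-09 22:55:00+00:00,39
1234,tiop,2012-01-09 23:51:00+00:00,2012-01-09 23:55:00+00:00,100
4489,pabceu,2013-01-09 23:20:00+00:00,2013-01-09 23:55:00+00:00,39'''
df = pd.read_csv(io.StringIO(t))

# Parse as UTC, then drop the timezone so the values are naive datetimes
for col in ("registration", "submission"):
    df[col] = pd.to_datetime(df[col], utc=True).dt.tz_localize(None)

# Primary key: contest; secondary key: registration time (both ascending)
df = df.sort_values(by=["contest_id", "registration"])
print(df.to_string())
```

No groupby is needed for this, since sort_values with a list of columns already orders the rows within each contest.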