
How to sort a pandas dataframe by date

I am importing data into a pandas dataframe from Google BigQuery and I'd like to sort the results by date. My code is as follows:

import sys, getopt
import pandas as pd
from datetime import datetime

# set your BigQuery service account private key
pkey ='#REMOVED#'
destination_table = 'test.test_table_2'
project_id = '#REMOVED#'

# write your query
query = """
SELECT date, SUM(totals.visits) AS Visits
FROM `#REMOVED#.#REMOVED#.ga_sessions_20*`
WHERE parse_date('%y%m%d', _table_suffix) between
DATE_sub(current_date(), interval 3 day) and
DATE_sub(current_date(), interval 1 day)
GROUP BY Date
    """

data = pd.read_gbq(query, project_id, dialect='standard', private_key=pkey, parse_dates=True, index_col='date')
date = data.sort_index()

data.info()
data.describe()

print(data.head())

My output is shown below. As you can see, the dates are not sorted.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
date      3 non-null object
Visits    3 non-null int32
dtypes: int32(1), object(1)
memory usage: 116.0+ bytes
       date  Visits
0  20180312  207440
1  20180310  178155
2  20180311  207452

I have read several questions and so far have tried the following, none of which changed my output:

  • Removing index_col='date' and adding date = data.sort_values(by='date')
  • Setting the date column as the index, then sorting the index (shown above).
  • Adding headers ( headers = ['Date', 'Visits'] ) and dtypes ( dtypes = [datetime, int] ) to my read_gbq line ( parse_dates=True, names=headers )

What am I missing?

As most of the work is done on the Google BigQuery side, I'd do the sorting there as well:

query = """
SELECT date, SUM(totals.visits) AS Visits
FROM `#REMOVED#.#REMOVED#.ga_sessions_20*`
WHERE parse_date('%y%m%d', _table_suffix) between
DATE_sub(current_date(), interval 3 day) and
DATE_sub(current_date(), interval 1 day)
GROUP BY Date
ORDER BY Date
"""

This should work:

data.sort_values('date', inplace=True)
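One detail worth checking here, offered as a possible explanation rather than a confirmed diagnosis: sort_values and sort_index return a new DataFrame by default, so the result has to be assigned back to the same name (the question's code assigns it to date while printing data) or the call has to use inplace=True. A minimal sketch on a hand-built frame that copies the values from the question's output, not an actual BigQuery result:

```python
import pandas as pd

# Hand-built stand-in for the BigQuery result shown in the question.
data = pd.DataFrame({
    "date": ["20180312", "20180310", "20180311"],
    "Visits": [207440, 178155, 207452],
})

# sort_values returns a new frame; `data` itself is left untouched.
sorted_copy = data.sort_values("date")
unchanged_order = data["date"].tolist()

# Sorting in place (or assigning the result back to `data`) changes `data`.
data.sort_values("date", inplace=True)
inplace_order = data["date"].tolist()

print(unchanged_order)
print(inplace_order)
```

Because the strings are zero-padded YYYYMMDD values, sorting them as text happens to give chronological order; that would not hold for a format such as DD/MM/YYYY.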

I managed to solve this by converting my date field into a datetime object. I had assumed this would be done automatically by parse_dates=True, but it seems that only parses values that are already datetime objects.

I added the following after my query to create a new datetime column from my date string. After that, data.sort_index() worked as expected:

time_format = '%Y-%m-%d'
data = pd.read_gbq(query, project_id, dialect='standard', private_key=pkey)

data['n_date'] = pd.to_datetime(data['date'], format=time_format)  

data.index = data['n_date']

del data['date']
del data['n_date']

data.index.names = ['Date']

data = data.sort_index()
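The same conversion can be written more compactly. The sketch below is an illustration on a hand-built frame, not the BigQuery result itself. It assumes the date column holds YYYYMMDD strings, as in the output shown in the question, so it uses the format '%Y%m%d' rather than '%Y-%m-%d'.

```python
import pandas as pd

# Hand-built stand-in for the frame returned by read_gbq (assumed shape).
data = pd.DataFrame({
    "date": ["20180312", "20180310", "20180311"],
    "Visits": [207440, 178155, 207452],
})

# Parse the strings into real datetimes; the format matches YYYYMMDD.
data["date"] = pd.to_datetime(data["date"], format="%Y%m%d")

# Use the parsed column as the index, rename it, and sort chronologically.
data = data.set_index("date").rename_axis("Date").sort_index()

print(data)
```

With a DatetimeIndex in place, date-based selection such as data.loc['2018-03-11'] also becomes available.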


 