Pandas Crosstab: Change Order of Columns That Are Named as Formatted Dates (mmm yy)
I have been looking for how to order columns for pandas crosstabs, to no avail. I specifically need to order my columns, which are formatted dates (mmm yy), based on the values of the dates and not alphabetically on the 3-letter name of the month (mmm).
Here are the details of my code:
python 3.3
pandas 0.12.0
f_dtflt is a pandas dataframe.
f_dtflt.COLLECTION_DATE is dtype datetime64[ns]
My crosstab statement is:
pd.crosstab(f_dtflt.EW_REGIONCOLLSITE, f_dtflt.COLLECTION_DATE.apply(lambda x: x.strftime("%b %y")), margins=True)
The output is:
COLLECTION_DATE    Apr 13  Aug 13  Dec 12  Feb 13  Jan 13  Jul 13  Jun 13
EW_REGIONCOLLSITE
EAST                 1964    2092    2280    2272    2757    2113    1902
WEST                 2579    2011    1003    2351    2216    1506    1823
All                  4543    4103    3283    4623    4973    3619    3725

COLLECTION_DATE    Mar 13  May 13  Nov 12  Oct 12  Sep 13     All
EW_REGIONCOLLSITE
EAST                 1682    1981    2108     825     975   22951
WEST                 2770    3014     407      42     888   20610
All                  4452    4995    2515     867    1863   43561
I want the columns to be ordered by ascending date: Oct 12, Nov 12, ..., Jan 13, ..., Sep 13. I recognize that I could format the dates as yy-mm (e.g. 13-01), but these labels will be used in a report and that is a compromise I hope not to make.
I'm new to python and pandas so please help the newbie by connecting any dots in your responses! Thanks a bunch.
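A minimal sketch of one possible way to get this ordering, assuming a pandas version newer than 0.12.0 in which the `.dt` accessor is available. The frame `df` and its `region` and `date` columns are made-up stand-ins for `f_dtflt`. The idea is to build the crosstab with the "mmm yy" labels as before and then re-select the columns in an order derived from the real dates.

```python
import pandas as pd

# Hypothetical stand-in for f_dtflt; the column names are illustrative only.
df = pd.DataFrame({
    "region": ["EAST", "WEST", "EAST", "WEST", "EAST"],
    "date": pd.to_datetime(
        ["2012-10-05", "2012-11-17", "2013-01-09", "2013-04-21", "2013-01-30"]
    ),
})

# Same kind of "mmm yy" labels as in the question.
labels = df["date"].dt.strftime("%b %y")
table = pd.crosstab(df["region"], labels, margins=True)

# Build the label list in chronological order from the real dates,
# then re-select the columns in that order, keeping "All" last.
ordered = df["date"].sort_values().dt.strftime("%b %y").drop_duplicates().tolist()
table = table[ordered + ["All"]]
print(table)
```

Because the order comes from the datetime values rather than from the strings, the display labels can stay in the report-friendly format.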
METHOD 1
Edit in response to the first part of @Andy's answer. There is an issue with step 3:
I have tried to implement Andy's suggestion and here is more info on this effort.
1) I ran the following line to see what the dates look like. It creates values such as '2012-10' for collection date. ("beautified" by print?)
print(pd.DatetimeIndex(f_dtflt['COLLECTION_DATE']).to_period('M'))
2) When the above expression is passed to crosstab, the month values change to integers such as 513, 514, etc. (actual values in field?)
table1=pd.crosstab(f_dtflt.EW_REGIONCOLLSITE, pd.DatetimeIndex(f_dtflt['COLLECTION_DATE']).to_period('M'), margins=True)
Here is the output:
col_0                513    514    515    516    517    518    519    520    521    522
EW_REGIONCOLLSITE
EAST                 825   2108   2280   2757   2272   1682   1964   1981   1902   2113
WEST                  42    407   1003   2216   2351   2770   2579   3014   1823   1506
All                  867   2515   3283   4973   4623   4452   4543   4995   3725   3619

col_0                523    524    All
EW_REGIONCOLLSITE
EAST                2092    975  22951
WEST                2011    888  20610
All                 4103   1863  43561
3) When I run the following code, it throws an error that 'int' object has no attribute 'strftime':
table1.columns = table1.columns.map(lambda x: x.strftime("%b %y"))
I played around with this quite a bit and here are some of my notes:
# This runs and creates an array of strings: '513' etc.
pd.to_datetime(table1.columns.map(str), unit='M')
# The last entry in table1.columns is "All" and needs to be removed. Hence [:-1] slice.
# This also runs but seems to give years in 1630's.
pd.DatetimeIndex(table1.columns[:-1].map(str)).to_datetime('M')
# This does not run because it says object is immutable
table1.columns[:-1]=pd.DatetimeIndex(table1.columns[:-1].map(str)).to_datetime('M')
# This also runs but the output is weird. It seems to give an array of both dates and -1
table1.columns.reindex(pd.DatetimeIndex(table1.columns[:-1].map(str)).to_datetime('M'))
# Does not run: DatetimeIndex() must be called with a collection of some kind, '513' was passed
table1.columns = table1.columns.map(lambda x: pd.DatetimeIndex(str(x)).strftime("%b %y"))
# Does not run: DatetimeIndex object is not callable
table1.rename(columns=pd.DatetimeIndex(table1.columns[:-1].map(str)).to_datetime('M'))
4) This does work for labeling the columns in the crosstab:
table1.columns.name = 'COLLECTION_DATE'
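The integers in step 2 look like monthly period ordinals, that is, months counted from January 1970. That is an interpretation, though it matches the data: 513 corresponds to October 2012, the earliest month in the table. Under that assumption, a sketch of turning such integers back into "mmm yy" labels could look like this (the `ordinals` list is a hand-typed sample, not the real column index):

```python
import pandas as pd

# Assumption: the integer labels are monthly period ordinals
# (number of months since January 1970).
ordinals = [513, 514, 524]

# Rebuild a monthly Period from each ordinal, then format it for display.
labels = [pd.Period(ordinal=n, freq="M").strftime("%b %y") for n in ordinals]
print(labels)  # ['Oct 12', 'Nov 12', 'Sep 13']
```

With `margins=True`, the trailing "All" label would have to be handled separately, since it is not an integer.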
METHOD 2
@Andy gave a second suggestion and I played around with it and couldn't get it to work. A big part of the issue is my lack of familiarity with python, pandas, and numpy. I made notes for myself as I tried to sort it out. Here are my notes:
# Working with a new concept
# This creates row titles of 12 10, 12 11, etc.
table1=pd.crosstab(f_dtflt.EW_REGIONCOLLSITE, f_dtflt.COLLECTION_DATE.apply(lambda x: x.strftime("%y %m")), margins=True)
# This throws an error that yb is not defined
table1.columns.map(lambda yb: "%s %s" % (y, b) for y, b in yb.split())
# Tried to simplify and see what happens. Runs and creates an array of lists such as [['12', '10'], ['12', '11']...]
table1.columns.map(lambda x: x.split())
# Trying a different approach. This creates a numpy array of datetimes.
tempholder=table1.columns[:-1].map(lambda x: datetime.datetime(year=int(x[0:2]), month=int(x[3:]), day=1))
# Noted that f_dtflt['COLLECTION_DATE'] was a dtype of datetime64[ns] but tempholder was dtype object. So had issue.
# Convert to datetime64
# Get error: Out of bounds nanosecond timestamp: 12-10-01 00:00:00
tempholder=pd.to_datetime(tempholder)
# Tempholder is an array of datetimes from the datetime module. I used the pandas date function above.
# Need to change that and use python datetime module function.
# Does not work: 'numpy.ndarray' object has no attribute 'apply'...
# this is a pandas function which does not work on a numpy array.
tempholder.apply(lambda x: x.strftime('%b %y'))
# This works for numpy array but I can't tell what it contains.
# print(tempholder) gives <map object at 0x0000000026C04F28>
# tempholder gives Out[169]: <builtins.map at 0x26c04f28>
tempholder=map(lambda x: x.strftime('%b %y'), tempholder)
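Two of the stumbling blocks in these notes can be sidestepped with the standard library alone. The "yy mm" labels can be parsed with `datetime.strptime`, and in Python 3 `map()` returns a lazy iterator, which is why printing it shows `<map object ...>`. A minimal sketch, using a hand-written `columns` list in place of `table1.columns`:

```python
from datetime import datetime

# Column labels as produced by strftime("%y %m"), plus the margins column.
columns = ["12 10", "12 11", "13 01", "All"]

# A list comprehension (or list(map(...))) yields concrete values that can be
# printed or assigned back to table.columns. "All" is passed through as-is.
relabelled = [
    datetime.strptime(c, "%y %m").strftime("%b %y") if c != "All" else c
    for c in columns
]
print(relabelled)  # ['Oct 12', 'Nov 12', 'Jan 13', 'All']
```

Parsing with an explicit `"%y %m"` format also avoids the out-of-bounds error noted above, where the two-digit year 12 appears to have been read as the year 12 rather than 2012.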
I approached this problem from a slightly different angle and created a function that can be used as a general method of ordering columns in a crosstab in pandas. It may also work for a pivot table but I didn't test that nor did I look at the details. I suppose it can also be used to order row labels but I didn't try that.
This creates a crosstab with column labels such as "12 10_Oct 12" and "12 11_Nov 12". The label effectively forces crosstab's alphabetical sorting to work in my favor. The sortable part of the label is joined with "_" to the label that I actually want to display.
table_1=pd.crosstab(f_dtflt.EW_REGIONCOLLSITE, f_dtflt.COLLECTION_DATE.apply(lambda x: x.strftime("%y %m_%b %y")), margins=True)
Output:
COLLECTION_DATE    12 10_Oct 12  12 11_Nov 12  12 12_Dec 12  13 01_Jan 13
EW_REGIONCOLLSITE
EAST                        825          2108          2280          2757
WEST                         42           407          1003          2216
All                         867          2515          3283          4973

COLLECTION_DATE    13 02_Feb 13  13 03_Mar 13  13 04_Apr 13  13 05_May 13
EW_REGIONCOLLSITE
EAST                       2272          1682          1964          1981
WEST                       2351          2770          2579          3014
All                        4623          4452          4543          4995

COLLECTION_DATE    13 06_Jun 13  13 07_Jul 13  13 08_Aug 13  13 09_Sep 13
EW_REGIONCOLLSITE
EAST                       1902          2113          2092           975
WEST                       1823          1506          2011           888
All                        3725          3619          4103          1863

COLLECTION_DATE      All
EW_REGIONCOLLSITE
EAST               22951
WEST               20610
All                43561
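To see why the prefix trick works, it may help to look at the labels on their own. The sketch below uses three arbitrary sample dates (not taken from the data) and shows that plain string sorting of the combined labels matches date order.

```python
from datetime import datetime

# Arbitrary sample dates, deliberately out of order.
dates = [datetime(2013, 1, 15), datetime(2012, 10, 3), datetime(2013, 9, 20)]

# Zero-padded "yy mm" prefix first, then "_" and the display label.
labels = [d.strftime("%y %m_%b %y") for d in dates]

# String sorting compares the "yy mm" prefix first, so the result is
# chronological for dates within the same century.
print(sorted(labels))  # ['12 10_Oct 12', '13 01_Jan 13', '13 09_Sep 13']
```

One caveat: with a two-digit year, dates on either side of a century boundary would not sort correctly. Using `%Y` in the prefix would avoid that.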
The function and calls:
def clean_label(label_list, margins=False):
    ''' This function takes the column index list from a crosstab (or pivot table?) in pandas and removes the
    part of the label before and including the "_". This allows the user to order the columns manually by creating
    an alphabetical index followed by "_" and then the label that they would like to use. For example, a label such as
    ['a_Positive', 'b_Negative'] will be converted to ['Positive', 'Negative']. Another example would be to order dates
    in a table from ['12 10_Oct 12', '12 11_Nov 12'] to ['Oct 12', 'Nov 12']
    margins = False if the crosstab was created without margins and therefore does not have an "All" at the end of the list
    margins = True if the crosstab was created with margins and therefore has an "All" at the end of the list
    '''
    corrected_list = list()
    # If one creates margins in pivot/crosstab, will get the last column of "All"
    # This has to be removed from the following code or it will throw an error.
    # Note: the default is the boolean False; the string 'False' would be truthy.
    if margins:
        convert_list = label_list[:-1]
    else:
        convert_list = label_list
    for l in convert_list:
        x, y = l.split('_')
        corrected_list.append(y)
    if margins:
        corrected_list.append('Total')  # Renames "All" to "Total"
    return corrected_list
# Change the labels on the crosstab table
table_1.columns=clean_label(table_1.columns, margins=True)
# Change name of columns
table_1.columns.name = 'Month of Collection'
# Change name of rows
table_1.index.name = 'Region'
Output (final table):
Month of Collection  Oct 12  Nov 12  Dec 12  Jan 13  Feb 13  Mar 13  Apr 13
Region
EAST                    825    2108    2280    2757    2272    1682    1964
WEST                     42     407    1003    2216    2351    2770    2579
All                     867    2515    3283    4973    4623    4452    4543

Month of Collection  May 13  Jun 13  Jul 13  Aug 13  Sep 13   Total
Region
EAST                   1981    1902    2113    2092     975   22951
WEST                   3014    1823    1506    2011     888   20610
All                    4995    3725    3619    4103    1863   43561
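A more compact variant of the same idea is sketched below. It is not a replacement for the function above, just an alternative that does not need a `margins` flag: it treats "All" as a special case and splits on the first "_" only. The name `strip_sort_prefix` and the `total_label` parameter are made up for this illustration.

```python
def strip_sort_prefix(labels, total_label="Total"):
    """Drop everything up to and including the first "_" in each label.

    The margins label "All" is replaced by total_label; labels without
    an underscore are returned unchanged.
    """
    cleaned = []
    for label in labels:
        if label == "All":
            cleaned.append(total_label)
        else:
            # maxsplit=1 keeps any later underscores in the display label.
            cleaned.append(label.split("_", 1)[-1])
    return cleaned


print(strip_sort_prefix(["12 10_Oct 12", "12 11_Nov 12", "All"]))
# ['Oct 12', 'Nov 12', 'Total']
```

It would be used the same way, for example `table_1.columns = strip_sort_prefix(table_1.columns)`.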
If you've formatted the dates as year-month strings (and the columns are in the correct order), you could reverse each label:
In [1]: df = pd.DataFrame([['a', 'b']], columns=['12 Mar', '12 Jun'])
In [2]: df.columns.map(lambda yb: ' '.join(reversed(yb.split())))
Out[2]: array(['Mar 12', 'Jun 12'], dtype=object)
In [3]: df.columns = df.columns.map(lambda yb: ' '.join(reversed(yb.split())))
I had suggested you could do this with periods:
pd.DatetimeIndex(f_dtflt['COLLECTION_DATE']).to_period('M')
Then afterwards you can clean the columns to the format you require:
df.columns = df.columns.map(lambda x: x.strftime("%b %y"))
df.columns.name = 'COLLECTION_DATE'
but this appears to change the period index into int (possibly a bug?).
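In more recent pandas versions the periods should survive as column labels, so the approach can be sketched roughly as follows. This assumes a pandas version with the `.dt` accessor and uses a small made-up frame (`df`, `region`, and `date` are illustrative names). The totals are added by hand after relabelling, rather than with `margins=True`, so that the "All" label never has to be mixed in with the periods.

```python
import pandas as pd

# Hypothetical stand-in for f_dtflt; the column names are illustrative only.
df = pd.DataFrame({
    "region": ["EAST", "WEST", "EAST", "WEST", "EAST"],
    "date": pd.to_datetime(
        ["2012-10-05", "2012-11-17", "2013-01-09", "2013-04-21", "2013-01-30"]
    ),
})

# Monthly periods sort chronologically, so the crosstab columns do too.
periods = df["date"].dt.to_period("M")
table = pd.crosstab(df["region"], periods)

# Relabel each period for display once the order is fixed.
table.columns = [p.strftime("%b %y") for p in table.columns]
table.columns.name = "COLLECTION_DATE"

# Add the totals manually (the equivalent of margins=True).
table["All"] = table.sum(axis=1)
table.loc["All"] = table.sum()
print(table)
```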