
Taking percentile in Python along 3rd dimension

I've been struggling with this one for a while. I have a 55115 x 34 matrix in which each row along the first dimension is one day, covering 151 years (55115 days in total).

I am trying to get monthly percentiles of the values along the first dimension. I first added a date column so that the rows can be grouped into months, but I cannot figure out the best way to take the 95th percentile over both the days and the third dimension (here, the 34 columns). After grouping by month the matrix should be 151 x 12 x 34, and I want to take the 95th percentile along the third dimension, so in theory my final matrix would be 151 x 12. Below is what I have so far to add the dates to the array:

dates = pd.date_range(start='1950-01-01', end='2100-12-31', freq='D') #create daily date range from 1950 to 2100

leap = [] #empty array
for each in dates:
    if each.month==2 and each.day ==29: #find each leap day (feb 29)
        leap.append(each)

dates = dates.drop(leap) #get rid of leap days
dates = pd.to_datetime(dates) #convert to datetime format 
data = {'wind': winddata, 'time': dates} #create table with both dates and data
df = pd.DataFrame(data) #create dataframe
df.set_index('time') #index time
df.groupby(df['time'].dt.strftime('%b'))['wind'].sort_values()

And this is what I have to take the percentile:

months = df.groupby(pd.Grouper(key='time',freq = "M")) #group each month
monthly_percentile = months.aggregate(lambda x: np.percentile(x, q = 95)) #percentile across each month 

This does not appear to work, though. I'm open to other methods. I am just hoping to rearrange the 55115 x 34 data set so that it is 151 (years) x 365 (days) x 34 (ensembles), and then take the percentile across each month and the third dimension, so that I end up with 151 x 12 in total. I'm happy to clarify anything I did not specify well enough. Any detailed response would be really helpful. Thank you so much in advance!

If I understand your question correctly, the most straightforward solution I can think of is to add year and month columns, then group by them and compute the required percentile:

import pandas as pd
import numpy as np

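# daily dates for 1950-2100 (leap days are kept here; drop them to match the 55115-row data)
dates = pd.date_range(start='1950-01-01', end='2100-12-31', freq='D')
dates_months = [date.month for date in dates]
dates_years = [date.year for date in dates]
values = np.random.rand(34, len(dates))  # random stand-in for the 34 ensemble members
df = pd.DataFrame()

df['date'] = dates
df['year'] = dates_years
df['month'] = dates_months
for i in range(34):
    df[f'values_{i}'] = values[i]  # one column per ensemble member

# long format: one row per (date, member), with the numbers in the 'value' column
df = df.melt(id_vars=['date', 'year', 'month'], value_vars=[f'values_{i}' for i in range(34)])
# 95th percentile over all days and all members of each (year, month)
sub = df.groupby(['year', 'month']).value.apply(lambda x: np.quantile(x, .95)).reset_index()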

Finally, if you really need a 151 x 12 array instead of a year-month-percentile table of length 1812 (= 151 * 12), you could use something like this:

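# melt() names the numeric column 'value'; each (year, month) cell holds one number, so 'first' simply returns it
crosstab = pd.crosstab(index=sub['year'], columns=sub['month'], values=sub['value'], aggfunc='first')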
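
For reference, here is a minimal sketch of the same idea applied to data shaped like the question's (55115 x 34, leap days removed). It is only an illustration under stated assumptions: `winddata` is a random stand-in for the real array, and the names `long`, `p95` and `result` are made up for this example. It flattens the array so that every (year, month) group pools all days and all 34 members, takes the 0.95 quantile per group, and unstacks the months into columns.

```python
import numpy as np
import pandas as pd

# Daily dates for 1950-2100 with Feb 29 removed: 151 * 365 = 55115 rows.
dates = pd.date_range("1950-01-01", "2100-12-31", freq="D")
dates = dates[~((dates.month == 2) & (dates.day == 29))]

# Assumption: random stand-in for the real (55115, 34) wind array.
rng = np.random.default_rng(0)
winddata = rng.random((len(dates), 34))
n_members = winddata.shape[1]

# Flatten row by row, repeating each date's year/month once per member,
# so each (year, month) group contains all days and all members.
long = pd.DataFrame({
    "year": np.repeat(dates.year.to_numpy(), n_members),
    "month": np.repeat(dates.month.to_numpy(), n_members),
    "value": winddata.ravel(),
})

# 95th percentile per (year, month), then months become columns.
p95 = long.groupby(["year", "month"])["value"].quantile(0.95)
result = p95.unstack("month")

print(result.shape)
```

Here `result` is a DataFrame with one row per year and one column per month; `result.to_numpy()` gives a plain array if that is what you need. As a spot check, `result.loc[1950, 1]` should agree with `np.percentile(winddata[:31], 95)`, because the first 31 rows are January 1950.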

