Taking percentiles along the third dimension in Python

Question

I've been struggling with this one for a bit now. I have a 55115 x 34 matrix in which each row along the first dimension is one day, covering 151 years for a total of 55115 points.

I am trying to get monthly percentiles of the values in the first dimension, so I first added a date column, which I then use to group the data into months. However, I cannot figure out the best way to take the 95th percentile over both the days and the third dimension (here of size 34). After grouping by month, the matrix should be 151 x 12 x 34, and I want to take the 95th percentile along the third dimension, so in theory my final matrix would be 151 x 12. Below is what I have so far to add the dates to the array:

dates = pd.date_range(start='1950-01-01', end='2100-12-31', freq='D') #create daily date range from 1950 to 2100

leap = [] #empty array
for each in dates:
    if each.month==2 and each.day ==29: #find each leap day (feb 29)
        leap.append(each)

dates = dates.drop(leap) #get rid of leap days
dates = pd.to_datetime(dates) #convert to datetime format 
data = {'wind': winddata, 'time': dates} #create table with both dates and data
df = pd.DataFrame(data) #create dataframe
df.set_index('time') #index time
df.groupby(df['time'].dt.strftime('%b'))['wind'].sort_values()

And this is what I have to take the percentile:

months = df.groupby(pd.Grouper(key='time',freq = "M")) #group each month
monthly_percentile = months.aggregate(lambda x: np.percentile(x, q = 95)) #percentile across each month

However, this does not appear to work. I'm open to other methods of doing this; I am just hoping to rearrange the 55115 x 34 data set so that it is 151 (years) x 365 (days) x 34 (ensembles), and then take the percentile across each month and the third dimension, so that I end up with 151 x 12 in total. I'm happy to clarify anything I did not specify well enough. Any detailed response would be really helpful. Thank you so much in advance!

Answer 1

If I understand your question correctly, the most straightforward solution I can think of is to add year and month columns, then group by them and compute the required percentile:

import pandas as pd
import numpy as np

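# Daily dates; note that this range keeps 29 February, unlike the question's data
dates = pd.date_range(start='1950-01-01', end='2100-12-31', freq='D')
dates_months = [date.month for date in dates]
dates_years = [date.year for date in dates]
values = np.random.rand(34, len(dates))  # random stand-in for the 34 ensemble members
df = pd.DataFrame()

df['date'] = dates
df['year'] = dates_years
df['month'] = dates_months
for i in range(34):
    df[f'values_{i}'] = values[i]  # one column per ensemble member

# Long format: one row per (date, member); melt names the data column 'value'
df = df.melt(id_vars=['date', 'year', 'month'], value_vars=[f'values_{i}' for i in range(34)])
# 95th percentile over all days and all members within each (year, month)
sub = df.groupby(['year', 'month']).value.apply(lambda x: np.quantile(x, .95)).reset_index()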
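
The snippet above keeps 29 February, so it has 55152 rows, not the 55115 in the question. Below is a minimal sketch of the same year/month grouping on a calendar with leap days dropped. It assumes `winddata` is a NumPy array of shape (55115, 34); random placeholder data stands in for it here. The names `monthly_p95` and `table` are only illustrative.

```python
import numpy as np
import pandas as pd

# Daily calendar with every 29 February removed (151 * 365 = 55115 days)
dates = pd.date_range(start='1950-01-01', end='2100-12-31', freq='D')
dates = dates[~((dates.month == 2) & (dates.day == 29))]

# Placeholder for the real data: one row per day, one column per ensemble member
rng = np.random.default_rng(0)
winddata = rng.random((len(dates), 34))

df = pd.DataFrame(winddata, index=dates)

# Group the rows by (year, month), then take the 95th percentile over
# every day and every member in that block at once
monthly_p95 = (
    df.groupby([df.index.year, df.index.month])
      .apply(lambda block: np.percentile(block.to_numpy(), 95))
      .rename_axis(['year', 'month'])
)

# Years as rows, months as columns: a 151 x 12 table
table = monthly_p95.unstack('month')
```

Grouping the wide frame directly avoids building the long 34 x 55115-row table that `melt` produces, which may matter for memory on larger ensembles.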

Finally, if you really need a 151 x 12 array instead of a year-month-percentile table of length 1812 (= 151 * 12), you could use something like this:

# melt() names the data column 'value'; each (year, month) cell holds exactly one entry
crosstab = pd.crosstab(index=sub['year'], columns=sub['month'], values=sub['value'], aggfunc='first')
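
The reshape described in the question, 151 (years) x 365 (days) x 34 (ensembles), can also be done in plain NumPy. The sketch below is an alternative to the pandas approach, not part of the original answer. It assumes the rows are in chronological order with leap days already removed, so every year has exactly 365 rows. `winddata` is again random placeholder data, and `cube`, `edges` and `p95` are illustrative names.

```python
import numpy as np

n_years, n_days, n_members = 151, 365, 34

# Placeholder for the real (55115, 34) array
rng = np.random.default_rng(0)
winddata = rng.random((n_years * n_days, n_members))

# Rows are chronological, so a row-major reshape splits them into years
cube = winddata.reshape(n_years, n_days, n_members)

# Day-of-year boundaries of each month in a 365-day (no-leap) calendar
month_lengths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
edges = np.concatenate(([0], np.cumsum(month_lengths)))

p95 = np.empty((n_years, 12))
for m in range(12):
    # All days of month m, all members, for every year
    block = cube[:, edges[m]:edges[m + 1], :]
    # Reduce over the day and member axes together
    p95[:, m] = np.percentile(block, 95, axis=(1, 2))
```

Here `p95[y, m]` is the 95th percentile over all days and all ensemble members of month `m` in year `1950 + y`, which gives the 151 x 12 result directly.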

Answer 1: 0, accepted, 2021-04-11 08:58:07