Pandas normalize column indexed by datetimeindex by sum of groupby date
Given a DataFrame indexed by a DatetimeIndex, is there an efficient way to normalize the values within each day? For example, I'd like to sum all values for each day, and then divide each column's values by the resulting sum for that day.
I can easily group by date and calculate the divisor (the sum of each column's values for each date), but I'm not sure of the best way to divide the original DataFrame by the resulting DataFrame of sums.
(Image: example DataFrame with a DatetimeIndex, and the DataFrame resulting from the sum.)
I attempted something like:
df / df.groupby(df.index.to_period('D')).sum()
However, it isn't behaving as I had hoped.
Instead, I get a DataFrame with NaN everywhere and the dates appended as new index entries.
(Image: results of the division above.)
Toy recreation:
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=['a', 'b'],
                  index=pd.to_datetime(['2017-01-01 14:30:00', '2017-01-01 14:31:00',
                                        '2017-01-02 14:30:00', '2017-01-02 14:31:00']))
df / df.groupby(df.index.to_period('D')).sum()
results in:
                      a   b
2017-01-01 14:30:00 NaN NaN
2017-01-01 14:31:00 NaN NaN
2017-01-02 14:30:00 NaN NaN
2017-01-02 14:31:00 NaN NaN
2017-01-01          NaN NaN
2017-01-02          NaN NaN
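The all-NaN result is consistent with how pandas aligns the two operands by index label before dividing. The sketch below only inspects the two indexes from the toy example; the variable names `daily_sum`, `left_type` and `right_type` are illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    columns=["a", "b"],
    index=pd.to_datetime(["2017-01-01 14:30:00", "2017-01-01 14:31:00",
                          "2017-01-02 14:30:00", "2017-01-02 14:31:00"]),
)

# One row per day, labelled by daily periods rather than by timestamps.
daily_sum = df.groupby(df.index.to_period("D")).sum()

left_type = type(df.index).__name__          # index of the original frame
right_type = type(daily_sum.index).__name__  # index of the per-day sums

print(left_type, len(df))
print(right_type, len(daily_sum))
```

Because the four timestamp labels and the two period labels never match, the division finds no pair of rows to combine, so every cell in the aligned result is NaN.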
You will need to paste your DataFrame as text rather than as an image so I can help further, but here is an example.
Sample df:
import numpy as np
import pandas as pd

# two frames covering the same dates, so each date appears twice after concat
df1 = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'),
                   index=pd.date_range('2017-01-03', '2017-01-07'))
df2 = pd.DataFrame(np.random.randn(5, 5), columns=list('ABCDE'),
                   index=pd.date_range('2017-01-03', '2017-01-07'))
df = pd.concat([df1, df2])
                   A         B         C         D         E
2017-01-03  1.393874  1.933301  0.215026 -0.412957 -0.293925
2017-01-04  0.825777  0.315449  2.317292 -0.347617 -2.427019
2017-01-05 -0.372916 -0.931185  0.049707  0.635828 -0.774566
2017-01-06  1.564714 -1.582461  1.455403  0.521305 -2.175344
2017-01-07  1.255747  1.967338 -0.766391 -0.021921  0.672704
2017-01-03  0.620301 -1.521681 -0.352800 -1.394239 -1.206983
2017-01-04 -0.041829 -0.870871 -0.402440  0.268725  1.499321
2017-01-05 -1.098647  1.690136  1.004087  0.304037  1.235310
2017-01-06  0.305645 -0.327096  0.280591 -0.476904  1.652096
2017-01-07  1.251927  0.469697  0.047694  1.838995 -0.258889
Then, what you are currently doing:
df / df.groupby(df.index).sum()
                   A         B         C          D         E
2017-01-03  0.692032  4.696817 -1.560723   0.228507  0.195831
2017-01-03  0.307968 -3.696817  2.560723   0.771493  0.804169
2017-01-04  1.053357 -0.567944  1.210167   4.406211  2.616174
2017-01-04 -0.053357  1.567944 -0.210167  -3.406211 -1.616174
2017-01-05  0.253415 -1.226937  0.047170   0.676510 -1.681122
2017-01-05  0.746585  2.226937  0.952830   0.323490  2.681122
2017-01-06  0.836585  0.828706  0.838369  11.740853  4.157386
2017-01-06  0.163415  0.171294  0.161631 -10.740853 -3.157386
2017-01-07  0.500762  0.807267  1.066362  -0.012064  1.625615
2017-01-07  0.499238  0.192733 -0.066362   1.012064 -0.625615
Take a look at the first row of column A:
1.393874 / (1.393874 + 0.620301) = 0.6920322216292031
So df / df.groupby(df.index).sum() works as expected here, where the index values are whole dates that match the group labels.
Also be careful if your data contains NaNs, because np.nan / a number = nan.
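To make the NaN caveat concrete, here is a small sketch on made-up data (the values and the names `daily_sum`, `normalized` and `filled` are illustrative only). By default the grouped `sum()` skips NaN when building the divisor, but a NaN in the numerator still propagates to the result. Filling with 0 before dividing is one option, assuming a missing value should count as zero in your data.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(["2017-01-03", "2017-01-03", "2017-01-04", "2017-01-04"])
df = pd.DataFrame({"A": [1.0, np.nan, 2.0, 6.0]}, index=idx)

# Per-date divisor; sum() ignores NaN by default, so 2017-01-03 sums to 1.0.
daily_sum = df.groupby(df.index).sum()

# The NaN cell stays NaN after the division.
normalized = df / daily_sum

# Assumption: a missing value should be treated as a zero contribution.
filled = df.fillna(0) / daily_sum

print(normalized)
print(filled)
```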
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=['a', 'b'],
                  index=pd.to_datetime(['2017-01-01 14:30:00', '2017-01-01 14:31:00',
                                        '2017-01-02 14:30:00', '2017-01-02 14:31:00']))
# create a MultiIndex whose level 1 holds just the dates
df.set_index(df.index.floor('D'), inplace=True, append=True)
# divide df by the group sums, matching on the level-1 (date) values, then drop that level
df.div(df.groupby(level=1).sum(), level=1).reset_index(level=1, drop=True)
                            a         b
2017-01-01 14:30:00  0.250000  0.333333
2017-01-01 14:31:00  0.750000  0.666667
2017-01-02 14:30:00  0.416667  0.428571
2017-01-02 14:31:00  0.583333  0.571429
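As an alternative sketch (not taken from either answer), `groupby(...).transform('sum')` returns the per-day sums broadcast back onto the original timestamps, so the division needs no extra index level. The names `day` and `normalized` are illustrative; `DatetimeIndex.normalize()` is used here to truncate each timestamp to midnight.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    columns=["a", "b"],
    index=pd.to_datetime(["2017-01-01 14:30:00", "2017-01-01 14:31:00",
                          "2017-01-02 14:30:00", "2017-01-02 14:31:00"]),
)

# Group key: each timestamp truncated to its date.
day = df.index.normalize()

# transform('sum') keeps the original index, so the operands align row by row.
normalized = df / df.groupby(day).transform("sum")

print(normalized)
```

On the toy data this should reproduce the values shown above, and each day's normalized values sum to 1 per column.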