Combine dates present in two different columns to generate mean for a column
I have a dataset in the following format, where the Starting column values range from 2021-01-01 to 2022-03-13 and the Ending column covers the same range.
The rainfall data is collected on a daily basis, such that the entries are as follows:
I am trying to combine the entries and form monthly average values for the dataset. I cannot find a way to take the monthly average values and store them in a different pandas DataFrame such that it appears as follows:
The Monthly Rainfall is found using Total rainfall / Total days in the month.
Any help would be appreciated!
I have tried to use groupby and mean together from the pandas library to get the output, but it doesn't appear in the format I want.
df = df.groupby(['Starting', 'Ending', 'Location_id'])['rainfall'].mean().reset_index()
To solve the problem, you can write a function like this:
from datetime import datetime

def to_date(x, y):
    # Parse both lists of 'YYYY-MM-DD' strings into date objects
    xs = [datetime.strptime(dt, '%Y-%m-%d').date() for dt in x]
    ys = [datetime.strptime(dt, '%Y-%m-%d').date() for dt in y]
    # Day difference for each pair (x minus y)
    return [(a - b).days for a, b in zip(xs, ys)]

Basically, this function takes two lists (x, y) of date strings, turns every item in them into a date() object, and returns a new list holding the difference between each pair in days. For your information, subtracting two identical dates gives a timedelta of 0 days, not an infinite value, so there is nothing to test for with math.isinf. A zero-day range would instead break the later division by the day count, so guard against 0 at that point if your data can contain it. Also note that the difference between 2021-01-01 and 2021-01-31 is 30 days, because the end date itself is not counted; add 1 to the result if you want all 31 days of the month.
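As a minimal sketch of the same day-count idea done with pandas directly, assuming the dates are 'YYYY-MM-DD' strings as above: the helper name inclusive_days is hypothetical, and adding 1 is an assumption that both the start and the end date should be counted.

```python
import pandas as pd

def inclusive_days(start, end):
    """Days from start to end, counting both endpoints (hypothetical helper)."""
    start = pd.to_datetime(pd.Series(start), format='%Y-%m-%d')
    end = pd.to_datetime(pd.Series(end), format='%Y-%m-%d')
    # Subtracting datetimes gives timedeltas; .dt.days extracts whole days
    return (end - start).dt.days + 1

days = inclusive_days(['2021-01-01', '2021-02-01'], ['2021-01-31', '2021-02-28'])
print(days.tolist())  # [31, 28]
```

With this variant a range that starts and ends on the same day counts as 1 day, so the division never hits zero.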
Here's the code snippet I wrote. Since you didn't provide a dataset, I built one from the images you provided:
import pandas as pd

d = {
    'New_Starting': ['2021-01-01', '2021-01-01', '2021-01-01'],
    'New_Ending': ['2021-01-31', '2021-01-31', '2021-01-31'],
    'Location_id': [45, 52, 30],
    'Rainfall': [4.07, 6.53, 3.71]
}
d = pd.DataFrame(d)
d['Monthly_Rainfall'] = d['Rainfall'] / to_date(d['New_Ending'], d['New_Starting'])
Output:
  New_Starting  New_Ending  Location_id  Rainfall  Monthly_Rainfall
0   2021-01-01  2021-01-31           45      4.07          0.135667
1   2021-01-01  2021-01-31           52      6.53          0.217667
2   2021-01-01  2021-01-31           30      3.71          0.123667
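If each row in the original data is a single day's reading, as the question describes, one possible approach is to group by calendar month and location and divide the monthly total by the number of days in that month. This is only a sketch under assumptions: the column names Starting, Location_id and rainfall are taken from the groupby call in the question, Starting is assumed to be the day of each reading, and the sample values and the Month and Total_Rainfall column names are made up for illustration.

```python
import pandas as pd

# Made-up daily readings for illustration (not the asker's data)
df = pd.DataFrame({
    'Starting': ['2021-01-01', '2021-01-02', '2021-02-01', '2021-01-01'],
    'Location_id': [45, 45, 45, 52],
    'rainfall': [3.1, 6.2, 2.8, 1.55],
})
df['Starting'] = pd.to_datetime(df['Starting'])

# One group per (calendar month, location); sum the daily rainfall
month = df['Starting'].dt.to_period('M').rename('Month')
monthly = (
    df.groupby([month, 'Location_id'])['rainfall']
      .sum()
      .reset_index(name='Total_Rainfall')
)

# Total rainfall / total days in the month, as described in the question
monthly['New_Starting'] = monthly['Month'].dt.start_time
monthly['New_Ending'] = monthly['Month'].dt.end_time.dt.normalize()
monthly['Monthly_Rainfall'] = monthly['Total_Rainfall'] / monthly['Month'].dt.days_in_month
print(monthly)
```

Dividing by days_in_month follows the formula in the question. Calling .mean() on the groups would instead divide by the number of rows present, which differs whenever a month has missing days.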