Pandas data-frame based on months between date columns and average of value
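I'm working with a pandas DataFrame. Using df.groupby() I was able to end up with the following, which includes ['start_date'] and ['end_date'] along with the value for each id.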
| id | start_date | end_date   | value |
|----|------------|------------|-------|
| 1  | 02-01-2018 | 05-31-2018 | 40    |
| 2  | 01-01-2018 | 03-31-2018 | 12.3  |
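Is there a way to get from this to the data frame below? This is what I'm trying to end up with (each value is the original value divided by the number of months between start_date and end_date):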
|id | month_belongs | value|
|------------|------------------|------|
| 1 | 02-01-2018| 10|
| 1 | 03-01-2018| 10|
| 1 | 04-01-2018| 10|
| 1 | 05-01-2018| 10|
| 2 | 01-01-2018| 4.1|
| 2 | 02-01-2018| 4.1|
| 2 | 03-01-2018| 4.1|
This is more of an unnesting problem, where the hidden key is created by date_range:
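# Convert to datetime first if the columns are still strings:
# df.start_date = pd.to_datetime(df.start_date, dayfirst=False)
# df.end_date = pd.to_datetime(df.end_date, dayfirst=False)

# One DatetimeIndex of month starts ('MS') per row
df['month_belongs'] = [pd.date_range(x, y, freq='MS') for x, y in zip(df.start_date, df.end_date)]
df = unnesting(df, ['month_belongs'])
# Divide each value by the number of rows generated from its source row
df['value'] /= df['value'].groupby(level=0).transform('size').values
df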
Out[301]:
month_belongs id start_date end_date value
0 2018-02-01 1 2018-02-01 2018-05-31 10.0
0 2018-03-01 1 2018-02-01 2018-05-31 10.0
0 2018-04-01 1 2018-02-01 2018-05-31 10.0
0 2018-05-01 1 2018-02-01 2018-05-31 10.0
1 2018-01-01 2 2018-01-01 2018-03-31 4.1
1 2018-02-01 2 2018-01-01 2018-03-31 4.1
1 2018-03-01 2 2018-01-01 2018-03-31 4.1
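import numpy as np
import pandas as pd

def unnesting(df, explode):
    # Repeat each row label once per element of its list-like cell
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten every exploded column into one long column
    df1 = pd.concat(
        [pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    # drop(explode, 1) with a positional axis no longer works in pandas 2.0+
    return df1.join(df.drop(columns=explode), how='left')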
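On pandas 0.25 or later, the built-in `DataFrame.explode` can stand in for the custom `unnesting` helper. The following is a minimal sketch of the same idea, not the answer's original code. It assumes the dates are given as MM-DD-YYYY strings as in the question and that each `start_date` falls on the first day of a month (otherwise `freq="MS"` skips the partial first month). The name `out` is only illustrative.

```python
import pandas as pd

# Sample input mirroring the question's table.
df = pd.DataFrame({
    "id": [1, 2],
    "start_date": pd.to_datetime(["02-01-2018", "01-01-2018"], format="%m-%d-%Y"),
    "end_date": pd.to_datetime(["05-31-2018", "03-31-2018"], format="%m-%d-%Y"),
    "value": [40.0, 12.3],
})

# One list of month-start timestamps per row ('MS' = month start).
df["month_belongs"] = pd.Series(
    [list(pd.date_range(s, e, freq="MS"))
     for s, e in zip(df["start_date"], df["end_date"])],
    index=df.index,
)

# explode() emits one row per list element and keeps the original row
# label as the index, so level 0 still identifies the source row.
out = df.explode("month_belongs")
out["month_belongs"] = pd.to_datetime(out["month_belongs"])

# Split each value evenly across the months generated for its source row.
counts = out["value"].groupby(level=0).transform("size").to_numpy()
out["value"] = out["value"].to_numpy() / counts

out = out[["id", "month_belongs", "value"]].reset_index(drop=True)
print(out)
```

Because `explode` preserves the source row label, the `groupby(level=0)` step works the same way as in the answer above.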
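Premise: I'm new to pandas, and mostly new to coding in general. I'm posting my solution mainly to get pointers toward better ways of doing things. Getting this far is already good for me, and I feel the code is at least clean enough to show (I hope that's OK). I will probably need to spend some time getting my head around the accepted answer.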
import pandas as pd
from datetime import datetime
from dateutil.relativedelta import relativedelta

start = [["02-01-2018", "05-31-2018", 40],
         ["01-01-2018", "03-31-2018", 12.3]]
df = pd.DataFrame(start, columns=['std', 'end', 'v'])
df['std'] = pd.to_datetime(df['std'])
df['end'] = pd.to_datetime(df['end'])
df2 = pd.DataFrame(columns=['id', 'month_belongs', 'value'])
ix = 0  # I'm sure there must be a better way here, than needing an index
for index, row in df.iterrows():
    e, s = row['end'], row['std']
    difference = relativedelta(e, s)
    months = difference.months + 1
    while s <= e:
        df2.loc[ix] = [index, s, row['v'] / months]
        s += relativedelta(months=1)
        ix += 1
print(df2)
Output:
id month_belongs value
0 0 2018-02-01 10.0
1 0 2018-03-01 10.0
2 0 2018-04-01 10.0
3 0 2018-05-01 10.0
4 1 2018-01-01 4.1
5 1 2018-02-01 4.1
6 1 2018-03-01 4.1
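On the "there must be a better way" comment: one common alternative to growing `df2` with `df2.loc[ix] = ...` is to collect plain rows in a list and build the DataFrame once at the end, which removes the manual `ix` counter. The sketch below is one possible variant rather than a definitive rewrite. It swaps `relativedelta` for `pd.date_range(..., freq="MS")` and therefore assumes each start date is the first day of a month. The names `rows` and `month_starts` are illustrative.

```python
import pandas as pd

start = [["02-01-2018", "05-31-2018", 40],
         ["01-01-2018", "03-31-2018", 12.3]]
df = pd.DataFrame(start, columns=["std", "end", "v"])
df["std"] = pd.to_datetime(df["std"], format="%m-%d-%Y")
df["end"] = pd.to_datetime(df["end"], format="%m-%d-%Y")

rows = []
for row in df.itertuples():
    # All month starts between the two dates, inclusive.
    month_starts = pd.date_range(row.std, row.end, freq="MS")
    for month in month_starts:
        rows.append({
            "id": row.Index,
            "month_belongs": month,
            "value": row.v / len(month_starts),
        })

# Build the result in a single step instead of appending row by row.
df2 = pd.DataFrame(rows, columns=["id", "month_belongs", "value"])
print(df2)
```

Building the frame once avoids repeatedly resizing it inside the loop, and the default integer index replaces the hand-maintained `ix`.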
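import pandas as pd

# Divide value by the number of months covered. This month arithmetic assumes
# end_date is a month end and that start_date and the day after end_date
# fall in the same calendar year.
df["value"] = df.apply(
    lambda x: x["value"] / (
        (pd.to_datetime(x["end_date"]) + pd.Timedelta(days=1)).month
        - pd.to_datetime(x["start_date"]).month),
    axis=1
)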
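The snippet above subtracts month numbers directly, so it ignores the year. A hedged sketch of a month count that also handles ranges crossing a year boundary follows. The helper `months_spanned`, the column `value_per_month`, and the third sample row (November 2018 to January 2019) are hypothetical additions for illustration, and the count is taken as the number of calendar months touched, inclusive of both ends.

```python
import pandas as pd

def months_spanned(start, end):
    """Number of calendar months touched by [start, end], inclusive."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    return (end.year - start.year) * 12 + (end.month - start.month) + 1

df = pd.DataFrame({
    "id": [1, 2, 3],
    "start_date": ["2018-02-01", "2018-01-01", "2018-11-01"],
    "end_date": ["2018-05-31", "2018-03-31", "2019-01-31"],  # row 3 is hypothetical
    "value": [40.0, 12.3, 30.0],
})

n_months = [months_spanned(s, e) for s, e in zip(df["start_date"], df["end_date"])]
df["value_per_month"] = df["value"] / n_months
print(df)
```

Like the snippet above, this only computes the per-month value. It does not expand each id into one row per month.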