Removing rows from one dataframe based on a date condition in another dataframe

I have the following dataframe df1

id        date_col      No. of leaves
100       2018-10-05      4
100       2018-10-14      4
100       2018-10-19      4
100       2018-11-15      4
101       2018-10-05      3
101       2018-10-08      3
101       2018-12-05      3

df2

id        date_col       leaves_availed
100       2018-11-28       2
100       2018-11-29       2
101       2018-11-19       2
101       2018-11-24       2

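For a given id, I want the rows in df1 whose date is earlier than the date in df2 for that id. From those rows I want to remove the one with the earliest date, and then subtract leaves_availed from "No. of leaves".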

For the example above, the resulting dataframe should be

id        date_col      No. of leaves
100       2018-10-19      2
100       2018-11-15      2
101       2018-12-05      1

For id = 100 and date 2018-11-28 in df2, the rows in df1 with a date earlier than 2018-11-28 are

id        date_col      No. of leaves
100       2018-10-05      4
100       2018-10-14      4
100       2018-10-19      4
100       2018-11-15      4

The earliest date in this subset is 2018-10-05, so the row 100 2018-10-05 4 is removed, and so on for the remaining rows of df2.

So far, I have sorted both dataframes

df1.sort_values(by=['id','date_col'],inplace=True)
df2.sort_values(by=['id','date_col'],inplace=True)

and I tried to delete the top rows of df1 based on the number of rows in df2, but that got me nowhere.

This follows your logic, but I have not tested every edge case

import pandas as pd

# Recreate the two dataframes from the question
df1 = pd.DataFrame({
    'id': [100, 100, 100, 100, 101, 101, 101],
    'date_col': pd.to_datetime(['2018-10-05', '2018-10-14', '2018-10-19', '2018-11-15',
                                '2018-10-05', '2018-10-08', '2018-12-05']),
    'No. of leaves': [4, 4, 4, 4, 3, 3, 3],
})
df2 = pd.DataFrame({
    'id': [100, 100, 101, 101],
    'date_col': pd.to_datetime(['2018-11-28', '2018-11-29', '2018-11-19', '2018-11-24']),
    'leaves_availed': [2, 2, 2, 2],
})
df1.sort_values(by=['id', 'date_col'], inplace=True)
df2.sort_values(by=['id', 'date_col'], inplace=True)
# End of dataframe creation

# Loop over each row of df2
for i in range(len(df2)):
    current = df2.iloc[i]
    # Keep the df1 rows with the same id and an earlier date
    df3 = df1[(df1['date_col'] < current['date_col']) & (df1['id'] == current['id'])]
    # Drop the oldest row; copy so the assignment below does not touch df1
    df3 = df3.iloc[1:].copy()
    # Subtract the leaves availed on the current df2 row (not always row 0)
    df3['No. of leaves'] = df3['No. of leaves'] - current['leaves_availed']
    print(f"result for date {current['date_col']} and id =  {current['id']}")
    print(df3)
    print('-----------------\n')

The final result shows

result for date 2018-11-28 00:00:00 and id =  100
    id   date_col  No. of leaves
1  100 2018-10-14              2
2  100 2018-10-19              2
3  100 2018-11-15              2
-----------------
result for date 2018-11-29 00:00:00 and id =  100
    id   date_col  No. of leaves
1  100 2018-10-14              2
2  100 2018-10-19              2
3  100 2018-11-15              2
-----------------
result for date 2018-11-19 00:00:00 and id =  101
    id   date_col  No. of leaves
5  101 2018-10-08              1
-----------------
result for date 2018-11-24 00:00:00 and id =  101
    id   date_col  No. of leaves
5  101 2018-10-08              1
-----------------
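
The loop above prints one intermediate table per df2 row, so it does not produce the single dataframe the question asks for. Below is a minimal sketch of one way to get there. It rests on one reading of the expected result, which is an assumption: each df2 row removes the earliest remaining df1 row with the same id and an earlier date, and leaves_availed is a per-id total that is subtracted once. The helper name consume_leaves is made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [100, 100, 100, 100, 101, 101, 101],
    'date_col': pd.to_datetime(['2018-10-05', '2018-10-14', '2018-10-19', '2018-11-15',
                                '2018-10-05', '2018-10-08', '2018-12-05']),
    'No. of leaves': [4, 4, 4, 4, 3, 3, 3],
})
df2 = pd.DataFrame({
    'id': [100, 100, 101, 101],
    'date_col': pd.to_datetime(['2018-11-28', '2018-11-29', '2018-11-19', '2018-11-24']),
    'leaves_availed': [2, 2, 2, 2],
})


def consume_leaves(df1, df2):
    """Hypothetical helper: drop one df1 row per df2 row, then adjust the leave count."""
    remaining = df1.sort_values(['id', 'date_col']).copy()
    for row in df2.sort_values(['id', 'date_col']).itertuples(index=False):
        # Remaining df1 rows with the same id and a strictly earlier date
        earlier = remaining[(remaining['id'] == row.id)
                            & (remaining['date_col'] < row.date_col)]
        if not earlier.empty:
            # 'remaining' is sorted by date, so the first match is the earliest
            remaining = remaining.drop(earlier.index[0])
    # Assumption: leaves_availed is a per-id total, so subtract it once per id
    availed = df2.groupby('id')['leaves_availed'].first()
    deduction = remaining['id'].map(availed).fillna(0).astype(int)
    remaining['No. of leaves'] = remaining['No. of leaves'] - deduction
    return remaining.reset_index(drop=True)


result = consume_leaves(df1, df2)
print(result)
```

On the sample data this leaves the three rows listed in the question's expected result. If leaves_availed should instead be subtracted once per df2 row, the groupby step would need to change.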

