Modifying a date column calculation in a pandas dataframe

Question

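I have a dataframe that looks like this:

I need to adjust the time_in_weeks column for entry 34. When a duplicate uniqueid appears with a different rma_created_date, it means a failure occurred. Entry 34 needs to be changed so that it holds the number of weeks between the new, most recent rma_created_date (2020-10-15 in this case) and the rma_processed_date of the row above.

I hope what I am trying to do makes sense.

So far, I have done this: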

def clean_df(df):
    '''
    This function will fix the time_in_weeks column to calculate the correct number of weeks
    when there are multiple failures for an item.
    '''

    # Sort by rma_created_date
    df = df.sort_values(by=['rma_created_date'])

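Now I need to perform the operation described above, but I am a bit confused about how to do it, especially since an item may have more than two failures.

I should get something like this back as output:

As you can see, what happened to 34 is that it was changed to the number of weeks between 2020-10-15 and 2020-06-26.

Here is another example with more rows:

Using the suggested expression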

df['time_in_weeks']=np.where(df.uniqueid.duplicated(keep='first'),df.rma_processed_date.dt.isocalendar().week.sub(df.rma_processed_date.dt.isocalendar().week.shift(1)),df.time_in_weeks)

I get:

One final note: if the date is 1 January 1900, do not perform any calculation.

Answer 1

The question is not very clear. Happy to correct this if I have misinterpreted it.

Try using np.where(condition, choice if condition, choice if not condition)
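As a quick illustration of how np.where picks between two choices element by element, here is a minimal sketch. The arrays `values` and `is_duplicate` are made up for this example and are not taken from the question's data.

```python
import numpy as np

# Hypothetical inputs, only to show the selection behaviour.
values = np.array([10, 20, 30, 40])
is_duplicate = np.array([False, True, False, True])

# Where the mask is True take the replacement (-1), otherwise keep the original value.
result = np.where(is_duplicate, -1, values)
print(result.tolist())  # [10, -1, 30, -1]
```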

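import numpy as np
import pandas as pd

# Coerce the date columns to datetime
df['rma_processed_date'] = pd.to_datetime(df['rma_processed_date'])
df['rma_created_date'] = pd.to_datetime(df['rma_created_date'])

# Solution: on repeated uniqueid rows use rma_created_date minus
# rma_processed_date; on all other rows keep the existing time_in_weeks
df['time_in_weeks'] = np.where(df.uniqueid.duplicated(keep='first'),
                               df.rma_created_date.sub(df.rma_processed_date),
                               df.time_in_weeks)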
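If the goal is what the question describes (the new rma_created_date minus the previous row's rma_processed_date for the same uniqueid, expressed in weeks), one possible sketch is below. Several things here are assumptions rather than facts from the post: the sample rows are invented, the function name `fix_time_in_weeks` is hypothetical, "weeks" is taken to mean whole weeks (days floor-divided by 7), and the 1900-01-01 rule is read as "skip the row if either date involved is that placeholder".

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the two 2020 dates for item "A" come from the question.
df = pd.DataFrame({
    "uniqueid": ["A", "A", "B", "C", "C"],
    "rma_created_date": ["2020-06-01", "2020-10-15", "2020-07-01", "2020-08-01", "2020-09-10"],
    "rma_processed_date": ["2020-06-26", "2020-11-02", "2020-07-20", "1900-01-01", "2020-09-20"],
    "time_in_weeks": [3, 22, 2, 0, 1],
})

def fix_time_in_weeks(df):
    out = df.copy()
    out["rma_created_date"] = pd.to_datetime(out["rma_created_date"])
    out["rma_processed_date"] = pd.to_datetime(out["rma_processed_date"])
    out = out.sort_values("rma_created_date", kind="stable")

    # Processed date of the previous failure of the same item (NaT for the first one).
    prev_processed = out.groupby("uniqueid")["rma_processed_date"].shift(1)

    # Whole weeks between this failure and the previous processing (assumed rounding).
    weeks = (out["rma_created_date"] - prev_processed).dt.days // 7

    # Assumed reading of the 1900-01-01 rule: leave the row alone if either date is the placeholder.
    sentinel = pd.Timestamp("1900-01-01")
    recompute = (
        prev_processed.notna()
        & (out["rma_created_date"] != sentinel)
        & (prev_processed != sentinel)
    )

    out["time_in_weeks"] = np.where(
        recompute, weeks.fillna(0).astype("int64"), out["time_in_weeks"]
    )
    return out

result = fix_time_in_weeks(df)
print(result[["uniqueid", "rma_created_date", "time_in_weeks"]])
```

With this sample data only the second "A" row changes: 2020-10-15 minus 2020-06-26 is 111 days, which is 15 whole weeks. Because the shift is done per uniqueid, the same logic applies when an item has three or more failures.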

Answer 1
1 Accepted 2020-12-04 00:04:37