
Apply the same function to a subset of rows in pandas dataframe

I am using the following code to filter a dataframe by its type and value columns, and then remove any entries where tdiff is > 0 and < 3.

import pandas as pd

d = {'Timestamp': ['2020-09-02 07:00:00','2020-09-02 07:10:00', '2020-09-02 07:30:00', '2020-09-02 08:00:00', '2020-09-02 10:00:00', '2020-09-02 11:10:00', '2020-09-02 11:30:00'],
     'type': ['A','A','B','A', 'A','A','B'], 'value': [1,2,3,1,1,2,3]}

df = pd.DataFrame(data=d)
frames = []  # per-group pieces; DataFrame.append was removed in pandas 2.0

unique_type = pd.unique(df['type']).astype(str)

for i in range(0, len(unique_type)):

    df1 = df[df.type == unique_type[i]]
    unique_val = pd.unique(df1['value']).astype(int)

    for j in range(0, len(unique_val)):

        # .copy() avoids SettingWithCopyWarning when the new column is added
        df2 = df1[df1.value == unique_val[j]].copy()
        trange = pd.to_datetime(df2.Timestamp)
        # hours elapsed since the earliest timestamp in this (type, value) group
        tdiff = (trange - min(trange)).dt.total_seconds() / 3600
        df2['tdiff'] = tdiff  # .round(1)

        frames.append(df2)

df3 = pd.concat(frames, ignore_index=True)
df4 = df3[~((df3.tdiff > 0) & (df3.tdiff < 3))]

print(df)
print(df4.sort_values(by=['Timestamp']))

While this works, I would like to get rid of the for loops and use more efficient code.

You can use groupby with transform to apply min within each (type, value) group:

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
# earliest timestamp of each (type, value) group, broadcast back to every row
df['tmin'] = df.groupby(['type','value'])['Timestamp'].transform('min')
df['tdiff'] = (df['Timestamp'] - df['tmin']).dt.total_seconds()/3600
df[~((df.tdiff>0) & (df.tdiff<3))]

output

    Timestamp            type      value  tmin                   tdiff
--  -------------------  ------  -------  -------------------  -------
 0  2020-09-02 07:00:00  A             1  2020-09-02 07:00:00        0
 1  2020-09-02 07:10:00  A             2  2020-09-02 07:10:00        0
 2  2020-09-02 07:30:00  B             3  2020-09-02 07:30:00        0
 4  2020-09-02 10:00:00  A             1  2020-09-02 07:00:00        3
 5  2020-09-02 11:10:00  A             2  2020-09-02 07:10:00        4
 6  2020-09-02 11:30:00  B             3  2020-09-02 07:30:00        4
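
The same steps can be wrapped in a small function that leaves the input frame untouched and skips the helper tmin column. This is only a sketch, not part of the original answer: the function name drop_within_window and its low/high parameters are hypothetical, and it assumes the Timestamp, type and value columns from the question.

```python
import pandas as pd


def drop_within_window(df, low=0, high=3):
    """Drop rows whose hours since the group's first timestamp lie strictly between low and high.

    Hypothetical helper; groups are defined by the 'type' and 'value' columns.
    """
    out = df.copy()  # do not modify the caller's frame
    out['Timestamp'] = pd.to_datetime(out['Timestamp'])
    # earliest timestamp per (type, value) group, aligned to the original rows
    first = out.groupby(['type', 'value'])['Timestamp'].transform('min')
    out['tdiff'] = (out['Timestamp'] - first).dt.total_seconds() / 3600
    keep = ~((out['tdiff'] > low) & (out['tdiff'] < high))
    return out[keep]


d = {'Timestamp': ['2020-09-02 07:00:00', '2020-09-02 07:10:00', '2020-09-02 07:30:00',
                   '2020-09-02 08:00:00', '2020-09-02 10:00:00', '2020-09-02 11:10:00',
                   '2020-09-02 11:30:00'],
     'type': ['A', 'A', 'B', 'A', 'A', 'A', 'B'],
     'value': [1, 2, 3, 1, 1, 2, 3]}
df = pd.DataFrame(data=d)

result = drop_within_window(df)
print(result)
```

With the sample data, only the row at 08:00 (one hour after the first A/1 entry) should be dropped, which matches the table above.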

Try this simple code, which uses a lambda function.

Data:

d = {'type': ['A','A','B','C'], 'col1': [1,2,3,9]}
df = pd.DataFrame(data=d)

df:

  type  col1
0    A     1
1    A     2
2    B     3
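3    C     9

import numpy as np

# list of (type, sum of col1 for that type) pairs
f = pd.Series(pd.Series(df['type'].unique()).apply(lambda type_: (type_, df[df['type'] == type_].col1.sum()))).to_list()
# look up each row's type to get its group sum
df['New-column'] = df.type.replace(dict(f))
# blank out sums of 9 or more
df.loc[df['New-column'] >= 9, 'New-column'] = np.nan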

df:

  type  col1  New-column
0    A     1         3.0
1    A     2         3.0
2    B     3         3.0
3    C     9         NaN
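
For comparison, the per-type sum can also be broadcast back to each row with groupby and transform, which avoids building the lookup dictionary. This is an alternative sketch rather than the answer's own method; it reuses the same sample data and the same threshold of 9, and the variable name group_sum is just illustrative.

```python
import pandas as pd

d = {'type': ['A', 'A', 'B', 'C'], 'col1': [1, 2, 3, 9]}
df = pd.DataFrame(data=d)

# sum of col1 within each type, aligned to the original rows
group_sum = df.groupby('type')['col1'].transform('sum')

# keep sums below 9 and replace the rest with NaN (mask upcasts the column to float)
df['New-column'] = group_sum.mask(group_sum >= 9)
print(df)
```

Series.mask replaces values where the condition is true, so no separate assignment through df.loc is needed.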

