
Apply function to dataframe row in pandas based on value in specific column

Suppose I have a pandas DataFrame in which the first column is a threshold:

threshold,value1,value2,value3,...,valueN
5,12,3,4,...,20
4,1,7,8,...,3
7,5,2,8,...,10

For each row, I want to set the elements in columns value1..valueN to zero if they are less than threshold:

threshold,value1,value2,value3,...,valueN
5,12,0,0,...,20
4,0,7,8,...,0
7,0,0,8,...,10

How can I do this without an explicit for loop?

You can try it this way:

import numpy as np
df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: np.where(x >= df.threshold, x, 0), axis=0)  # keep values >= threshold, else 0
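A runnable sketch of the `apply` answer on the sample data from the question. It assumes only `value1`, `value2`, `value3` and `valueN` exist (the `...` columns are left out). With `axis=0` the lambda receives one value column at a time as a Series; `np.where` compares it element-wise with the `threshold` Series, which shares the same row index, and either keeps the value or writes 0. It uses `>=` so that a value equal to its threshold is kept, matching "less than threshold".

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumption: the "..." columns are omitted)
df = pd.DataFrame({
    "threshold": [5, 4, 7],
    "value1": [12, 1, 5],
    "value2": [3, 7, 2],
    "value3": [4, 8, 8],
    "valueN": [20, 3, 10],
})

# axis=0: the lambda is called once per value column.
# np.where keeps x where x >= that row's threshold and writes 0 elsewhere.
df.iloc[:, 1:] = df.iloc[:, 1:].apply(
    lambda x: np.where(x >= df["threshold"], x, 0), axis=0
)
print(df)
```

The printed frame should match the expected output shown in the question.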

Use DataFrame.lt with DataFrame.mask for the comparison:

df = df.mask(df.lt(df['threshold'], axis=0), 0)
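A sketch of the `mask` answer on the same sample data (again assuming only the four named value columns), keeping the intermediate boolean frame in a variable called `below`, a name chosen here for illustration. `lt(..., axis=0)` matches the `threshold` Series against the row index, so every cell is compared with its own row's threshold; `mask` then writes 0 wherever the condition is True.

```python
import pandas as pd

# Sample data from the question (assumption: the "..." columns are omitted)
df = pd.DataFrame({
    "threshold": [5, 4, 7],
    "value1": [12, 1, 5],
    "value2": [3, 7, 2],
    "value3": [4, 8, 8],
    "valueN": [20, 3, 10],
})

# Boolean frame: True where a cell is strictly less than its row's threshold.
# axis=0 aligns the Series with the row index rather than the column labels.
below = df.lt(df["threshold"], axis=0)

# mask() writes 0 wherever `below` is True and keeps every other cell.
# The threshold column is never less than itself, so it passes through unchanged.
result = df.mask(below, 0)
print(result)
```

The `axis=0` argument matters: flexible comparisons such as `lt` default to `axis='columns'`, which would try to match the Series index (0, 1, 2) against the column labels instead of the rows.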

Or with set_index and reset_index:

df = df.set_index('threshold')
df = df.mask(df.lt(df.index, axis=0), 0).reset_index()
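A sketch of the `set_index`/`reset_index` variant on the sample data (same assumption about the columns; `indexed` and `result` are illustrative names). Moving `threshold` into the index takes it out of the set of compared columns, the index is then compared row by row with `axis=0`, and `reset_index` puts `threshold` back as the first column.

```python
import pandas as pd

# Sample data from the question (assumption: the "..." columns are omitted)
df = pd.DataFrame({
    "threshold": [5, 4, 7],
    "value1": [12, 1, 5],
    "value2": [3, 7, 2],
    "value3": [4, 8, 8],
    "valueN": [20, 3, 10],
})

# Move threshold into the index so it is no longer one of the compared columns.
indexed = df.set_index("threshold")

# Compare each value column with the threshold stored in the index (axis=0),
# zero out the cells below it, then restore threshold as a regular column.
result = indexed.mask(indexed.lt(indexed.index, axis=0), 0).reset_index()
print(result)
```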

For better performance, a numpy solution:

arr = df.values
df = pd.DataFrame(np.where(arr < arr[:, 0][:, None], 0, arr), columns=df.columns)
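A sketch of the NumPy answer that spells out the broadcasting, assuming all columns are numeric so `to_numpy()` returns a single integer array (mixed dtypes would give an upcast or object array). `arr[:, 0][:, None]` has shape `(n, 1)`; comparing it with the `(n, m)` array broadcasts each row's threshold across that row. Unlike the answer above, this passes `index=df.index` so the original row labels survive; that is an addition, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumption: the "..." columns are omitted)
df = pd.DataFrame({
    "threshold": [5, 4, 7],
    "value1": [12, 1, 5],
    "value2": [3, 7, 2],
    "value3": [4, 8, 8],
    "valueN": [20, 3, 10],
})

arr = df.to_numpy()                # shape (3, 5), assumes all-numeric columns
thresholds = arr[:, 0][:, None]    # shape (3, 1): one threshold per row

# Broadcasting stretches the (3, 1) column across all 5 columns, so each cell
# is compared with its own row's threshold. Column 0 equals itself and is kept.
out = np.where(arr < thresholds, 0, arr)

# index=df.index keeps the original row labels instead of a fresh RangeIndex.
result = pd.DataFrame(out, columns=df.columns, index=df.index)
print(result)
```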

print (df)
   threshold  value1  value2  value3  valueN
0          5      12       0       0      20
1          4       0       7       8       0
2          7       0       0       8      10

Timings:

In [294]: %timeit set_reset_sol(df)
1 loop, best of 3: 376 ms per loop

In [295]: %timeit numpy_sol(df)
10 loops, best of 3: 59.9 ms per loop

In [296]: %timeit df.mask(df.lt(df['threshold'], axis=0), 0)
1 loop, best of 3: 380 ms per loop

In [297]: %timeit df.iloc[:,1:] = df.iloc[:,1:].apply(lambda x: np.where(x > df.threshold, x, 0), axis=0)
1 loop, best of 3: 449 ms per loop


Setup:

import numpy as np
import pandas as pd

np.random.seed(234)
N = 100000

# [100000 rows x 100 columns]
df = pd.DataFrame(np.random.randint(100, size=(N, 100)))
df.columns = ['threshold'] + df.columns[1:].tolist()
print (df)

def set_reset_sol(df):
    df = df.set_index('threshold')
    return df.mask(df.lt(df.index, axis=0), 0).reset_index()

def numpy_sol(df):
    arr = df.values
    return pd.DataFrame(np.where(arr < arr[:, 0][:, None], 0, arr), columns=df.columns)
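Before trusting the timings, it may be worth checking that the solutions actually agree. This sketch runs them on a smaller random frame (1000 x 10, an arbitrary size chosen here) generated with `np.random.default_rng`; `mask_sol` and `same_result` are hypothetical helper names, not part of the answers. It compares column labels and cell values with `np.array_equal` and deliberately ignores dtype and index details.

```python
import numpy as np
import pandas as pd

def set_reset_sol(df):
    # threshold moves into the index, so only the value columns are compared
    df = df.set_index("threshold")
    return df.mask(df.lt(df.index, axis=0), 0).reset_index()

def numpy_sol(df):
    # row-wise comparison via broadcasting against an (n, 1) threshold column
    arr = df.values
    return pd.DataFrame(np.where(arr < arr[:, 0][:, None], 0, arr), columns=df.columns)

def mask_sol(df):
    # hypothetical wrapper around the one-line mask answer
    return df.mask(df.lt(df["threshold"], axis=0), 0)

def same_result(a, b):
    # same column labels and same cell values
    return list(a.columns) == list(b.columns) and np.array_equal(a.to_numpy(), b.to_numpy())

rng = np.random.default_rng(234)
small = pd.DataFrame(rng.integers(100, size=(1000, 10)))
small.columns = ["threshold"] + small.columns[1:].tolist()

results = [set_reset_sol(small), numpy_sol(small), mask_sol(small)]
print(all(same_result(results[0], r) for r in results[1:]))
```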

