Create a new DataFrame based on conditions applied to another DataFrame

I have a DataFrame as follows:

import pandas as pd

data = [[99330,12,122], [1123,1230,1287], [123,101,812739], [1143,12301230,252]]
df1 = pd.DataFrame(data, index=['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04'], columns=['col_A', 'col_B', 'col_C'])
df1 = df1/df1.shift(1)-1  # change relative to the previous row
df1['mean'] = df1.mean(axis=1)
df1['upper'] = df1['mean'] + df1.filter(regex='col').std(axis=1)

df1:

            col_A      col_B          col_C        mean         upper
2022-01-01  NaN        NaN            NaN          NaN          NaN
2022-01-02  -0.988694  101.500000     9.549180     36.686829    93.063438
2022-01-03  -0.890472  -0.917886      630.498834   209.563492   574.104192
2022-01-04  8.292683   121793.356436  -0.999690    40600.216476 110915.538448

I want to create a second DataFrame that keeps only the values of col_A, col_B and col_C that are greater than df1['upper'] in the same row, and fills all other values with NaN.

So the DataFrame should look like this:

df2:

            col_A      col_B          col_C        
2022-01-01  NaN        NaN            NaN          
2022-01-02  NaN        101.500000     NaN
2022-01-03  NaN        NaN            630.498834   
2022-01-04  NaN        121793.356436  NaN   

Is there a way to do this Pythonically, without having to go through many loops?

Filter the "col" columns, make a vectorized greater-than comparison against df1['upper'] with gt() along axis=0 (so each row is compared with its own threshold), and mask the unwanted values with the where() method.

# keep the values in "col" columns that are greater than df1.upper; everything else becomes NaN
df2 = df1.filter(like='col').where(lambda x: x.gt(df1['upper'], axis=0))
df2
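
For reference, here is a self-contained sketch of the same idea that can be pasted into a fresh session. It rebuilds df1 exactly as in the question and splits the comparison from the masking; the intermediate names `cols` and `keep` are only illustrative and do not appear in the answer above.

```python
import pandas as pd

# Rebuild the question's df1: relative change, row mean, mean + row std.
data = [[99330, 12, 122], [1123, 1230, 1287], [123, 101, 812739], [1143, 12301230, 252]]
dates = ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04']
df1 = pd.DataFrame(data, index=dates, columns=['col_A', 'col_B', 'col_C'])
df1 = df1 / df1.shift(1) - 1
df1['mean'] = df1.mean(axis=1)
df1['upper'] = df1['mean'] + df1.filter(regex='col').std(axis=1)

cols = df1.filter(like='col')          # only col_A, col_B, col_C
# axis=0 aligns df1['upper'] on the row index, so every row is compared
# against its own threshold; comparisons involving NaN evaluate to False.
keep = cols.gt(df1['upper'], axis=0)
df2 = cols.where(keep)                 # values where keep is False become NaN
print(df2)
```

Without axis=0, pandas would try to align the Series labels with the column names rather than the row index, which is not what the question needs.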


Try this... not sure if this is the best way, as I did use one for loop...

import numpy as np
import pandas as pd

data = [[99330,12,122], [1123,1230,1287], [123,101,812739], [1143,12301230,252]]
df1 = pd.DataFrame(data, index=['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04'], columns=['col_A', 'col_B', 'col_C'])
df1 = df1/df1.shift(1)-1
df1['mean'] = df1.mean(axis=1)
df1['upper'] = df1['mean'] + df1.filter(regex='col').std(axis=1)
df2 = df1.copy()
# for each "col" column, blank out the values that do not exceed the row's upper bound
for col in df1.columns:
    if "col" in col:
        df2[col] = df2.apply(lambda x: x[col] if x[col] > x["upper"] else np.nan, axis=1)

# keep only the "col" columns
df2 = df2[[col for col in df1.columns if "col" in col]]

# Output...

            col_A          col_B       col_C
2022-01-01    NaN            NaN         NaN
2022-01-02    NaN     101.500000         NaN
2022-01-03    NaN            NaN  630.498834
2022-01-04    NaN  121793.356436         NaN
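
If looping over the columns is acceptable, the row-wise apply() inside the loop can probably be avoided. The sketch below is one possible variant, not part of the answer above: it uses Series.where() per column, and the name `value_cols` is an assumption introduced here for illustration.

```python
import pandas as pd

# Rebuild the question's df1.
data = [[99330, 12, 122], [1123, 1230, 1287], [123, 101, 812739], [1143, 12301230, 252]]
dates = ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04']
df1 = pd.DataFrame(data, index=dates, columns=['col_A', 'col_B', 'col_C'])
df1 = df1 / df1.shift(1) - 1
df1['mean'] = df1.mean(axis=1)
df1['upper'] = df1['mean'] + df1.filter(regex='col').std(axis=1)

# Hypothetical helper list: the columns whose names contain "col".
value_cols = [c for c in df1.columns if 'col' in c]

# One vectorized Series.where() per column: still a loop over columns,
# but no Python-level work per row.
df2 = pd.DataFrame({c: df1[c].where(df1[c] > df1['upper']) for c in value_cols})
print(df2)
```

The loop now runs once per column instead of calling a lambda once per row, which should matter more as the number of rows grows.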

You can use lambda functions to achieve this:

import numpy as np

# assumes pandas is imported as pd and that data and df1 from the question are already defined
df2 = pd.DataFrame(data, index=['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04'], columns=['col_A', 'col_B', 'col_C'])

# overwrite each column row by row: keep the value if it exceeds the row's upper bound, else NaN
df2['col_A'] = df1.apply(lambda x: x.col_A if x.col_A > x.upper else np.nan, axis=1)
df2['col_B'] = df1.apply(lambda x: x.col_B if x.col_B > x.upper else np.nan, axis=1)
df2['col_C'] = df1.apply(lambda x: x.col_C if x.col_C > x.upper else np.nan, axis=1)
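
The three near-identical lines apply the same rule to every column, so they could in principle be collapsed into a single array operation. The following is a minimal sketch of that using NumPy broadcasting with np.where(); it is a different technique from the lambda approach above, and the names `value_cols`, `values` and `upper` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the question's df1.
data = [[99330, 12, 122], [1123, 1230, 1287], [123, 101, 812739], [1143, 12301230, 252]]
dates = ['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04']
df1 = pd.DataFrame(data, index=dates, columns=['col_A', 'col_B', 'col_C'])
df1 = df1 / df1.shift(1) - 1
df1['mean'] = df1.mean(axis=1)
df1['upper'] = df1['mean'] + df1.filter(regex='col').std(axis=1)

value_cols = ['col_A', 'col_B', 'col_C']
values = df1[value_cols].to_numpy()          # shape (4, 3)
upper = df1['upper'].to_numpy()[:, None]     # shape (4, 1), broadcasts across the columns

# Keep a value where it exceeds its row's threshold, otherwise write NaN.
df2 = pd.DataFrame(np.where(values > upper, values, np.nan),
                   index=df1.index, columns=value_cols)
print(df2)
```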
