
Python Create new DataFrame by conditionally checking two separate dataframes

I'm rather new to both Python and pandas. I would like to create a new DataFrame by conditionally checking two existing, separate DataFrames. Both DataFrames, and the new one I intend to create, have the same shape and index.

The existing dataframes are an equivalent of this:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(5, 4), columns=['1', '2', '3', '4'])
df2 = pd.DataFrame(np.random.randn(5, 4), columns=['1', '2', '3', '4'])

>>> df1
      1         2         3         4
0 -1.435173  0.230277  0.350859  0.200648
1  0.070976  0.827203 -0.874663 -0.382205
2 -1.991096  0.884184  0.992237 -1.289843
3 -1.615785 -1.737100 -0.646080 -0.782255
4  0.265713 -0.086915 -0.115174 -2.156504

>>> df2
      1         2         3         4
0 -1.504168 -0.613035 -0.145030  0.947341
1 -0.684728  2.281224  0.771786 -0.318042
2  1.374862  0.820146 -1.212940 -0.370513
3 -0.110245  2.548307  0.391108  0.069860
4 -0.631652 -0.329425 -0.282044 -0.229726

Now I want to create a new df3 based on these DataFrames. I have already created a zero-filled DataFrame with the same shape, index, and columns via:

df3 = pd.DataFrame(np.zeros(df1.shape), index=df1.index, columns=df1.columns)

First, I would like to set the starting values of the first row (row = 0) from a list: L = [7, 5, 2, 3]

My conditions to fill the remainder of df3 are:

if df1 > 0: df3 = previous value within that column - df1 value
else:
    if df2 > 0: df3 = value in the first row of that column
    else: df3 = previous value within that column

I would be grateful for any advice on this. Thank you.

You can set the first row of df3 using .loc and just assigning the list:

df3.loc[0] = L
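
As a self-contained check of that assignment, here is a minimal sketch. It assumes the zero-filled df3 from the question, built with a fixed (5, 4) shape instead of df1.shape so that it runs on its own. The list needs one entry per column.

```python
import numpy as np
import pandas as pd

# Zero-filled frame with the same columns as in the question
df3 = pd.DataFrame(np.zeros((5, 4)), columns=['1', '2', '3', '4'])

L = [7, 5, 2, 3]

# Label-based assignment: the list is matched to the columns by position
df3.loc[0] = L

print(df3)
```

Only the row labelled 0 changes; every other row keeps its initial zeros.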

I'm not totally sure how you want the rest to behave: when you refer to the "previous value within that column" of df3, do you mean the value before df3 is transformed by df1 and df2 (the initial values: L in the first row and 0 elsewhere), or the value after the transformation? If the former, you can use np.where to perform your logical operations:

df3.loc[1:] = np.where(
    df1 > 0, 
    df3.shift() - df1, 
    np.where(
        df2 > 0, 
        df3.loc[0], 
        df3.shift()
    )
)[1:]

You can put it all on one line, but it's easier to see how it works like this. Let's break that down a bit. To begin, you use .loc again so that you're only assigning to the rows of df3 after the first one. Then you use np.where, which returns an array whose elements are picked from one of two inputs depending on a condition.

df3.loc[1:] = np.where(

The next line is your first conditional: cell locations where df1 is positive.

    df1 > 0, 

Where it is positive, the array takes its value from the next line: the values of df3 shifted down one row, minus the values of df1.

    df3.shift() - df1, 
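
To see what the shift contributes here, a minimal sketch on a small made-up frame (the column names and values are illustrative, not from the question). Note that from the third row onward the "previous value" is the untouched 0, which is what the "former" reading implies.

```python
import pandas as pd

# Stand-in for df3: first row set, remaining rows still zero
df = pd.DataFrame({'a': [7.0, 0.0, 0.0], 'b': [5.0, 0.0, 0.0]})

# shift() moves every row down by one; the first row has no predecessor and becomes NaN
shifted = df.shift()

print(shifted)
```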

If the value of a cell in df1 isn't positive, you assign the value based on a second np.where conditional. Here, the test is whether the value of the cell in df2 is positive, and if it is, you use the value from the first row of df3. If not, you use .shift() again to take the previous row of df3.

    np.where(
        df2 > 0, 
        df3.loc[0], 
        df3.shift()
    )
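
The inner call relies on NumPy broadcasting: df3.loc[0] is a single row, and it is stretched across all rows of the condition. A minimal sketch with made-up two-column data (the values are assumptions chosen to exercise both branches):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'1': [-1.0, 2.0, -3.0], '2': [4.0, -5.0, 6.0]})
df3 = pd.DataFrame(np.zeros(df2.shape), columns=df2.columns)
df3.loc[0] = [7, 5]

# Where df2 > 0, take the first-row value of that column (broadcast down the rows);
# otherwise take the value from the row above in df3
inner = np.where(df2 > 0, df3.loc[0], df3.shift())

print(inner)
```

The result is a plain NumPy array with the same shape as df2. Its first row can contain NaN (from the shift), which does no harm because that row is discarded by the final slice.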

Finally, you only take the rows of the returned array beyond the first one, so that its shape matches df3.loc[1:].

)[1:]
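
If you meant the latter reading instead (each row builds on the already-updated row above it), the calculation is recursive and a single np.where call cannot express it. Here is a minimal sketch with an explicit loop over rows. The name fill_sequentially is a hypothetical helper, and the small frames are made-up data chosen to hit all three branches.

```python
import numpy as np
import pandas as pd

def fill_sequentially(df1, df2, first_row):
    """Hypothetical helper: fill row by row, using the updated previous row."""
    a1 = df1.to_numpy()
    a2 = df2.to_numpy()
    out = np.zeros(a1.shape)
    out[0] = first_row
    for i in range(1, len(out)):
        prev = out[i - 1]  # already-updated values of the row above
        out[i] = np.where(
            a1[i] > 0,
            prev - a1[i],                       # df1 positive: previous minus df1
            np.where(a2[i] > 0, out[0], prev),  # else: first row or previous
        )
    return pd.DataFrame(out, index=df1.index, columns=df1.columns)

df1 = pd.DataFrame({'1': [0.0, 1.0, -1.0, -1.0], '2': [0.0, -1.0, 2.0, -1.0]})
df2 = pd.DataFrame({'1': [0.0, -1.0, 1.0, -1.0], '2': [0.0, -1.0, -1.0, -1.0]})

df3 = fill_sequentially(df1, df2, [7, 5])
print(df3)
```

The loop is vectorized across columns but not across rows, which should be fine for frames of moderate length.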

That should do it! Let me know if I've missed something.
