
Create a new data frame from an existing data frame based on a condition

I have a data frame df:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,1,1,0,1,0], [1,0,1,1,0,0], [1,1,0,0,0,1],[1,0,1,0,1,1], 
[0,0,1,0,0,1]]))
df

Now, from data frame df, I would like to create a new data frame based on a condition. Condition: if a column contains three or more '1's, then the corresponding value in the new data frame is '1', otherwise '0'.

Expected output of the new data frame:
    1 0 1 0 0 1

You can also get it without apply. You could sum each column (axis=0, which adds the values down the rows) and create a boolean mask with gt(2):

res = df.sum(axis=0).gt(2).astype(int)

print(res)

0    1
1    0
2    1
3    0
4    0
5    1
dtype: int32

As David pointed out, the result of the above is a Series. If you require a DataFrame, you can chain to_frame() at the end of it.
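A minimal sketch of that chaining, assuming the same df as in the question. The name res_df is only illustrative, and the extra .T is an assumption about the desired layout: it turns the single column into a single row, which matches the expected output shown in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0, 1, 1, 0, 1, 0],
                            [1, 0, 1, 1, 0, 0],
                            [1, 1, 0, 0, 0, 1],
                            [1, 0, 1, 0, 1, 1],
                            [0, 0, 1, 0, 0, 1]]))

# Sum each column, flag sums greater than 2, and cast the booleans to 0/1.
res = df.sum(axis=0).gt(2).astype(int)

# to_frame() turns the Series into a one-column DataFrame;
# .T transposes it into a single row (illustrative choice, see above).
res_df = res.to_frame().T

print(res_df)
```

Without the .T, res.to_frame() keeps one row per original column, the same layout as the Series output above.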

You could do the following:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[0,1,1,0,1,0], [1,0,1,1,0,0], [1,1,0,0,0,1],[1,0,1,0,1,1], 
[0,0,1,0,0,1]]))
df_res = pd.DataFrame(df.apply(lambda c: 1 if np.sum(c) > 2 else 0))

In [6]: df_res
Out[6]: 
   0
0  1
1  0
2  1
3  0
4  0
5  1

Instead of np.sum(c) you can also use c.sum().

And if you want it transposed just do the following instead:

df_res = pd.DataFrame(df.apply(lambda c: 1 if c.sum() > 2 else 0)).T
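As a quick sanity check, the sketch below compares the apply-based version with a sum-based one on the question's data. The helper flag_columns and its min_ones parameter are hypothetical names introduced here for illustration, not part of pandas, and the approach assumes the frame holds only 0s and 1s, so a column sum equals the count of ones.

```python
import numpy as np
import pandas as pd


def flag_columns(frame, min_ones=3):
    """Hypothetical helper: one-row frame with 1 where a column has at least min_ones ones.

    Assumes the frame contains only 0/1 values, so the column sum counts the ones.
    """
    return frame.sum(axis=0).ge(min_ones).astype(int).to_frame().T


df = pd.DataFrame(np.array([[0, 1, 1, 0, 1, 0],
                            [1, 0, 1, 1, 0, 0],
                            [1, 1, 0, 0, 0, 1],
                            [1, 0, 1, 0, 1, 1],
                            [0, 0, 1, 0, 0, 1]]))

via_sum = flag_columns(df)
via_apply = pd.DataFrame(df.apply(lambda c: 1 if c.sum() > 2 else 0)).T

# Both approaches should flag the same columns.
same = via_sum.values.tolist() == via_apply.values.tolist()
print(same)
```

Writing the threshold as ge(3) rather than gt(2) mirrors the "three or more" wording of the question; for integer sums the two are equivalent.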
