
Creating a new data frame using differences between two columns in pandas

This is a subset of a data frame:

index  id   drug   sentences     SS1   SS2
1      2    lex     very bad      0     1
2      3    gym     very nice     1     1
3      7    effex   hard          1     0 
4      8    cymba   poor          1     1

I would like to find the rows where SS1 and SS2 differ and then create a new data frame from those rows. The output should look like this:

index  id   drug   sentences     SS1   SS2
1      2    lex     very bad      0     1
3      7    effex   hard          1     0 

This is my code:

df [['index','id', 'drug', 'sentences', 'SS1', 'SS2' ]] = np.where(df.SS1 != df.SS2)

But it raises the following error: ValueError: Must have equal len keys and value when setting with an ndarray

Any suggestion?

One way is to filter the rows with a boolean mask:

df_new = df[df.SS1 != df.SS2]
print(df_new)

Output:

    index  id   drug sentences  SS1  SS2
0      1   2    lex  very bad    0    1
2      3   7  effex      hard    1    0
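
For anyone who wants to reproduce this end to end, here is a minimal, self-contained sketch. The frame is rebuilt by hand from the table in the question, and the names mask and df_new are only illustrative. The .copy() and reset_index(drop=True) calls are optional additions, not part of the answer above: the first makes the result independent of df, the second renumbers the rows from 0.

```python
import pandas as pd

# Rebuild the sample frame from the question (values copied from the table).
df = pd.DataFrame({
    "index": [1, 2, 3, 4],
    "id": [2, 3, 7, 8],
    "drug": ["lex", "gym", "effex", "cymba"],
    "sentences": ["very bad", "very nice", "hard", "poor"],
    "SS1": [0, 1, 1, 1],
    "SS2": [1, 1, 0, 1],
})

# Boolean mask: True for rows where the two columns disagree.
mask = df["SS1"] != df["SS2"]

# Keep only the masked rows; copy so later edits do not touch df,
# and renumber the row labels starting from 0.
df_new = df[mask].copy().reset_index(drop=True)
print(df_new)
```

If the original row labels (0 and 2 here) matter, leave out the reset_index call.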

Using where (rows that fail the condition are filled with NaN and then dropped):

df_new = df.where(df.SS1 != df.SS2).dropna()
print(df_new)
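
Two caveats may be worth checking, sketched below on a cut-down version of the sample data (the variable names are made up for illustration). First, np.where called with only a condition returns a tuple of position arrays rather than rows, which is probably why the assignment in the question fails; those positions can be passed to iloc instead. Second, where followed by dropna turns the integer columns into floats, because the masked rows briefly hold NaN, and dropna would also discard any kept row that already had a missing value.

```python
import numpy as np
import pandas as pd

# Cut-down version of the sample frame from the question.
df = pd.DataFrame({
    "id": [2, 3, 7, 8],
    "drug": ["lex", "gym", "effex", "cymba"],
    "SS1": [0, 1, 1, 1],
    "SS2": [1, 1, 0, 1],
})

# np.where(condition) returns a tuple; its first element holds the
# integer positions where the condition is True.
positions = np.where(df["SS1"] != df["SS2"])[0]
by_position = df.iloc[positions]

# where + dropna selects the same rows, but the integer columns are
# upcast to float along the way.
by_where = df.where(df["SS1"] != df["SS2"]).dropna()

print(by_position.dtypes)
print(by_where.dtypes)
```

If the integer dtypes need to be preserved, plain boolean indexing or the iloc variant avoids the conversion.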
