Check for a value in a list of pandas DataFrame columns

I have a pandas DataFrame that looks like this:

    MED1    MED2    MED3    MED4    MED5
0   60735   24355   33843   16475   9995
1   10126   5789    17165   90000   90000
2   5789    19675   30553   90000   90000
3   60735   17865   34495   90000   90000
4   19675   5810    90000   90000   90000

I want to create a new bool column "med" that is True when 60735 appears in any of the columns MED1...MED5 and False otherwise. I am trying this and am not sure how to make it work.

DF['med'] = (60735 in [DF['MED1'], DF['MED2']])

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

MED1..MED5 represent drugs being taken by a patient at a hospital visit. I have a list of about 20 drugs for which I need to know whether the patient was taking them. Each drug is coded with a number but also has a name. A nice solution would look something like the code below, but how do I do this with pandas?

drugs = {'drug1':60735, 'drug2':5789}  
for n in drugs.keys():
    DF[n] = drugs[n] in DF[['MED1', 'MED2', 'MED3', 'MED4', 'MED5']]

@Mai's answer will of course work - it may be a bit more standard to write it like this, with the | operator.

df['med'] = (df['MED1'] == 60735) | (df['MED2'] == 60735)

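If you want to check for a value in all (or many) columns, you could also use isin as below. isin checks, cell by cell, whether the cell's value is in the given list, and any(axis=1) returns True for each row in which at least one element is True. Recent pandas versions require the axis to be passed as a keyword, so any(1) no longer works there.

df['med'] = df.isin([60735]).any(axis=1)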
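
As a self-contained sketch of the isin approach, the snippet below rebuilds the sample frame from the question and restricts the check to the MED columns, so unrelated columns cannot produce a match. The name med_cols is only an illustrative helper and does not come from the original post; the axis is written as a keyword because recent pandas releases expect that.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'MED1': [60735, 10126, 5789, 60735, 19675],
    'MED2': [24355, 5789, 19675, 17865, 5810],
    'MED3': [33843, 17165, 30553, 34495, 90000],
    'MED4': [16475, 90000, 90000, 90000, 90000],
    'MED5': [9995, 90000, 90000, 90000, 90000],
})

# med_cols is an illustrative name for the columns to search.
med_cols = ['MED1', 'MED2', 'MED3', 'MED4', 'MED5']

# isin yields a boolean frame of the same shape;
# any(axis=1) collapses each row to a single bool.
df['med'] = df[med_cols].isin([60735]).any(axis=1)
```

With the sample data, only the rows whose MED1 is 60735 (rows 0 and 3) should come out True.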

Edit: Based on your edited question, would this work?

for n in drugs:
    df[n] = df[['MED1','MED2','MED3','MED4','MED5']].isin([drugs[n]]).any(axis=1)
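
To make the loop concrete, here is a minimal runnable sketch using the sample data and the two-entry drugs dict from the question. The variable med_cols is an assumed helper name; selecting those columns explicitly keeps the newly added drug columns out of later checks.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'MED1': [60735, 10126, 5789, 60735, 19675],
    'MED2': [24355, 5789, 19675, 17865, 5810],
    'MED3': [33843, 17165, 30553, 34495, 90000],
    'MED4': [16475, 90000, 90000, 90000, 90000],
    'MED5': [9995, 90000, 90000, 90000, 90000],
})

# Drug name -> numeric code, as given in the question.
drugs = {'drug1': 60735, 'drug2': 5789}

# med_cols is an illustrative name; only these columns are searched.
med_cols = ['MED1', 'MED2', 'MED3', 'MED4', 'MED5']

# Add one boolean column per drug name.
for name, code in drugs.items():
    df[name] = df[med_cols].isin([code]).any(axis=1)
```

Each new column is named after the drug, so df['drug2'] flags the visits where code 5789 appears in any MED column.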

I am still confused. But part of what you want may be this:

import numpy as np
DF['med'] = np.logical_or(DF['MED1'] == 60735, DF['MED2'] == 60735)
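
np.logical_or takes exactly two inputs, so covering all five columns this way needs a reduction. One possible sketch, assuming the sample data from the question and an illustrative med_cols list that is not part of the original answer, uses np.logical_or.reduce over one comparison per column:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
DF = pd.DataFrame({
    'MED1': [60735, 10126, 5789, 60735, 19675],
    'MED2': [24355, 5789, 19675, 17865, 5810],
    'MED3': [33843, 17165, 30553, 34495, 90000],
    'MED4': [16475, 90000, 90000, 90000, 90000],
    'MED5': [9995, 90000, 90000, 90000, 90000],
})

# med_cols is an illustrative name for the columns to search.
med_cols = ['MED1', 'MED2', 'MED3', 'MED4', 'MED5']

# Build one boolean array per column, then OR them together element-wise.
masks = [DF[col].to_numpy() == 60735 for col in med_cols]
DF['med'] = np.logical_or.reduce(masks)
```

This is equivalent in result to the isin(...).any(axis=1) form; which one reads better is mostly a matter of taste.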

Here are a few %timeit comparisons of some methods to return bools from a dataframe column.

In [2]: %timeit df['med'] = [bool(x) if int(60735) in x else False for x in enumerate(df['MED1'])]
1000 loops, best of 3: 379 µs per loop

In [3]: %timeit df['med'] = (df['MED1'] == 60735)
1000 loops, best of 3: 649 µs per loop

In [4]: %timeit df['med'] = df['MED1'].isin([60735])
1000 loops, best of 3: 404 µs per loop
