简体   繁体   中英

How to check if a cell of a Dataframe exists as a key in a dict, and if it does, check if another cell in same row exists in a list in a dict

I currently have a dataframe like this

   colA   colB   colC    colD
   1      1a     Yes     Yes
   1      1b     Yes     Yes
   2      2a     Yes     Yes
   2      2b     Yes     Yes
   2      2c     Yes     Yes
   3      3a     Yes     Yes
   4      4a     Yes     Yes

Say I have a dictionary like this

filterDict = {
   "1" : ["1a","1b"]
   "2" : ["2a","2b"]
}

I want it so that if in colA a value exists as a key, check if colB in that same row exists in the value list of the dict for that key so that the resulting dataframe would look like this

   colA   colB   colC    colD
   1      1a     No      No
   1      1b     No      No
   2      2a     No      No
   2      2b     No      No
   2      2c     Yes     Yes
   3      3a     Yes     Yes
   4      4a     Yes     Yes

I am relatively new to pandas and am unsure how to do this. What I previously had was something hardcoded like this

mask = ((df["colA"] == "1") & (df["colB"].isin(["1a","1b"])) |
       ((df["colB"] == "2") & (df["colB"].isin(["2a","2b"]))

df.loc[mask,["colC","colD"]] = ["No","No"]

But I would like this to be dynamic so that it could be put in the dict format

Flatten the fitersDict into list of tuples then create a multindex from it, the use Multindex.isin to test from the occurrence of multiindex in the columns colA , colB of given dataframe

i1 = df.set_index(['colA', 'colB']).index
i2 = pd.MultiIndex.from_tuples((a, b) for a, l in filterDict.items() for b in l)

mask = ~i1.isin(i2)
df['colC'], df['colD'] = mask, mask

  colA colB   colC   colD
0    1   1a  False  False
1    1   1b  False  False
2    2   2a  False  False
3    2   2b  False  False
4    2   2c   True   True
5    3   3a   True   True
6    4   4a   True   True

You could try this

for index, row in df.iterrows():
    if str(row['colA']) in filterDict:
        if row['colB'] in filterDict[str(row['colA'])]:
            df.loc[index, 'colC'] = 'No'
            df.loc[index, 'colD'] = 'No'
            continue

    df.loc[index, 'colC'] = 'Yes'
    df.loc[index, 'colD'] = 'Yes'
        

Based on your expected output, I've assumed 'No' if a match is found, otherwise 'Yes' .

You can try this alternative:

df[['colC', 'colD']] = pd.DataFrame(zip(*df[['colA', 'colB']].apply(
    lambda _: (
        'NO' if str(_[0]) in filterDict else 'YES',
        'NO' if _[1] in filterDict.get(str(_[0]), []) else 'YES'
    ), axis=1
).values)).T

Or in the case where ColB must equal ColC you can simply try this:

df['colC'] = df[['colA', 'colB']].apply(

    lambda _: 'NO' if str(_[0]) in filterDict and _[1] in filterDict.get(str(_[0]), []) else 'YES',
    axis=1
)
df['colD'] = df['colC'].copy()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM