How to find all occurrences for each value in column A that is also in column B

Using Pandas, I'm trying to find the most recent occurrence of each value in column A that also appears in column B (though not necessarily in the same row). This needs to be done for every row in column A.

I've gotten close with an O(n^2) solution (building a list from each column and iterating with a nested for loop), but I'd like something faster if possible, since this needs to run on a table with tens of thousands of entries. (A vectorized solution would be ideal, but I'm mostly looking for the "right" way to do this.)

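df['idx'] = range(len(df.index))
A = list(df['r_A'])
B = list(df['r_B'])
A_B_Dict = {}

# Compare every value in B against every value in A (O(n^2)).
# range(len(...)) is used so the last row is not skipped.
for i in range(len(B)):
    for j in range(len(A)):
        if B[i] == A[j]:
            # Index labels of all rows whose r_A equals the matched value
            A_search = df.loc[df['r_A'] == A[j]].index
            A_B_Dict[B[i]] = A_search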

Given some df like so:

import pandas as pd

data = [[1, 'A', 'A'],
        [2, 'B', 'D'],
        [3, 'C', 'B'],
        [4, 'D', 'D']
        ]
df = pd.DataFrame(data, columns=['idx', 'A', 'B'])

It should give back something like:

 A_B_Dict = {'A': 1, 'B': 3, 'C': None, 'D': 4}

That is, each key of A_B_Dict is a value observed in column A, and its value is the idx of the most recent occurrence of that value in column B (or all occurrences, for that matter).
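For the "all occurrences" variant, one possible sketch is to group idx by the values of column B and then look up each value of column A. This assumes the example df from the question; all_hits and A_B_all are names made up here for illustration.

```python
import pandas as pd

data = [[1, 'A', 'A'],
        [2, 'B', 'D'],
        [3, 'C', 'B'],
        [4, 'D', 'D']]
df = pd.DataFrame(data, columns=['idx', 'A', 'B'])

# For every distinct value in column B, collect the idx of each row where it occurs.
all_hits = df.groupby('B')['idx'].apply(list).to_dict()

# Key the result by the values of column A; values never seen in B get an empty list.
A_B_all = {a: all_hits.get(a, []) for a in df['A']}
print(A_B_all)
```

Here 'D' should map to both rows where it appears in column B (idx 2 and 4), and 'C' to an empty list. Taking the last element of each non-empty list gives the "most recent" occurrence.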

IIUC

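# Map each value in B to its idx; later rows overwrite earlier ones, so the last occurrence wins
d = dict(zip(df.B, df.idx))
# Look up each value of A in that mapping; values that never appear in B become NaN
dict(zip(df.A, df.A.map(d)))
{'A': 1.0, 'B': 3.0, 'C': nan, 'D': 4.0}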
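Note that the result holds floats and nan, whereas the question asked for integers and None. A minimal sketch of one way to get that shape, again assuming the example df from the question (last_idx is a name introduced here for illustration, not part of the answer):

```python
import pandas as pd

data = [[1, 'A', 'A'],
        [2, 'B', 'D'],
        [3, 'C', 'B'],
        [4, 'D', 'D']]
df = pd.DataFrame(data, columns=['idx', 'A', 'B'])

# Keep only the last row for each value of B, then index idx by that value.
last_idx = df.drop_duplicates('B', keep='last').set_index('B')['idx']

# Look up every value of A; use None where the value never appears in B.
A_B_Dict = {a: (int(last_idx[a]) if a in last_idx.index else None)
            for a in df['A']}
print(A_B_Dict)
```

Building the dict in a comprehension avoids the upcast to float that Series.map causes when some keys are missing.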
