Python (or NumPy) equivalent of match in R

Question

Is there an easy way in Python to do what the match function does in R? R's match returns a vector of the positions of the (first) matches of its first argument in its second.

For example, the following R snippet:

> a <- c(5,4,3,2,1)
> b <- c(2,3)
> match(a,b)
[1] NA NA  2  1 NA

Translated to Python, what I am looking for is a function that does the following:

>>> a = [5,4,3,2,1]
>>> b = [2,3]
>>> match(a,b)
[None, None, 2, 1, None]

Thank you!

Answer 1

>>> a = [5,4,3,2,1]
>>> b = [2,3]
>>> [ b.index(x) if x in b else None for x in a ]
[None, None, 1, 0, None]

Add 1 if you really need one-based positions instead of zero-based ones.

>>> [ b.index(x)+1 if x in b else None for x in a ]
[None, None, 2, 1, None]

You can make this one-liner reusable if you are going to repeat it a lot:

>>> match = lambda a, b: [ b.index(x)+1 if x in b else None for x in a ]
>>> match
<function <lambda> at 0x04E77B70>
>>> match(a, b)
[None, None, 2, 1, None]
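
If the lambda feels too dense, the same idea can be written as a named function. This is only a sketch: the name match_positions and the one_based parameter are my own choices, not something from R or the standard library. It calls b.index once per element and catches ValueError instead of testing membership first, so b is scanned once per lookup rather than twice.

```python
def match_positions(a, b, one_based=True):
    """Return, for each element of a, the position of its first match in b.

    Elements of a that do not occur in b map to None.
    The name and the one_based flag are illustrative assumptions.
    """
    offset = 1 if one_based else 0
    result = []
    for x in a:
        try:
            # list.index returns the position of the first occurrence
            result.append(b.index(x) + offset)
        except ValueError:
            # x is not in b
            result.append(None)
    return result


a = [5, 4, 3, 2, 1]
b = [2, 3]
print(match_positions(a, b))                   # [None, None, 2, 1, None]
print(match_positions(a, b, one_based=False))  # [None, None, 1, 0, None]
```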

Answer 2

A faster approach building on Paulo Scardine's answer (the difference becomes more meaningful as the size of the arrays increases), if you don't mind losing the one-liner. It builds a dictionary from value to position once, so each lookup is a constant-time hash lookup instead of a scan of b:

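from typing import Hashable, List, Optional


def match_list(a: List[Hashable], b: List[Hashable]) -> List[Optional[int]]:
    # Baseline: scans b for every element of a.
    return [b.index(x) if x in b else None for x in a]


def match(a: List[Hashable], b: List[Hashable]) -> List[Optional[int]]:
    # Map each value to the position of its first occurrence in b,
    # so duplicates in b resolve the same way as b.index (and R's match).
    b_dict = {}
    for i, x in enumerate(b):
        b_dict.setdefault(x, i)
    return [b_dict.get(x) for x in a]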


import random

a = [random.randint(0, 100) for _ in range(10000)]
b = [i for i in range(100) if i % 2 == 0]


%timeit match(a, b)
>>> 580 µs ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit match_list(a, b)
>>> 6.13 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

match(a, b) == match_list(a, b)
>>> True
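
Since the question also mentions numpy, here is a vectorized sketch for numeric arrays. It is not taken from any of the answers: the name match_np and the choice of -1 as the "no match" marker are assumptions of mine (an integer array cannot hold None). It sorts b with a stable sort and uses np.searchsorted with side="left", so duplicates in b resolve to the first occurrence.

```python
import numpy as np


def match_np(a, b):
    """Zero-based position in b of the first match for each element of a.

    Returns -1 where an element of a does not occur in b.
    The function name and the -1 sentinel are illustrative assumptions.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    if b.size == 0:
        return np.full(a.shape, -1, dtype=np.intp)
    # Stable sort keeps equal values in their original order, so the
    # leftmost equal entry in sorted_b is the first occurrence in b.
    order = np.argsort(b, kind="stable")
    sorted_b = b[order]
    pos = np.searchsorted(sorted_b, a, side="left")
    # Values larger than everything in b land at len(b); clip them back
    # into range and let the equality check below mark them as missing.
    pos = np.minimum(pos, b.size - 1)
    found = sorted_b[pos] == a
    return np.where(found, order[pos], -1)


print(match_np([5, 4, 3, 2, 1], [2, 3]).tolist())  # [-1, -1, 1, 0, -1]
```

Add 1 to the found positions if one-based results are needed.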

Answer 3

The match functionality of R can be reproduced in Python with pandas, returning the matched index labels as a pandas Index (useful for further subsetting):

import numpy as np
import pandas as pd


def match(ser1, ser2):
    """
    Return the index label in ser2 of the first element equal to each
    element of ser1 (or np.nan where there is no match).
    Equivalent to the match function of R.
    """
    idx = [
        ser2.index[ser2 == x][0] if found else np.nan
        for x, found in zip(ser1, ser1.isin(ser2))
    ]
    return pd.Index(idx)
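
As a possible shortcut, pandas already exposes a positional lookup through Index.get_indexer. The sketch below is a suggestion rather than part of the answer above: the wrapper name match_index is hypothetical, it returns zero-based positions with -1 for missing values instead of np.nan, and it assumes the lookup table contains no duplicates (get_indexer raises an error on a non-unique index).

```python
import pandas as pd


def match_index(values, table):
    """Zero-based positions of `values` in `table`, -1 where absent.

    Hypothetical helper; assumes `table` has no duplicate entries,
    because Index.get_indexer requires a unique index.
    """
    return pd.Index(table).get_indexer(values)


positions = match_index([5, 4, 3, 2, 1], [2, 3])
print(positions.tolist())  # [-1, -1, 1, 0, -1]
```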

Answer 1
22 2010-11-05 21:07:47

Answer 2
2 2021-09-29 23:29:25

Answer 3
1 2022-04-20 14:17:25
