
Vectorized way to compare each element in a numpy array

I was wondering whether there is a way to compare each element of a numpy array with every element of another array, regardless of index position. I often work with arrays taken from pandas DataFrames, and I'd like to use the underlying numpy arrays to do these comparisons. I know I can do a fast element-wise (same-position) comparison like this:

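import numpy as np
import pandas as pd

dfarr1 = pd.DataFrame(np.arange(0, 1000))
dfarr2 = pd.DataFrame(np.arange(1000, 0, -1))
# .values has shape (1000, 1); ravel() turns the mask into a plain 1-D row selector
dfarr1.loc[(dfarr1.values == dfarr2.values).ravel()]
# outputs the single matching row: index 500, value 500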
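As a small illustration (not part of the original question) of the difference between positional and position-independent comparison, using the same toy arrays: == only compares elements that sit at the same index, while np.isin checks each element of one array against all elements of the other.

```python
import numpy as np

a = np.arange(0, 1000)
b = np.arange(1000, 0, -1)

# Positional comparison: a[i] is compared with b[i] only
same_position = np.flatnonzero(a == b)

# Position-independent membership test: is a[i] equal to ANY element of b?
anywhere = np.isin(a, b)

print(same_position)     # [500]
print(anywhere.sum())    # 999
```

np.isin only reports whether a match exists somewhere; it does not say which element of b matched.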

(The above is obviously just a toy example.) What I'd like to do instead is the equivalent of two nested loops over all the elements, but as fast as possible:

for ir in df.itertuples():
    for ir2 in country_df.itertuples():
        if ir.city == ir2.Capital:
            # assign through .loc instead of chained indexing (df['country'][...] = ...)
            df.loc[ir.Index, 'country'] = ir2.Country
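The double loop compares every (city, capital) pair. A minimal sketch of the same all-pairs comparison done with NumPy broadcasting, assuming the relevant columns have been pulled out as 1-D arrays (the sample names below are made up and stand in for df['city'], country_df['Capital'] and country_df['Country']). The intermediate boolean matrix has len(df) × len(country_df) entries, so with a large lookup table its memory cost may be prohibitive.

```python
import numpy as np

# Hypothetical sample data standing in for the question's DataFrame columns
cities = np.array(['Paris', 'Springfield', 'Rome', 'Paris'])
capitals = np.array(['Rome', 'Paris', 'Berlin'])
countries = np.array(['Italy', 'France', 'Germany'])

# Broadcasting (n, 1) against (1, m) builds the full n x m comparison matrix
# that the double loop walks through one pair at a time.
matches = cities[:, None] == capitals[None, :]

# Which cities matched at all, and the column of the FIRST matching capital
# (the loop keeps the last match; the two agree when capitals are unique).
has_match = matches.any(axis=1)
first_col = matches.argmax(axis=1)

# Unmatched cities get None; the result is an object array
result = np.where(has_match, countries[first_col], None)
print(result)
```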

The problem is that my DataFrames contain many thousands of rows, and the loop above is simply too slow (not least because I'm sure I'll do similar operations on other, similarly long DataFrames in the future, so solving this once and for all would be good). The background: I've parsed a few thousand files and extracted their geodata (df above), and I have a fairly large file of cities and their corresponding countries to use as a lookup table (country_df). The idea is to check whether the cities in df match those in the lookup table and, if so, add the corresponding country in a new column of df (at the same row index). Anyway, this is just one example of the kind of operation I need to run much faster than the approach above. Many thanks!
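For a lookup like this, a hash-based lookup avoids comparing every pair at all. A minimal sketch using pandas' Series.map, assuming the column names from the loop above ('city', 'Capital', 'Country'); the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the parsed geodata (df) and the lookup table (country_df)
df = pd.DataFrame({'city': ['Paris', 'Springfield', 'Rome', 'Berlin']})
country_df = pd.DataFrame({'Capital': ['Rome', 'Paris', 'Berlin'],
                           'Country': ['Italy', 'France', 'Germany']})

# Build a capital -> country Series indexed by capital name.
# drop_duplicates keeps the index unique; map() raises on a non-unique index.
lookup = (country_df.drop_duplicates('Capital')
                    .set_index('Capital')['Country'])

# map() looks up every city at once; cities with no match become NaN
df['country'] = df['city'].map(lookup)
print(df)
```

Each city is looked up once instead of being compared against every row of the lookup table, which is what makes this scale to large tables.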

You can try this:

import pandas as pd

df1 = pd.DataFrame({'city': ['New York City', 'Los Angeles', 'Paris', 'Berlin', 'Beijing'],
                    'country': [None, None, None, None, None]})

df2 = pd.DataFrame({'city' : ['New York City', 'Paris', 'Berlin', 'Beijing', 'Los Angeles', 'Rome'],
                    'country': ['USA', 'France', 'Germany', 'China', 'USA', 'Italy']})

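Now use the fillna method on df1's 'country' column (indexed by city), with df2's 'country' Series (also indexed by city) as the fill values. Because fillna aligns the two Series on their index labels, each city picks up its own country regardless of row order: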

df1['country'] = df1.set_index('city')['country'].fillna(df2.set_index('city')['country'])\
                    .reset_index(drop=True)

print(df1)

    city          country
0   New York City  USA
1   Los Angeles    USA
2   Paris          France
3   Berlin         Germany
4   Beijing        China
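Another option, not from the original answer, is a left merge on the city column. A minimal sketch using the same df1 and df2:

```python
import pandas as pd

df1 = pd.DataFrame({'city': ['New York City', 'Los Angeles', 'Paris', 'Berlin', 'Beijing'],
                    'country': [None, None, None, None, None]})
df2 = pd.DataFrame({'city': ['New York City', 'Paris', 'Berlin', 'Beijing', 'Los Angeles', 'Rome'],
                    'country': ['USA', 'France', 'Germany', 'China', 'USA', 'Italy']})

# Drop the empty placeholder column so the result has a single 'country'.
# how='left' keeps every row of df1 in its original order and attaches the
# matching country from df2 (NaN where a city has no match).
merged = df1.drop(columns='country').merge(df2, on='city', how='left')
print(merged)
```

If df2 listed the same city more than once, the merge would duplicate that row of df1, so deduplicating df2 on 'city' first may be needed.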


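Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.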

Related questions:
- Compare a numpy array to each element of another one
- Compare numpy array to every row of matrix to count similar items (vectorized)
- Is there a fast way to compare one element in a numpy array to the rest of the elements in that array?
- Numpy: compare each element of array with all other elements (± constant)
- How to compare each row of vectors of numpy array to itself and every element
- Efficient algorithm to compare each element in a Numpy array with each other element in the same array
- Vectorized way of adding to an array that is indexed by another array - Python/NumPy
- vectorized way to change numpy array values based on another array
- Numpy array but each 1 is an element
- How can I apply a vectorized function to the previous element of a numpy array?
 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM