
Filter rows of a sorted NumPy array following the order of values in a list

I am trying to iterate over a NumPy array to create a list of lists, but my for loop appends to the list of lists in alphabetical order rather than in the order in which the names occur in my list.

Here is a portion of my NumPy array that I can use as an example:

import numpy as np

tarifas = np.array([['Afganistán', '577.21', '0.9360168799091559', '1.01745744495737'],
                    ['Albania', '5450.0', '1.1439867079655244', '0.9195410037811979'],
                    ['Alemania', '49690', '1.0034542200895549', '0.9873874704432137'],
                    ['Angola', '3670.0', '0.931103978746121', '1.162652536895962'],
                    ['Antigua y Barbuda', '18170', '0.7795684991736309', '0.6399312443495023'],
                    ['Arabia Saudita', '23490', '1.0573676413333202', '0.7477763277701148'],
                    ['Argelia', '4650.0', '0.7969840140783656', '0.5123046862189027'],
                    ['Argentina', '9050.0', '1.3647162509775996', '0.48274125735042017'],
                    ['Armenia', '4450.0', '1.4545784506262867', '1.430465487479917'],
                    ['Australia', '57200', '0.7293018985322222', '1.1744384938116095'],
                    ['Austria', '52470', '1.2396562976033307', '0.8630735107719588'],
                    ['Azerbaiyán', '4780.0', '0.9111186496911305','0.534268284966654']])

I want to create a list of lists by iterating over another list that holds the specific names of the countries I need to find in the array, i.e.

list_countries = ["Angola", "Austria", "Argentina", "Albania", "Armenia"]

Notice that the list is not in alphabetical order; the list of lists should respect the list's order. The output after iteration should be the following:

new_list_of_countries = [['Angola', '3670.0', '0.931103978746121', '1.162652536895962'],
                         ['Austria', '52470', '1.2396562976033307', '0.8630735107719588'],
                         ['Argentina', '9050.0', '1.3647162509775996', '0.48274125735042017'],
                         ['Albania', '5450.0', '1.1439867079655244', '0.9195410037811979'],
                         ['Armenia', '4450.0', '1.4545784506262867', '1.430465487479917']]

Here is the code I used:

tarifas_paises_escogidos = []
for i in tarifas:
    for v in list_countries:
         if str(v) in str(i):
               tarifas_paises_escogidos.append(i)
print(np.array(tarifas_paises_escogidos))

Using a list comprehension with sorted:

sorted([t for t in tarifas if t[0] in list_countries], 
        key=lambda x: list_countries.index(x[0]))

Output:

[['Angola', '3670.0', '0.931103978746121', '1.162652536895962'],
 ['Austria', '52470', '1.2396562976033307', '0.8630735107719588'],
 ['Argentina', '9050.0', '1.3647162509775996', '0.48274125735042017'],
 ['Albania', '5450.0', '1.1439867079655244', '0.9195410037811979'],
 ['Armenia', '4450.0', '1.4545784506262867', '1.430465487479917']]

A version without a list comprehension:

tarifas_paises_escogidos = []
for t in tarifas:
    # for v in list_countries: You don't need this
    if t[0] in list_countries:
        tarifas_paises_escogidos.append(t)
print(tarifas_paises_escogidos)

which yields the rows filtered but not yet in the order of list_countries:

[['Albania', '5450.0', '1.1439867079655244', '0.9195410037811979'], 
 ['Angola', '3670.0', '0.931103978746121', '1.162652536895962'], 
 ['Argentina', '9050.0', '1.3647162509775996', '0.48274125735042017'], 
 ['Armenia', '4450.0', '1.4545784506262867', '1.430465487479917'], 
 ['Austria', '52470', '1.2396562976033307', '0.8630735107719588']]

Then sort the list (and remember to assign the result back):

tarifas_paises_escogidos = sorted(tarifas_paises_escogidos, key=lambda x: list_countries.index(x[0]))

which produces the ordered output shown earlier.

Insight:

In the lambda above, the name x has no special meaning. It simply names whatever input the lambda receives (here, one row), which is then indexed (i.e. x[0]).

It is equivalent to:
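The filter-then-sort steps above can be sketched end to end as follows. As a variation that is not part of the answer, the sort key uses a hypothetical rank dictionary that maps each country to its position in list_countries, so list_countries.index is not called for every row. The array is a shortened copy of the question's data.

```python
import numpy as np

# Shortened copy of the question's array (rows are in alphabetical order).
tarifas = np.array([
    ['Albania', '5450.0', '1.1439867079655244', '0.9195410037811979'],
    ['Alemania', '49690', '1.0034542200895549', '0.9873874704432137'],
    ['Angola', '3670.0', '0.931103978746121', '1.162652536895962'],
    ['Argentina', '9050.0', '1.3647162509775996', '0.48274125735042017'],
    ['Armenia', '4450.0', '1.4545784506262867', '1.430465487479917'],
    ['Austria', '52470', '1.2396562976033307', '0.8630735107719588'],
])
list_countries = ["Angola", "Austria", "Argentina", "Albania", "Armenia"]

# Hypothetical helper (not in the answer): country name -> position in list_countries.
rank = {name: pos for pos, name in enumerate(list_countries)}

# Step 1: filter, keeping rows whose first column is a wanted country.
filtered = [row for row in tarifas if row[0] in rank]

# Step 2: sort the kept rows by the position of their country in the list.
ordered = sorted(filtered, key=lambda row: rank[row[0]])

print([str(row[0]) for row in ordered])
# ['Angola', 'Austria', 'Argentina', 'Albania', 'Armenia']
```

The dictionary lookup is only a convenience for longer lists; for five countries, list_countries.index as used in the answer behaves the same.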

def some_func(x):
    return list_countries.index(x[0])

which is then used in sorted:

tarifas_paises_escogidos = sorted(tarifas_paises_escogidos, key=some_func)

But defining a named function for just one use is often unnecessarily verbose. That's where lambda comes in :).
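A small check of that equivalence, as a sketch: the rows below are plain Python lists standing in for the filtered rows (an assumption made to keep the example short), and both sort keys give the same order.

```python
list_countries = ["Angola", "Austria", "Argentina", "Albania", "Armenia"]

# Plain lists stand in for the filtered rows; only the first column matters here.
rows = [["Albania", "5450.0"], ["Angola", "3670.0"], ["Argentina", "9050.0"],
        ["Armenia", "4450.0"], ["Austria", "52470"]]

def some_func(x):
    # x is one row; its first element is the country name.
    return list_countries.index(x[0])

by_function = sorted(rows, key=some_func)
by_lambda = sorted(rows, key=lambda x: list_countries.index(x[0]))

print(by_function == by_lambda)
# True
print([row[0] for row in by_lambda])
# ['Angola', 'Austria', 'Argentina', 'Albania', 'Armenia']
```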

Since the original NumPy array, tarifas, is sorted alphabetically, you can use np.searchsorted to get the indices corresponding to list_countries:

indices = np.searchsorted(tarifas[:, 0], list_countries)
print(indices)
# [ 3 10  7  1  8]

and then use fancy indexing (indexing arrays using arrays) to get the desired result:

result = tarifas[indices]
print(result)
# [['Angola' '3670.0' '0.931103978746121' '1.162652536895962']
#  ['Austria' '52470' '1.2396562976033307' '0.8630735107719588']
#  ['Argentina' '9050.0' '1.3647162509775996' '0.48274125735042017']
#  ['Albania' '5450.0' '1.1439867079655244' '0.9195410037811979']
#  ['Armenia' '4450.0' '1.4545784506262867' '1.430465487479917']]

For big arrays this vectorized approach should be much faster than the solution using Python for loops from Chris's answer.
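One caveat worth keeping in mind: np.searchsorted returns an insertion position, so a name that is missing from the array still gets an index. The sketch below shows one possible guard, not part of the answer. It uses a shortened two-column array, and the names wanted, safe and found, as well as the missing country "Zimbabue", are made up for illustration.

```python
import numpy as np

# Shortened, alphabetically sorted stand-in for the question's array.
tarifas = np.array([
    ['Albania', '5450.0'],
    ['Angola', '3670.0'],
    ['Argentina', '9050.0'],
    ['Armenia', '4450.0'],
    ['Austria', '52470'],
])
wanted = np.array(["Angola", "Zimbabue", "Albania"])  # "Zimbabue" is not in the array

names = tarifas[:, 0]
indices = np.searchsorted(names, wanted)

# searchsorted can return len(names) for values past the end, so clip before indexing.
safe = np.clip(indices, 0, len(names) - 1)

# A lookup is a real match only if the name found at that index equals the requested name.
found = names[safe] == wanted

# Keep only the rows for names that were actually found, in the requested order.
result = tarifas[safe[found]]
print(found)
# [ True False  True]
print(result[:, 0])
# ['Angola' 'Albania']
```

Without the check, tarifas[indices] would either raise an IndexError or silently return the row of a neighbouring country for the missing name.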
