
Looping through an array based on the first index

I have two arrays, and I want to loop through the second one and return only those sub-arrays whose first element equals an element of the first array.

 a = [10, 11, 12, 13, 14]
 b = [[9, 23, 45, 67, 56, 23, 54],
      [10, 8, 52, 30, 15, 47, 109],
      [11, 81, 152, 54, 112, 78, 167],
      [13, 82, 84, 63, 24, 26, 78],
      [18, 182, 25, 63, 96, 104, 74]]

I have two different arrays, a and b. I would like to find every sub-array of b whose first value equals one of the values in a, and collect those sub-arrays in a new array, c.

The result I am looking for is:

  c = [[10, 8, 52, 30, 15, 47, 109],[11, 81, 152, 54, 112, 78, 167],[13, 82, 84, 63, 24, 26, 78]]

Does Python have a tool for this, similar to Excel's MATCH()?

I tried looping like this:

 for i in a:
      if i in b:
          print (b)

But because each sub-array contains other elements as well, this does not work. Any help would be greatly appreciated.

Further explanation of the problem:

a = [5, 6, 7, 9, 12]

I read in an Excel file using xlrd (b_csv_data):

 Start  Count  Error  Constant  Result1  Result2  Result3  Result4
 5      41     0      45        23       54       66       19
 5.4    44     1      21        52       35       6        50
 6      16     1      42        95       39       1        13
 6.9    50     1      22        71       86       59       97
 7      38     1      43        50       47       83       67
 8      26     1      29        100      63       15       40
 9      46     0      28        85       9        27       81
 12     43     0      21        74       78       20       85

Next, I created a loop to read in a select number of rows. For simplicity, the file above has only a few rows; my actual file has about 100 rows.

for r in range (1, 7): #skipping headers and only wanting first few rows to start

     b_raw = b_csv_data.row_values(r) 
     b = np.array(b_raw) # I created this b numpy array from the line of code above

Use np.isin:

In [8]: b[np.isin(b[:,0],a)]
Out[8]: 
array([[ 10,   8,  52,  30,  15],
       [ 11,  81, 152,  54, 112],
       [ 13,  82,  84,  63,  24]])
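
For reference, here is a self-contained sketch of the same np.isin approach using the seven-column data from the question. The intermediate name mask is only illustrative; it makes the boolean row selection visible before it is applied.

```python
import numpy as np

a = [10, 11, 12, 13, 14]
b = np.array([
    [9, 23, 45, 67, 56, 23, 54],
    [10, 8, 52, 30, 15, 47, 109],
    [11, 81, 152, 54, 112, 78, 167],
    [13, 82, 84, 63, 24, 26, 78],
    [18, 182, 25, 63, 96, 104, 74],
])

# True for every row of b whose first element appears in a.
mask = np.isin(b[:, 0], a)

# Boolean indexing keeps only the rows where mask is True.
c = b[mask]

print(mask)
print(c)
```

Note that this relies on b being a rectangular 2-D array, so that b[:, 0] selects the first column.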

With a sorted a, we can also use np.searchsorted:

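a = np.asarray(a)                 # a[idx] below needs a (sorted) NumPy array, not a list
idx = np.searchsorted(a, b[:,0])  # position where each first element would be inserted into a
idx[idx==len(a)] = 0              # out-of-range positions: point at a[0]; they fail the test below
out = b[a[idx] == b[:,0]]         # keep rows whose first element is actually found in a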
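
To see why the out-of-range reset is needed, here is a small step-by-step sketch of the searchsorted approach. The names first, raw_idx and match are illustrative only, and first stands in for the first column of b from the question.

```python
import numpy as np

a = np.array([10, 11, 12, 13, 14])     # must already be sorted
first = np.array([9, 10, 11, 13, 18])  # first element of each row of b

# Insertion position of each value in a. A value larger than
# everything in a gets position len(a), which is not a valid index.
raw_idx = np.searchsorted(a, first)

# Redirect the invalid positions to 0 so that a[idx] does not raise.
idx = raw_idx.copy()
idx[idx == len(a)] = 0

# A value is really present in a only if a holds it at that position.
match = a[idx] == first

print(raw_idx)
print(idx)
print(match)
```

Redirecting to index 0 is safe here because a value larger than every element of a can never equal a[0], so those rows are still rejected by the equality test.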

If your array has a different number of elements per row, which makes it essentially an array of lists, you need to modify the slicing part. In that case, first collect the first element of each row:

b0 = [bi[0] for bi in b]

Then use b0 in place of every b[:,0] in the methods posted above.
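
A minimal sketch of that ragged case, assuming b is a plain list of lists. The ragged data below is made up for illustration and is not from the question. Because a list cannot be indexed with a boolean mask, the matching rows are picked with zip instead of b[mask].

```python
import numpy as np

a = [10, 11, 12, 13, 14]

# Hypothetical ragged data: rows of different lengths.
b = [[9, 23], [10, 8, 52], [11, 81, 152, 54], [13], [18, 182, 25]]

# First element of every row.
b0 = [bi[0] for bi in b]

# True where the first element appears in a.
mask = np.isin(b0, a)

# Keep the rows whose mask entry is True.
c = [row for row, keep in zip(b, mask) if keep]

print(c)
```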

Use a list comprehension:

c = [l for l in b if l[0] in a]

Output:

[[10, 8, 52, 30, 15], [11, 81, 152, 54, 112], [13, 82, 84, 63, 24]]
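
As a variation that is not part of the answer above: when a is long, converting it to a set first avoids scanning the whole list for every row, since set membership tests take constant time on average. A minimal sketch, where a_set is an illustrative name:

```python
a = [10, 11, 12, 13, 14]
b = [[9, 23, 45, 67, 56], [10, 8, 52, 30, 15], [11, 81, 152, 54, 112],
     [13, 82, 84, 63, 24], [18, 182, 25, 63, 96]]

# Build the lookup set once, outside the comprehension.
a_set = set(a)

# Same filter as before, but each membership test hits the set.
c = [row for row in b if row[0] in a_set]

print(c)
```

For a list as short as the one in the question the difference is likely negligible; this only matters as a grows.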

If your lists or arrays are considerably large, using numpy.isin can be significantly faster:

b[np.isin(b[:, 0], a), :]

Benchmark:

import timeit
import numpy as np

a = [10, 11, 12, 13, 14]
b = [[9, 23, 45, 67, 56], [10, 8, 52, 30, 15], [11, 81, 152, 54, 112],
     [13, 82, 84, 63, 24], [18, 182, 25, 63, 96]]

list_comp, np_isin = [], []
for i in range(1,100):
    # repeat the lists i times so the input grows with each iteration
    a_test = a * i
    b_test = b * i
    list_comp.append(timeit.timeit('[l for l in b_test if l[0] in a_test]', number=10, globals=globals()))
    a_arr = np.array(a_test)
    b_arr = np.array(b_test)
    np_isin.append(timeit.timeit('b_arr[np.isin(b_arr[:, 0], a_arr), :]', number=10, globals=globals()))


Although the benchmark is not clear-cut, I would recommend a list comprehension if b has fewer than 100 rows. Otherwise, NumPy is the way to go.

You are doing it in reverse. It is better to loop through the elements of b and check whether each one is present in a. If it is, print that element of b. See the code below.

a = [10, 11, 12, 13, 14]
b = [[9, 23, 45, 67, 56, 23, 54], [10, 8, 52, 30, 15, 47, 109], [11, 81, 152, 54, 112, 78, 167], [13, 82, 84, 63, 24, 26, 78], [18, 182, 25, 63, 96, 104, 74]]

for bb in b:  # check only whether the first element of bb is in a
    if bb[0] in a:
        print(bb)

for bb in b:  # check whether any element of bb is in a (prints each row at most once)
    if any(bbb in a for bbb in bb):
        print(bb)

Output:

[10, 8, 52, 30, 15, 47, 109]
[11, 81, 152, 54, 112, 78, 167]
[13, 82, 84, 63, 24, 26, 78]
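
The same first-element check can be applied to the spreadsheet rows from the question. The sketch below assumes each row has already been read into a list, for example with row_values(r) as in the question's loop. The rows list is a hand-typed stand-in for those reads, so no Excel file is needed to run it. The Start values are written as floats because xlrd typically returns numeric cells as floats; this does not affect the comparison, since 5.0 == 5 in Python.

```python
a = [5, 6, 7, 9, 12]

# Stand-in for the data rows of the sheet (header row skipped).
rows = [
    [5.0, 41, 0, 45, 23, 54, 66, 19],
    [5.4, 44, 1, 21, 52, 35, 6, 50],
    [6.0, 16, 1, 42, 95, 39, 1, 13],
    [6.9, 50, 1, 22, 71, 86, 59, 97],
    [7.0, 38, 1, 43, 50, 47, 83, 67],
    [8.0, 26, 1, 29, 100, 63, 15, 40],
    [9.0, 46, 0, 28, 85, 9, 27, 81],
    [12.0, 43, 0, 21, 74, 78, 20, 85],
]

# Keep every row whose Start value (first column) appears in a.
c = [row for row in rows if row[0] in a]

for row in c:
    print(row)
```

Collecting all rows in one list before filtering also avoids overwriting b on every pass of the reading loop.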
