
How to find all occurrences of a substring in a numpy string array

I'm trying to find all occurrences of a substring in a numpy string array. Let's say:

myArray = np.array(['Time', 'utc_sec', 'UTC_day', 'Utc_Hour'])
sub = 'utc'

The search should be case-insensitive, so the method should return [1, 2, 3].

A vectorized approach using np.char.lower and np.char.find:

import numpy as np
myArray = np.array(['Time', 'utc_sec', 'UTC_day', 'Utc_Hour'])
res = np.where(np.char.find(np.char.lower(myArray), 'utc') > -1)[0]
print(res)

Output:

[1 2 3]

The idea is to use np.char.lower to make np.char.find case-insensitive, then fetch the indices of the elements that contain the substring using np.where.
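To see what each step contributes, here is a small sketch that prints the intermediate arrays. The variable names (my_array, lowered, positions, indices) are illustrative and do not come from the original answer.

```python
import numpy as np

my_array = np.array(['Time', 'utc_sec', 'UTC_day', 'Utc_Hour'])

# Step 1: lower-case every element so the search ignores case.
lowered = np.char.lower(my_array)

# Step 2: position of the first match in each element, or -1 if absent.
positions = np.char.find(lowered, 'utc')

# Step 3: keep the indices of the elements where a match was found.
indices = np.where(positions > -1)[0]

print(positions)  # [-1  0  0  0]
print(indices)    # [1 2 3]
```

Note that np.char.find reports where the match starts inside each string, so comparing against -1 is what turns it into a contains test.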

You can use if sub in string to check each item.

import numpy as np

myArray = np.array(['Time', 'utc_sec', 'UTC_day', 'Utc_Hour'])
sub = 'utc'

found = []
for index, item in enumerate(myArray):
    # Record the position of each item that contains the substring.
    if sub in item.lower():
        found.append(index)

print(found)

Output:

[1, 2, 3]

We can use a list comprehension to get the right indexes:

occ = [i for i in range(len(myArray)) if 'utc' in myArray[i].lower()]

Output:

>>> print(occ)
[1, 2, 3]

Let's generalize this question: we will set up a function that returns the indices of the items containing any given substring in a numpy string array.

def get_occ_idx(sub, np_array):
    """Return the indices of the items in a numpy string array that contain sub."""

    assert sub.islower(), f"Your substring '{sub}' must be lower case (should be: {sub.lower()})"
    assert all(isinstance(x, str) for x in np_array), "All items in the array must be strings"
    assert any(sub in x.lower() for x in np_array), f"There is no occurrence of substring: '{sub}'"

    # Compare against the lower-cased item so the match is case-insensitive.
    occ = [i for i in range(len(np_array)) if sub in np_array[i].lower()]

    return occ
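As a possible variation, the same idea can be written in vectorized form. The sketch below is an assumption-laden alternative, not the answer's own method: the name find_substring_indices is hypothetical, it lower-cases the substring itself rather than asserting on it, and it returns an empty array instead of raising when nothing matches.

```python
import numpy as np

def find_substring_indices(sub, arr):
    """Return indices of the elements of arr that contain sub, ignoring case.

    Hypothetical helper for illustration; not part of numpy.
    """
    arr = np.asarray(arr, dtype=str)
    # Lower-case both sides so the comparison is case-insensitive.
    lowered = np.char.lower(arr)
    return np.where(np.char.find(lowered, sub.lower()) > -1)[0]

my_array = np.array(['Time', 'utc_sec', 'UTC_day', 'Utc_Hour'])
print(find_substring_indices('UTC', my_array))  # [1 2 3]
print(find_substring_indices('xyz', my_array))  # []
```

Returning an empty result for a missing substring lets the caller decide whether that case is an error.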
