How to find char in string and get all the indexes?

I have some simple code:

def find(str, ch):
    for ltr in str:
        if ltr == ch:
            return str.index(ltr)
find("ooottat", "o")

The function only returns the first index. If I change return to print, it prints 0 0 0. Why is this, and is there any way to get 0 1 2?

This is because str.index(ch) returns the index where ch occurs for the first time, so every matching letter reports the same position. Try:

def find(s, ch):
    return [i for i, ltr in enumerate(s) if ltr == ch]

This will return a list of all the indexes you need.

P.S. Hugh's answer shows a generator function (it makes a difference if the list of indexes can get large). This function can also be made lazy by changing [] to (), which turns the list comprehension into a generator expression.
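To illustrate the [] to () change, here is a minimal sketch of the lazy variant. The name find_lazy is made up for this example and does not appear in any of the answers.

```python
def find_lazy(s, ch):
    # Same logic as the list comprehension, but with () instead of []:
    # this returns a generator expression that produces indexes on demand.
    return (i for i, ltr in enumerate(s) if ltr == ch)

indexes = find_lazy("ooottat", "o")
first = next(indexes)   # consume only the first match
rest = list(indexes)    # collect whatever matches remain
print(first, rest)
```

Nothing is computed until the caller asks for a value, so a caller that only needs the first few matches never scans the rest of the string.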

I would go with Lev's answer, but it's worth pointing out that if you end up with more complex searches, re.finditer may be worth bearing in mind (regular expressions often cause more trouble than they're worth, but they are sometimes handy to know):

import re

test = "ooottat"
[ (i.start(), i.end()) for i in re.finditer('o', test)]
# [(0, 1), (1, 2), (2, 3)]

[ (i.start(), i.end()) for i in re.finditer('o+', test)]
# [(0, 3)]
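If only the start positions are wanted and the character might be a regex metacharacter, something along these lines may help. This is a sketch, not part of the original answer; find_with_re is a hypothetical helper name.

```python
import re

def find_with_re(s, ch):
    # re.escape makes characters such as "." or "+" match literally
    # instead of being interpreted as regex operators.
    return [m.start() for m in re.finditer(re.escape(ch), s)]

print(find_with_re("ooottat", "o"))
print(find_with_re("a.b.c", "."))
```

Without re.escape, searching for "." would match every character in the string rather than just the dots.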

Lev's answer is the one I'd use; however, here's something based on your original code:

def find(str, ch):
    for i, ltr in enumerate(str):
        if ltr == ch:
            yield i

>>> list(find("ooottat", "o"))
[0, 1, 2]

def find_offsets(haystack, needle):
    """
    Find the start of all (possibly-overlapping) instances of needle in haystack
    """
    offs = -1
    while True:
        offs = haystack.find(needle, offs+1)
        if offs == -1:
            break
        else:
            yield offs

for offs in find_offsets("ooottat", "o"):
    print(offs)

results in

0
1
2

def find_idx(str, ch):
    yield [i for i, c in enumerate(str) if c == ch]

for idx in find_idx('babak karchini is a beginner in python ', 'i'):
    print(idx)

output:

[11, 13, 15, 23, 29]

x = "abcdabcdabcd"
print(x)
l = -1
while True:
    l = x.find("a", l+1)
    if l == -1:
        break
    print(l)

As a rule of thumb, NumPy arrays often outperform other solutions when working with POD (Plain Old Data). A string is an example of POD, and so is a character. To find all the indices of a single char in a string, NumPy ndarrays may be the fastest way:

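import numpy as np

def find1(str, ch):
  # 0.100 seconds for 1MB str
  # Note: np.frombuffer needs a bytes-like object, so under Python 3 pass bytes
  # (for example s.encode('ascii')) rather than a str.
  npbuf = np.frombuffer(str, dtype=np.uint8) # Reinterpret str as a char buffer
  return np.where(npbuf == ord(ch))          # Find indices with numpy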

def find2(str, ch):
  # 0.920 seconds for 1MB str 
  return [i for i, c in enumerate(str) if c == ch] # Find indices with python

Get all the positions in just one line:

word = 'Hello'
to_find = 'l'

# in one line
print([i for i, x in enumerate(word) if x == to_find])

Using pandas, we can do this and return a dict with all indices. Simple version:

import pandas as pd

# l is not defined in the original answer; it is assumed here to be a list of
# characters, e.g. list("ooottat")
d = (pd.Series(l)
     .reset_index()
     .groupby(0)['index']
     .apply(list)
     .to_dict())

But we can build in conditions too, e.g. keep a character only if it occurs two or more times:

d = (pd.Series(l)
     .reset_index()
     .groupby(0)['index']
     .apply(lambda x: list(x) if len(list(x)) > 1 else None)
     .dropna()
     .to_dict())
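For comparison, a similar character-to-indexes mapping can be sketched with only the standard library. This is not the pandas approach above but a plain dict-based substitute; index_map and min_count are hypothetical names introduced for this example.

```python
from collections import defaultdict

def index_map(seq, min_count=1):
    # Group the positions of every item in seq under that item.
    positions = defaultdict(list)
    for i, item in enumerate(seq):
        positions[item].append(i)
    # Keep only the items that occur at least min_count times.
    return {k: v for k, v in positions.items() if len(v) >= min_count}

print(index_map("ooottat"))
print(index_map("ooottat", min_count=2))
```

Setting min_count=2 plays the role of the "two or more occurrences" condition in the second pandas snippet.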

This is a slightly modified version of Mark Ransom's answer that works if ch could be more than one character in length.

def find(term, ch):
    """Find all places with ch in term
    """
    for i in range(len(term)):
        if term[i:i + len(ch)] == ch:
            yield i

All the other answers have two main flaws:

  1. They do a Python loop through the string, which is horrifically slow, or
  2. They use numpy, which is a pretty big additional dependency.

def findall(haystack, needle):
    idx = -1
    while True:
        idx = haystack.find(needle, idx+1)
        if idx == -1:
            break
        yield idx

This iterates through haystack looking for needle, always starting just past where the previous match began. It uses the builtin str.find, which is much faster than iterating through haystack character by character. It doesn't require any new imports.
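Because each search resumes one position after the previous match, overlapping matches of a multi-character needle are reported too. The sketch below makes that choice explicit; findall_step and its overlapping flag are hypothetical additions for illustration, not part of the answer.

```python
def findall_step(haystack, needle, overlapping=True):
    # Resume one character after the last match to allow overlaps,
    # or past the whole match to skip them. max() guards against a
    # zero step when needle is empty.
    step = 1 if overlapping else max(len(needle), 1)
    idx = haystack.find(needle)
    while idx != -1:
        yield idx
        idx = haystack.find(needle, idx + step)

print(list(findall_step("aaaa", "aa")))
print(list(findall_step("aaaa", "aa", overlapping=False)))
```

For a single-character needle the two modes give the same result, so the flag only matters for longer search strings.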

To embellish the five-star one-liner posted by @Lev and @Darkstar:

word = 'Hello'
to_find = 'l'
print(", ".join([str(i) for i, x in enumerate(word) if x == to_find]))

This just makes the separation of index numbers more obvious.
The result will be: 2, 3

You could try this:

def find(ch, string1):
    pos = []  # collects the index of every match
    for i in range(len(string1)):
        if ch == string1[i]:
            pos.append(i)
    return pos
