
Python: Find a substring in a string and return the index of the substring

I have:

  • a function: def find_str(s, char)

  • and a string: "Happy Birthday"

I essentially want to input "py" and get back 3, but I keep getting 2 instead.

Code:

def find_str(s, char):
    index = 0           
    if char in s:
        char = char[0]
        for ch in s:
            if ch in s:
                index += 1
            if ch == char:
                return index

    else:
        return -1

print(find_str("Happy birthday", "py"))

Not sure what's wrong!

There's a built-in method find on string objects.

s = "Happy Birthday"
s2 = "py"

print(s.find(s2))

Python is a "batteries included" language: there is already code written to do most of what you want (whatever you want), unless this is homework :)

find returns -1 if the substring cannot be found.
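
Because -1 is a truthy value in Python (and index 0 is falsy), testing the result of find directly in an if statement is an easy slip. A minimal sketch of the safer checks, reusing the string from above; the variable names hit and miss are only for illustration:

```python
s = "Happy Birthday"

# find() returns the index of the first match, or -1 when there is none.
hit = s.find("py")
miss = s.find("xyz")
print(hit, miss)

# -1 is truthy and 0 is falsy, so compare against -1 explicitly.
if miss != -1:
    print("found at", miss)
else:
    print("not found")

# When only a yes/no answer is needed, the in operator reads better.
print("py" in s)
```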

Ideally you would use str.find or str.index like demented hedgehog said. But you said you can't...

Your problem is that your code searches only for the first character of your search string, and the first occurrence of that character is at index 2.

You are basically saying: if char[0] is in s, increment index until ch == char[0]. That returned 3 when I tested it, but it was still wrong, because the rest of the search string is never compared. Here's a way to do it.

def find_str(s, char):
    index = 0

    if char in s:
        c = char[0]
        for ch in s:
            if ch == c:
                if s[index:index+len(char)] == char:
                    return index

            index += 1

    return -1

print(find_str("Happy birthday", "py"))
print(find_str("Happy birthday", "rth"))
print(find_str("Happy birthday", "rh"))

It produced the following output:

3
8
-1
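
The same idea can be written a little more compactly. This is only a sketch: the name find_str_compact is made up for illustration, and it relies on str.startswith accepting a start offset as its second argument.

```python
def find_str_compact(s, sub):
    # Try each position where sub could still fit inside s.
    for i in range(len(s) - len(sub) + 1):
        # startswith(sub, i) compares sub against s beginning at offset i.
        if s.startswith(sub, i):
            return i
    return -1

print(find_str_compact("Happy birthday", "py"))
print(find_str_compact("Happy birthday", "rth"))
print(find_str_compact("Happy birthday", "rh"))
```

For these three inputs it should agree with the function above, and it skips the initial membership check because the loop already covers the "not found" case.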

There is one other option using regular expressions: the search method.

import re

string = 'Happy Birthday'
pattern = 'py'
print(re.search(pattern, string).span()) ## this prints starting and end indices
print(re.search(pattern, string).span()[0]) ## this does what you wanted
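
One caveat with this approach: re.search returns None when nothing matches, so calling .span() on the result raises AttributeError in that case, and characters such as . or * in the pattern are treated as regex syntax. A hedged sketch that handles both; regex_find is a hypothetical helper name, not part of the re module:

```python
import re

def regex_find(text, sub):
    # re.escape makes sub match literally, even if it contains regex metacharacters.
    match = re.search(re.escape(sub), text)
    # Mirror str.find by returning -1 when there is no match.
    return match.start() if match else -1

print(regex_find("Happy Birthday", "py"))
print(regex_find("Happy Birthday", "xyz"))
print(regex_find("a.b", "."))
```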

By the way, if you would like to find all occurrences of a pattern instead of just the first one, you can use the finditer method:

import re

string = 'i think that that that that student wrote there is not that right'
pattern = 'that'

print([match.start() for match in re.finditer(pattern, string)])

which will print the starting positions of all the matches.
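
Keep in mind that finditer reports non-overlapping matches only. If overlapping occurrences matter, one common workaround is a zero-width lookahead. The sketch below uses a made-up input string to show the difference:

```python
import re

text = "aaaa"

# finditer resumes after the end of each match, so overlaps are skipped.
non_overlapping = [m.start() for m in re.finditer("aa", text)]
print(non_overlapping)

# A zero-width lookahead consumes nothing, so every start position is tested.
overlapping = [m.start() for m in re.finditer("(?=aa)", text)]
print(overlapping)
```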

Adding onto @demented hedgehog's answer on using find()

In terms of efficiency

It may be worth first checking whether s1 is in s2 before calling find().
This can be more efficient if you know that most of the time s1 won't be a substring of s2.

Since the in operator is very efficient:

 s1 in s2

It can be more efficient to convert:

index = s2.find(s1)

to

index = -1
if s1 in s2:
   index = s2.find(s1)

This is useful when find() is going to return -1 a lot.

I found it substantially faster since find() was being called many times in my algorithm, so I thought it was worth mentioning.
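
Whether the extra check pays off depends on the Python version and on how often the substring is actually present (when it is, the string gets scanned twice). A rough way to measure it on your own data is sketched below with timeit; the haystack and needle values and both function names are placeholders, not anything from the answer above.

```python
import timeit

haystack = "Happy Birthday" * 100
needle = "xyz"  # deliberately absent, so find() returns -1

def plain_find():
    return haystack.find(needle)

def guarded_find():
    index = -1
    # Only call find() when the substring is known to be present.
    if needle in haystack:
        index = haystack.find(needle)
    return index

# Both versions must agree; only the timing may differ.
assert plain_find() == guarded_find() == -1

print("find only:    ", timeit.timeit(plain_find, number=100_000))
print("in, then find:", timeit.timeit(guarded_find, number=100_000))
```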

Late to the party; I was searching for the same thing. Since "in" is not allowed, I just wrote the following.

def find_str(full, sub):
    # Lowercase both strings once so the comparison is case-insensitive.
    full_lower = full.lower()
    sub_lower = sub.lower()
    sub_len = len(sub_lower)
    # Try each start position where sub could still fit in full.
    for start in range(len(full_lower) - sub_len + 1):
        # Compare the whole candidate slice, so partial matches are rejected.
        if full_lower[start:start + sub_len] == sub_lower:
            return start
    return -1

print(find_str("Happy birthday", "py"))
print(find_str("Happy birthday", "rth"))
print(find_str("Happy birthday", "rh"))

which produces:

3
8
-1

Remove lower() if a case-insensitive search is not needed.

Here is a simple approach:

my_string = 'abcdefg'
print(my_string.find('def'))

Output:

3

If the substring is not there, you will get -1. For example:

my_string = 'abcdefg'
print(my_string.find('xyz'))

Output:

-1

Sometimes, you might want to raise an exception if the substring is not there:

my_string = 'abcdefg'
print(my_string.index('xyz')) # It returns an index only if the substring is present

Output:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    print(my_string.index('xyz'))
ValueError: substring not found
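
If the exception should be handled rather than allowed to propagate, a try/except around index() is the usual pattern. A small sketch, where falling back to None is just one possible choice:

```python
my_string = 'abcdefg'

try:
    position = my_string.index('xyz')
except ValueError:
    # index() raises instead of returning -1, so handle the miss here.
    position = None

print(position)
```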

Not directly answering the question, but I got a similar question recently where I was asked to count the number of times a sub-string is repeated in a given string. Here is the function I wrote:

def count_substring(string, sub_string):
    cnt = 0
    len_ss = len(sub_string)
    for i in range(len(string) - len_ss + 1):
        if string[i:i+len_ss] == sub_string:
            cnt += 1
    return cnt

The find() function returns the index of the first occurrence only. Storing each index instead of just counting gives the distinct set of positions at which the sub-string occurs within the string.

Disclaimer: I am 'extremely' new to Python programming.
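
The index-storing variant mentioned above could look something like this. It is a sketch only: find_substring_indices is a hypothetical name and the sample string is made up. Like the counting loop, it includes overlapping occurrences, which the built-in str.count does not.

```python
def find_substring_indices(string, sub_string):
    len_ss = len(sub_string)
    # Collect every start position, including overlapping occurrences.
    return [i for i in range(len(string) - len_ss + 1)
            if string[i:i + len_ss] == sub_string]

indices = find_substring_indices("ABCDCDC", "CDC")
print(indices)
print(len(indices))

# str.count only counts non-overlapping occurrences, so it can report fewer.
print("ABCDCDC".count("CDC"))
```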
