How to get a string after a specific substring?

How can I get a string after a specific substring?

For example, I want to get the string after "world" in

my_string="hello python world, I'm a beginner"

...which in this case is: ", I'm a beginner"

The easiest way is probably just to split on your target word:

my_string="hello python world , i'm a beginner"
print(my_string.split("world",1)[1])

split takes the word (or character) to split on and, optionally, a limit on the number of splits.

In this example, we split on "world" and limit it to a single split.
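
As a rough illustration of what the limit does, here is a small sketch using a made-up string (`text` is hypothetical, not from the question) in which the target word occurs twice:

```python
# Hypothetical sample string in which the target word occurs twice.
text = "hello world, goodbye world, the end"

# With a limit of 1, only the first occurrence splits the string,
# so everything after it (including the second "world") is kept.
after_first = text.split("world", 1)[1]

# Without a limit, every occurrence splits the string, and
# index 1 holds only the piece between the first two occurrences.
between = text.split("world")[1]

print(repr(after_first))  # ', goodbye world, the end'
print(repr(between))      # ', goodbye '
```

Note that indexing with [1] raises an IndexError when the target word is not in the string at all, because split then returns a one-element list.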

I'm surprised nobody mentioned partition.

def substring_after(s, delim):
    return s.partition(delim)[2]

s1="hello python world, I'm a beginner"
substring_after(s1, "world")

# ", I'm a beginner"

IMHO, this solution is more readable than @arshajii's. Other than that, I think @arshajii's is the best for being the fastest: it does not create any unnecessary copies/substrings.

s1 = "hello python world , i'm a beginner"
s2 = "world"

print(s1[s1.index(s2) + len(s2):])

If you want to deal with the case where s2 is not present in s1, then use s1.find(s2) as opposed to index. If the return value of that call is -1, then s2 is not in s1.
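
A minimal sketch of that find-based variant might look like this. The function name `after_substring` and its `default` parameter are illustrative choices, not part of the answer above:

```python
def after_substring(s1, s2, default=""):
    """Return the part of s1 after the first s2, or default if s2 is absent.

    The name and the default parameter are illustrative assumptions.
    """
    pos = s1.find(s2)
    if pos == -1:  # find() signals "not found" with -1 instead of raising
        return default
    return s1[pos + len(s2):]


s1 = "hello python world , i'm a beginner"
print(repr(after_substring(s1, "world")))  # " , i'm a beginner"
print(repr(after_substring(s1, "Monty")))  # ''
```

Checking for -1 explicitly matters here: slicing with an unchecked -1 would silently return the wrong part of the string rather than fail.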

You want to use str.partition():

>>> my_string.partition("world")[2]
" , i'm a beginner "

because this option is faster than the alternatives.

Note that this produces an empty string if the delimiter is missing:

>>> my_string.partition("Monty")[2]  # delimiter missing
''

If you want to have the original string, then test if the second value returned from str.partition() is non-empty:

prefix, success, result = my_string.partition(delimiter)
if not success: result = prefix
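
Wrapped up as a function, the test above could be sketched as follows. The helper name `after_or_original` and the `sample` string are assumptions made for illustration:

```python
def after_or_original(text, delimiter):
    """Hypothetical helper: text after delimiter, or text itself if absent."""
    # partition() always returns a 3-tuple: (before, separator, after).
    # The middle element is the empty string when the delimiter is not found.
    prefix, success, result = text.partition(delimiter)
    return result if success else prefix


sample = "hello python world , i'm a beginner "
print(repr(after_or_original(sample, "world")))  # " , i'm a beginner "
print(sample.partition("Monty"))  # the whole string, then two empty strings
```

Because the middle element is either the delimiter itself or an empty string, it works as a found/not-found flag without a separate `in` check.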

You could also use str.split() with a limit of 1:

>>> my_string.split("world", 1)[-1]
" , i'm a beginner "
>>> my_string.split("Monty", 1)[-1]  # delimiter missing
"hello python world , i'm a beginner "

However, this option is slower. In the best-case scenario, str.partition() is easily about 15% faster than str.split():

                                missing        first         lower         upper          last
      str.partition(...)[2]:  [3.745 usec]  [0.434 usec]  [1.533 usec]  <3.543 usec>  [4.075 usec]
str.partition(...) and test:   3.793 usec    0.445 usec    1.597 usec    3.208 usec    4.170 usec
      str.split(..., 1)[-1]:  <3.817 usec>  <0.518 usec>  <1.632 usec>  [3.191 usec]  <4.173 usec>
            % best vs worst:         1.9%         16.2%          6.1%          9.9%          2.3%

This shows timings per execution with inputs where the delimiter is either missing (worst-case scenario), placed first (best-case scenario), or in the lower half, upper half, or last position. The fastest time is marked with [...] and the slowest with <...>.

The above table was produced by a comprehensive time trial of all three options, shown below. I ran the tests on Python 3.7.4 on a 2017 model 15" MacBook Pro with a 2.9 GHz Intel Core i7 and 16 GB of RAM.

This script generates random sentences with and without the randomly selected delimiter present. When the delimiter is present, it is placed at different positions in the generated sentence. The script runs the tests in random order with repeats (producing the fairest results by accounting for random OS events taking place during testing), and then prints a table of the results:

import random
from itertools import product
from operator import itemgetter
from pathlib import Path
from timeit import Timer

setup = "from __main__ import sentence as s, delimiter as d"
tests = {
    "str.partition(...)[2]": "r = s.partition(d)[2]",
    "str.partition(...) and test": (
        "prefix, success, result = s.partition(d)\n"
        "if not success: result = prefix"
    ),
    "str.split(..., 1)[-1]": "r = s.split(d, 1)[-1]",
}

placement = "missing first lower upper last".split()
delimiter_count = 3

wordfile = Path("/usr/dict/words")  # Linux
if not wordfile.exists():
    # macos
    wordfile = Path("/usr/share/dict/words")
words = [w.strip() for w in wordfile.open()]

def gen_sentence(delimiter, where="missing", l=1000):
    """Generate a random sentence of length l

    The delimiter is incorporated according to the value of where:

    "missing": no delimiter
    "first":   delimiter is the first word
    "lower":   delimiter is present in the first half
    "upper":   delimiter is present in the second half
    "last":    delimiter is the last word

    """
    possible = [w for w in words if delimiter not in w]
    sentence = random.choices(possible, k=l)
    half = l // 2
    if where == "first":
        # best case, at the start
        sentence[0] = delimiter
    elif where == "lower":
        # lower half
        sentence[random.randrange(1, half)] = delimiter
    elif where == "upper":
        sentence[random.randrange(half, l)] = delimiter
    elif where == "last":
        sentence[-1] = delimiter
    # else: worst case, no delimiter

    return " ".join(sentence)

delimiters = random.choices(words, k=delimiter_count)
timings = {}
sentences = [
    # where, delimiter, sentence
    (w, d, gen_sentence(d, w)) for d, w in product(delimiters, placement)
]
test_mix = [
    # label, test, where, delimiter, sentence
    (*t, *s) for t, s in product(tests.items(), sentences)
]
random.shuffle(test_mix)

for i, (label, test, where, delimiter, sentence) in enumerate(test_mix, 1):
    print(f"\rRunning timed tests, {i:2d}/{len(test_mix)}", end="")
    t = Timer(test, setup)
    number, _ = t.autorange()
    results = t.repeat(5, number)
    # best time for this specific random sentence and placement
    timings.setdefault(
        label, {}
    ).setdefault(
        where, []
    ).append(min(dt / number for dt in results))

print()

scales = [(1.0, 'sec'), (0.001, 'msec'), (1e-06, 'usec'), (1e-09, 'nsec')]
width = max(map(len, timings))
rows = []
bestrow = dict.fromkeys(placement, (float("inf"), None))
worstrow = dict.fromkeys(placement, (float("-inf"), None))

for row, label in enumerate(tests):
    columns = []
    worst = float("-inf")
    for p in placement:
        timing = min(timings[label][p])
        if timing < bestrow[p][0]:
            bestrow[p] = (timing, row)
        if timing > worstrow[p][0]:
            worstrow[p] = (timing, row)
        worst = max(timing, worst)
        columns.append(timing)

    scale, unit = next((s, u) for s, u in scales if worst >= s)
    rows.append(
        [f"{label:>{width}}:", *(f" {c / scale:.3f} {unit} " for c in columns)]
    )

colwidth = max(len(c) for r in rows for c in r[1:])
print(' ' * (width + 1), *(p.center(colwidth) for p in placement), sep="  ")
for r, row in enumerate(rows):
    for c, p in enumerate(placement, 1):
        if bestrow[p][1] == r:
            row[c] = f"[{row[c][1:-1]}]"
        elif worstrow[p][1] == r:
            row[c] = f"<{row[c][1:-1]}>"
    print(*row, sep="  ")

percentages = []
for p in placement:
    best, worst = bestrow[p][0], worstrow[p][0]
    ratio = ((worst - best) / worst)
    percentages.append(f"{ratio:{colwidth - 1}.1%} ")

print("% best vs worst:".rjust(width + 1), *percentages, sep="  ")

If you want to do this using regex, you could simply use a non-capturing group to match the word "world" and then grab everything after it, like so:

(?:world).*

The example string is tested here.
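
One caveat worth sketching: with Python's re module, the full match of this pattern still begins with "world", since a non-capturing group only avoids creating a group and does not exclude text from the match. The lookbehind variant below is an alternative added here for comparison, not part of the answer above:

```python
import re

my_string = "hello python world, I'm a beginner"

# The non-capturing pattern matches from "world" onward,
# so the matched text still begins with the word itself.
with_word = re.search(r"(?:world).*", my_string).group(0)

# A lookbehind asserts that "world" precedes the match
# without making it part of the matched text.
without_word = re.search(r"(?<=world).*", my_string).group(0)

print(repr(with_word))     # "world, I'm a beginner"
print(repr(without_word))  # ", I'm a beginner"
```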

Python 3.9 added a new removeprefix method:

>>> 'TestHook'.removeprefix('Test')
'Hook'
>>> 'BaseTestCase'.removeprefix('Test')
'BaseTestCase'
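
As the second call suggests, removeprefix only strips text at the very start of the string. A short sketch applying it to the question's example (requires Python 3.9 or later; the variable names are illustrative):

```python
# Requires Python 3.9+, where str.removeprefix() is available.
starts_with_word = "world, I'm a beginner"
word_in_middle = "hello python world, I'm a beginner"

# The prefix is removed only when the string actually starts with it.
stripped = starts_with_word.removeprefix("world")

# When the word sits in the middle, the string comes back unchanged.
unchanged = word_in_middle.removeprefix("world")

print(repr(stripped))               # ", I'm a beginner"
print(unchanged == word_in_middle)  # True
```

So for a substring in the middle of the string, as in the question, partition or split remains the better fit.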

It's an old question, but I faced a very similar scenario: I needed to split a string using the word "low" as the delimiter. The problem for me was that the same string also contained the words "below" and "lower".

I solved it using the re module this way:

import re

string = '...below...as higher prices mean lower demand to be expected. Generally, a high reading is seen as negative (or bearish), while a low reading is seen as positive (or bullish) for the Korean Won.'

# use re.split with regex to match the exact word
stringafterword = re.split(r'\blow\b', string)[-1]

print(stringafterword)
# ' reading is seen as positive (or bullish) for the Korean Won.'

# the generic code is:
re.split(r'\bTHE_WORD_YOU_WANT\b', string)[-1]
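
Note that indexing with [-1] after an unlimited split returns the text after the last whole-word match. If the text after the first match is wanted instead, one possible sketch is below. The helper name `after_whole_word` and the sample sentence are made up for illustration, and re.escape is added on the assumption that the word should be treated literally:

```python
import re


def after_whole_word(text, word):
    """Illustrative helper: text after the first whole-word match of word."""
    # re.escape() keeps any regex metacharacters in word literal;
    # maxsplit=1 stops splitting after the first match.
    pattern = rf"\b{re.escape(word)}\b"
    return re.split(pattern, text, maxsplit=1)[-1]


sample = "below the low bar, a low tide"

# Unlimited split: [-1] is the text after the LAST whole-word match.
print(repr(re.split(r"\blow\b", sample)[-1]))  # ' tide'

# Limited split: [-1] is the text after the FIRST whole-word match.
print(repr(after_whole_word(sample, "low")))   # ' bar, a low tide'
```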

Hope this can help someone!

You can use the package called substring. Install it with the command pip install substring. You can then get a substring by specifying the start and end characters or indices.

For example:

import substring
s = substring.substringByChar("abcdefghijklmnop", startChar="d", endChar="n")
print(s)

Output:

# s = defghijklmn

Try this general approach:

import re

my_string="hello python world , i'm a beginner"
p = re.compile("world(.*)")
print(p.findall(my_string))

# [" , i'm a beginner"]
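
If only the first match is needed as a plain string rather than a list, a variant with re.search could be sketched like this. The helper name `text_after` is hypothetical, and the use of re.escape and re.DOTALL reflects assumptions about literal matching and multi-line input:

```python
import re


def text_after(word, text):
    """Illustrative helper: text following word, or None if word is absent."""
    # re.escape() treats word literally; re.DOTALL lets "." span newlines,
    # so the captured remainder is not cut off at the first line break.
    match = re.search(re.escape(word) + r"(.*)", text, flags=re.DOTALL)
    return match.group(1) if match else None


my_string = "hello python world , i'm a beginner"
print(repr(text_after("world", my_string)))  # " , i'm a beginner"
print(text_after("Monty", my_string))        # None
```

Returning None for a missing word keeps that case distinguishable from a word that appears at the very end of the string, which yields an empty string.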
