How do I get a string after a specific substring?

Question

How do I get a string after a specific substring?

For example, I want to get the string after "world" in

my_string="hello python world, I'm a beginner"

...which in this case is: ", I'm a beginner"

Answer 1

The easiest way is probably just to split on your target word:

my_string="hello python world , i'm a beginner"
print(my_string.split("world",1)[1])

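split takes the word (or character) to split on and, optionally, a limit on the number of splits.

In this example, we split on "world" and limit it to a single split.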
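A small sketch of what the limit does (the helper name `after_first` is hypothetical, not part of the original answer). With a limit of 1, only the first occurrence is used, so later occurrences of the word stay in the result. If the word is absent, `split` returns a one-element list and indexing `[1]` raises `IndexError`.

```python
def after_first(text: str, word: str) -> str:
    """Return the part of text after the first occurrence of word.

    Raises IndexError if word does not occur in text, because split()
    then returns a one-element list.
    """
    return text.split(word, 1)[1]


s = "hello world, big world, bye"
print(s.split("world", 1))      # ['hello ', ', big world, bye']
print(after_first(s, "world"))  # , big world, bye
print("abc".split("world", 1))  # ['abc'] (no second element to index)
```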

Answer 2

I'm surprised nobody has mentioned partition.

def substring_after(s, delim):
    return s.partition(delim)[2]

s1="hello python world, I'm a beginner"
substring_after(s1, "world")

# ", I'm a beginner"

IMHO, this solution is more readable than @arshajii's. Other than that, I think @arshajii's is the fastest, since it does not create any unnecessary copies/substrings.
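To show why `[2]` is the right index, here is a short sketch of the full value `partition()` returns: a 3-tuple of (text before, the separator itself, text after), split at the first occurrence. When the separator is absent, the last two items are empty strings.

```python
s1 = "hello python world, I'm a beginner"

# partition() splits at the FIRST occurrence and always returns 3 items
head, sep, tail = s1.partition("world")
print(repr(head))  # 'hello python '
print(repr(sep))   # 'world'
print(repr(tail))  # ", I'm a beginner"

# Separator missing: the whole string comes back first, then two empty strings
print("no match here".partition("world"))  # ('no match here', '', '')
```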

Answer 3

s1 = "hello python world , i'm a beginner"
s2 = "world"

print(s1[s1.index(s2) + len(s2):])

If you want to handle the case where s2 is not present in s1, use s1.find(s2) instead of index. If that call returns -1, s2 is not in s1.
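A minimal sketch of the `find()` variant described above. The function name `after_or_none` and the choice to return `None` when `s2` is missing are assumptions for illustration; `index()` would raise `ValueError` instead.

```python
def after_or_none(s1: str, s2: str):
    """Return the text after the first s2 in s1, or None if s2 is absent."""
    pos = s1.find(s2)  # -1 means "not found"
    if pos == -1:
        return None
    # Skip past the match itself, not just its starting position
    return s1[pos + len(s2):]


print(repr(after_or_none("hello python world , i'm a beginner", "world")))  # " , i'm a beginner"
print(after_or_none("hello python", "world"))  # None
```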

Answer 4

You want to use str.partition():

>>> my_string = "hello python world , i'm a beginner "
>>> my_string.partition("world")[2]
" , i'm a beginner "

This option is faster than the alternatives.

Note that this produces an empty string if the delimiter is missing:

>>> my_string.partition("Monty")[2]  # delimiter missing
''

If you want the original string instead, test whether the second value returned by str.partition() is non-empty:

prefix, success, result = my_string.partition(delimiter)
if not success: result = prefix
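The same partition-and-test idiom wrapped in a reusable function, as a sketch (the name `after_or_original` is hypothetical). The second item is the delimiter itself when found and `''` otherwise, and in the not-found case `prefix` holds the entire original string. Note that `partition('')` raises `ValueError`, so an empty delimiter is not supported.

```python
def after_or_original(s: str, delimiter: str) -> str:
    """Text after the first delimiter, or the whole string if it is missing."""
    prefix, success, result = s.partition(delimiter)
    # success is the delimiter when found, '' (falsy) otherwise
    if not success:
        result = prefix  # prefix is the whole original string here
    return result


my_string = "hello python world , i'm a beginner "
print(repr(after_or_original(my_string, "world")))  # " , i'm a beginner "
print(repr(after_or_original(my_string, "Monty")))  # "hello python world , i'm a beginner "
```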

You could also use str.split() with a limit of 1:

>>> my_string.split("world", 1)[-1]
" , i'm a beginner "
>>> my_string.split("Monty", 1)[-1]  # delimiter missing
"hello python world , i'm a beginner "

However, this option is slower. In the best case, str.partition() is easily about 15% faster than str.split():

                                missing        first         lower         upper          last
      str.partition(...)[2]:  [3.745 usec]  [0.434 usec]  [1.533 usec]  <3.543 usec>  [4.075 usec]
str.partition(...) and test:   3.793 usec    0.445 usec    1.597 usec    3.208 usec    4.170 usec
      str.split(..., 1)[-1]:  <3.817 usec>  <0.518 usec>  <1.632 usec>  [3.191 usec]  <4.173 usec>
            % best vs worst:         1.9%         16.2%          6.1%          9.9%          2.3%

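This shows per-execution timings for inputs where the delimiter is either missing (worst case), placed first (best case), or located in the lower half, the upper half, or the last position. The fastest time is marked with [...] and the slowest with <...>.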

The table above was produced by a comprehensive timing trial of all three options, shown below. I ran the tests with Python 3.7.4 on a 2017 15" MacBook Pro with a 2.9 GHz Intel Core i7 and 16 GB of RAM.

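This script generates random sentences with and without a randomly selected delimiter and, where the delimiter is present, places it at different positions in the generated sentence. It runs the tests repeatedly in random order (which gives the fairest results, accounting for random OS events during the test run) and then prints the results table: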

import random
from itertools import product
from operator import itemgetter
from pathlib import Path
from timeit import Timer

setup = "from __main__ import sentence as s, delimiter as d"
tests = {
    "str.partition(...)[2]": "r = s.partition(d)[2]",
    "str.partition(...) and test": (
        "prefix, success, result = s.partition(d)\n"
        "if not success: result = prefix"
    ),
    "str.split(..., 1)[-1]": "r = s.split(d, 1)[-1]",
}

placement = "missing first lower upper last".split()
delimiter_count = 3

wordfile = Path("/usr/dict/words")  # Linux
if not wordfile.exists():
    # macos
    wordfile = Path("/usr/share/dict/words")
words = [w.strip() for w in wordfile.open()]

def gen_sentence(delimiter, where="missing", l=1000):
    """Generate a random sentence of length l

    The delimiter is incorporated according to the value of where:

    "missing": no delimiter
    "first":   delimiter is the first word
    "lower":   delimiter is present in the first half
    "upper":   delimiter is present in the second half
    "last":    delimiter is the last word

    """
    possible = [w for w in words if delimiter not in w]
    sentence = random.choices(possible, k=l)
    half = l // 2
    if where == "first":
        # best case, at the start
        sentence[0] = delimiter
    elif where == "lower":
        # lower half
        sentence[random.randrange(1, half)] = delimiter
    elif where == "upper":
        sentence[random.randrange(half, l)] = delimiter
    elif where == "last":
        sentence[-1] = delimiter
    # else: worst case, no delimiter

    return " ".join(sentence)

delimiters = random.choices(words, k=delimiter_count)
timings = {}
sentences = [
    # where, delimiter, sentence
    (w, d, gen_sentence(d, w)) for d, w in product(delimiters, placement)
]
test_mix = [
    # label, test, where, delimiter, sentence
    (*t, *s) for t, s in product(tests.items(), sentences)
]
random.shuffle(test_mix)

for i, (label, test, where, delimiter, sentence) in enumerate(test_mix, 1):
    print(f"\rRunning timed tests, {i:2d}/{len(test_mix)}", end="")
    t = Timer(test, setup)
    number, _ = t.autorange()
    results = t.repeat(5, number)
    # best time for this specific random sentence and placement
    timings.setdefault(
        label, {}
    ).setdefault(
        where, []
    ).append(min(dt / number for dt in results))

print()

scales = [(1.0, 'sec'), (0.001, 'msec'), (1e-06, 'usec'), (1e-09, 'nsec')]
width = max(map(len, timings))
rows = []
bestrow = dict.fromkeys(placement, (float("inf"), None))
worstrow = dict.fromkeys(placement, (float("-inf"), None))

for row, label in enumerate(tests):
    columns = []
    worst = float("-inf")
    for p in placement:
        timing = min(timings[label][p])
        if timing < bestrow[p][0]:
            bestrow[p] = (timing, row)
        if timing > worstrow[p][0]:
            worstrow[p] = (timing, row)
        worst = max(timing, worst)
        columns.append(timing)

    scale, unit = next((s, u) for s, u in scales if worst >= s)
    rows.append(
        [f"{label:>{width}}:", *(f" {c / scale:.3f} {unit} " for c in columns)]
    )

colwidth = max(len(c) for r in rows for c in r[1:])
print(' ' * (width + 1), *(p.center(colwidth) for p in placement), sep="  ")
for r, row in enumerate(rows):
    for c, p in enumerate(placement, 1):
        if bestrow[p][1] == r:
            row[c] = f"[{row[c][1:-1]}]"
        elif worstrow[p][1] == r:
            row[c] = f"<{row[c][1:-1]}>"
    print(*row, sep="  ")

percentages = []
for p in placement:
    best, worst = bestrow[p][0], worstrow[p][0]
    ratio = ((worst - best) / worst)
    percentages.append(f"{ratio:{colwidth - 1}.1%} ")

print("% best vs worst:".rjust(width + 1), *percentages, sep="  ")

Answer 5

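If you want to do this with a regular expression, you could simply use a non-capturing group to match the word "world" and then grab everything after it, like so: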

(?:world).*

The example string was tested here.
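One caveat, shown in a short sketch: a non-capturing group does not exclude text from the match. It only avoids creating a numbered group, so the matched text still starts with "world". A lookbehind `(?<=world)` (a swapped-in technique, not from the original answer) asserts that "world" precedes the match without including it.

```python
import re

my_string = "hello python world, I'm a beginner"

# Non-capturing group: the overall match still INCLUDES "world"
m = re.search(r"(?:world).*", my_string)
print(repr(m.group(0)))  # "world, I'm a beginner"

# Lookbehind: require "world" right before the match without consuming it
m = re.search(r"(?<=world).*", my_string)
print(repr(m.group(0)))  # ", I'm a beginner"
```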

Answer 6

Python 3.9 added a new removeprefix method:

>>> 'TestHook'.removeprefix('Test')
'Hook'
>>> 'BaseTestCase'.removeprefix('Test')
'BaseTestCase'

Docs: https://docs.python.org/3.9/library/stdtypes.html#str.removeprefix
Announcement: https://docs.python.org/3.9/whatsnew/3.9.html
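Note that removeprefix only removes text at the very start of the string, so on its own it does not find a substring in the middle (as in the question's "world" example). The documentation describes it as returning `string[len(prefix):]` when the string starts with the prefix and the original string otherwise; a sketch of that behavior for Python versions before 3.9 (the function name `remove_prefix` is hypothetical):

```python
def remove_prefix(s: str, prefix: str) -> str:
    """Equivalent of str.removeprefix (Python 3.9+) for older versions."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return s  # prefix not at the start: string unchanged


print(remove_prefix("TestHook", "Test"))      # Hook
print(remove_prefix("BaseTestCase", "Test"))  # BaseTestCase
```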

Answer 7

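This is an old question, but I ran into exactly the same situation: I needed to split a string using the word "low" as the delimiter, and my problem was that the same string also contained the words "below" and "lower".

I solved it with the re module like this: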

import re

string = '...below...as higher prices mean lower demand to be expected. Generally, a high reading is seen as negative (or bearish), while a low reading is seen as positive (or bullish) for the Korean Won.'

# use re.split with regex to match the exact word
stringafterword = re.split(r'\blow\b', string)[-1]

print(stringafterword)
# ' reading is seen as positive (or bullish) for the Korean Won.'

# the generic code is:
re.split(r'\bTHE_WORD_YOU_WANT\b', string)[-1]

Hope this helps someone!
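Taking `[-1]` of `re.split(...)` returns the text after the *last* whole-word match. If you want the text after the *first* match, one option is `maxsplit=1`; `re.escape` keeps the word safe if it contains regex metacharacters. A sketch, assuming the word starts and ends with word characters (so `\b` behaves as intended); the helper name `after_word` is hypothetical:

```python
import re

def after_word(text: str, word: str):
    """Text after the first whole-word occurrence of word, or None if absent."""
    # \b anchors on word boundaries, so "below" and "lower" do not match "low"
    pattern = r"\b" + re.escape(word) + r"\b"
    parts = re.split(pattern, text, maxsplit=1)
    return parts[1] if len(parts) == 2 else None


text = "below the line, lower prices, a low reading, low again"
print(repr(after_word(text, "low")))  # ' reading, low again'
print(after_word("nothing here", "low"))  # None
```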

Answer 8

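You can use a package called substring. Just install it with pip install substring. You only need to specify the start and end characters/indices to get the substring.

For example: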

import substring
s = substring.substringByChar("abcdefghijklmnop", startChar="d", endChar="n")
print(s)

Output:

# s = defghijklmn
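substring is a third-party package, and its API is not verified here beyond the example above. For comparison, a standard-library-only sketch that reproduces the shown output by slicing from the first start character through the next end character, inclusive of both. The function name and the inclusive-end behavior are assumptions inferred from the example output.

```python
def substring_by_char(s: str, start_char: str, end_char: str) -> str:
    """Slice s from the first start_char through the next end_char, inclusive."""
    start = s.index(start_char)       # raises ValueError if start_char is missing
    end = s.index(end_char, start)    # look for end_char only from start onwards
    return s[start:end + len(end_char)]


print(substring_by_char("abcdefghijklmnop", "d", "n"))  # defghijklmn
```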

Answer 9

Try this general approach:

import re

my_string="hello python world , i'm a beginner"
p = re.compile("world(.*)")
print(p.findall(my_string))

# [" , i'm a beginner"]
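`findall` returns a list of every captured group. If you want a single string, `re.search` with the same capturing group and `.group(1)` is one option (a swapped-in alternative, not part of the original answer); it returns `None` as the match when "world" is absent, which the sketch handles explicitly.

```python
import re

my_string = "hello python world , i'm a beginner"
m = re.search(r"world(.*)", my_string)
result = m.group(1) if m else None  # None when "world" does not occur
print(repr(result))  # " , i'm a beginner"
```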


9 solutions

Solution 1
545 Accepted 2012-09-24 20:27:07

Solution 2
77 2013-05-23 11:35:10

Solution 3
75 2012-09-24 20:27:31

Solution 4
50 2019-07-16 19:24:29

Solution 5
21 2012-09-24 20:31:20

Solution 6
8 2020-06-10 15:21:25

Solution 7
6 2017-01-13 03:15:16

Solution 8
6 2018-06-26 04:11:07

Solution 9
6 2020-02-27 23:36:56
