
How do you extract a substring after a pattern

I'm fairly new to Python. I would like to know the best way to extract a substring after a certain pattern. The pattern is the following: Prefix - Postfix. I would like to isolate the Postfix. I can guarantee that the Prefix will only contain letters, but I cannot guarantee its length. The Postfix, on the other hand, may have spaces and hyphens within it; it can be any character whatsoever. I simply need to get rid of the "Prefix - " part and keep the Postfix.

"""
Example input:
Intern - RVA-QA PK
Fulltime - VA - BN146
Intern - STP_NA
Intern - ZXU RMP LM

Desired result:
RVA-QA PK
VA - BN146
STP_NA
ZXU RMP LM
"""

What would be the best way to achieve this? I have the following code, but it doesn't quite do what I want it to:

from sqlalchemy import create_engine

url = 'mysql://scott:tiger@localhost/test'
engine = create_engine(url)
db = engine.connect()

# Construct Query
query = "SELECT name FROM items"

# Obtain table information
item_list = db.execute(query)

# Declare list that will hold the results
result_list = []

for item in item_list:
    result_list.append(item[0].rsplit('-', 1)[1].strip())

return result_list

Would you recommend I use regex? Or is there a better way? Any advice or help is appreciated.

Thank you

If you want to remove everything before the "-", just try:

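import re
s = "example - postfix"  # named s to avoid shadowing the built-in str
# Non-greedy match from the start up to the first hyphen, plus any spaces after it
re.sub(r"^.*?-\s*", "", s)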

Output:

"postfix"

I am using regex here. You can also use split, for example "example - postfix".split("-", 1)[1].strip()
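Since the postfix may itself contain hyphens, it matters whether the pattern is greedy. The following is a small sketch, not part of the original answer, that compares a greedy pattern with a non-greedy one on two of the sample inputs from the question. The names samples, greedy and lazy are only illustrative.

```python
import re

samples = ["Intern - RVA-QA PK", "Fulltime - VA - BN146"]

# Greedy: ".+-" runs to the LAST hyphen, so part of the postfix is lost
# and the space after the hyphen is left in place.
greedy = [re.sub(r".+-", "", s) for s in samples]

# Non-greedy and anchored: stops at the FIRST hyphen and also drops
# the whitespace that follows it.
lazy = [re.sub(r"^.*?-\s*", "", s) for s in samples]

print(greedy)  # ['QA PK', ' BN146']
print(lazy)    # ['RVA-QA PK', 'VA - BN146']
```

The non-greedy form is safe here only because the prefix is guaranteed to contain letters and no hyphens.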

I don't think you need to use regex, since you simply want to extract the substring after the first appearance of a specific sequence of characters.

The str.index() method returns the index of a substring inside the string (the first one, if there is more than one), so use it to find the location of the separator. You can then easily extract the postfix with string slicing.

The code below should print Postfix.

item = 'Prefix - Postfix'
separator = ' - '
start = item.index(separator) + len(separator)
print(item[start:])

Try this with your examples. https://www.pythonpad.co/pads/edtnyn2hk6u4ns8h/
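One thing to keep in mind is that str.index() raises ValueError when the separator is missing. As a possible variation (not from the answer above), str.partition() does the same "first occurrence" split without raising. The helper name extract_postfix and the choice to return the item unchanged when there is no separator are assumptions made for this sketch.

```python
def extract_postfix(item, separator=" - "):
    """Return the text after the first separator.

    Assumption for this sketch: if the separator is absent,
    the item is returned unchanged.
    """
    # partition() splits at the first occurrence only and
    # returns (before, separator, after).
    _prefix, sep, postfix = item.partition(separator)
    return postfix if sep else item


print(extract_postfix("Fulltime - VA - BN146"))  # VA - BN146
print(extract_postfix("NoSeparatorHere"))        # NoSeparatorHere
```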

This was the best (shortest) regex I could come up with that returned what you wanted. This answer hopefully deals with all the edge cases (e.g. having dashes in your desired string). However, there are some spacing issues.

import re
the_str = """
Intern - RVA-QA PK
Fulltime - VA - BN146
Intern - STP_NA
Intern - ZXU RMP LM
"""
reg = re.compile("\n.*?- ")
a = re.sub(reg,"\n",the_str)

print(a)

returns:


RVA-QA PK
VA - BN146
STP_NA
ZXU RMP LM

The spacing is weird (due to the multiline string), but you could just .strip("\n") it away. A second regex would be

import re
the_str = """
Intern - RVA-QA PK
Fulltime - VA - BN146
Intern - STP_NA
Intern - ZXU RMP LM
"""
reg = re.compile("\n.*?- (.*)")
a = re.findall(reg,the_str)
print(a)

This returns a list of all the correct answers, without any spacing issues. Output: ['RVA-QA PK', 'VA - BN146', 'STP_NA', 'ZXU RMP LM']

Hope this helped!
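A slightly stricter variant is also possible. This sketch, which is not the answerer's code, uses re.MULTILINE so that ^ and $ match at each line boundary, and relies on the question's guarantee that the prefix contains only letters. The names pattern and postfixes are illustrative.

```python
import re

the_str = """
Intern - RVA-QA PK
Fulltime - VA - BN146
Intern - STP_NA
Intern - ZXU RMP LM
"""

# ^[A-Za-z]+ : a letters-only prefix at the start of a line
# " - "      : the literal separator
# (.*)$      : capture everything up to the end of that line
pattern = re.compile(r"^[A-Za-z]+ - (.*)$", re.MULTILINE)
postfixes = pattern.findall(the_str)

print(postfixes)  # ['RVA-QA PK', 'VA - BN146', 'STP_NA', 'ZXU RMP LM']
```

Because the pattern is anchored per line, it does not depend on the leading "\n" that the multiline string happens to have.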

You can use Python's split and strip functions. split() returns a list of chunks. For example:

m_string = "I-have-got-an-example"
result1 = m_string.split('-')

result1 is ['I', 'have', 'got', 'an', 'example']

If you use only split(), the chunks may still contain whitespace, so you have to use strip() as well.

You can try this example.

m_string = "I - have - got- an -example"
result = [x.strip() for x in m_string.split('-')]

result is ["I", "have", "got", "an", "example"]

I hope this will be helpful for you.
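Applied to the question's data, splitting on every hyphen would also cut the postfix apart. A minimal sketch of the difference, using the maxsplit argument of str.split() (the variable names are illustrative):

```python
item = "Fulltime - VA - BN146"

# Splitting on every hyphen breaks the postfix into pieces too.
all_chunks = [x.strip() for x in item.split("-")]

# maxsplit=1 splits only at the first hyphen, so the postfix stays whole.
prefix, postfix = [x.strip() for x in item.split("-", 1)]

print(all_chunks)  # ['Fulltime', 'VA', 'BN146']
print(postfix)     # VA - BN146
```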

The correct solution seems to be the following:

for item in item_list:
    result_list.append(item[0].split(' - ', 1)[1].strip())

Thanks for all the answers.
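For anyone who wants to try this without a MySQL server, here is a self-contained sketch. It assumes an in-memory SQLite table (via the standard sqlite3 module) as a stand-in for the items table, filled with the sample rows from the question. It also shows what the original rsplit('-', 1) call produced, for comparison.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the MySQL "items" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [
        ("Intern - RVA-QA PK",),
        ("Fulltime - VA - BN146",),
        ("Intern - STP_NA",),
        ("Intern - ZXU RMP LM",),
    ],
)

# ORDER BY rowid keeps the rows in insertion order.
rows = conn.execute("SELECT name FROM items ORDER BY rowid").fetchall()
conn.close()

# split(' - ', 1) cuts at the FIRST separator, so later hyphens survive.
result_list = [row[0].split(" - ", 1)[1].strip() for row in rows]

# rsplit('-', 1) cuts at the LAST hyphen, which loses part of the postfix.
broken_list = [row[0].rsplit("-", 1)[1].strip() for row in rows]

print(result_list)  # ['RVA-QA PK', 'VA - BN146', 'STP_NA', 'ZXU RMP LM']
print(broken_list)  # ['QA PK', 'BN146', 'STP_NA', 'ZXU RMP LM']
```

Note that split(' - ', 1)[1] raises IndexError for a row that has no " - " separator, so this assumes every name follows the Prefix - Postfix pattern.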
