Extracting Sub-string Between Two Characters in String in Pandas Dataframe

I have a column containing strings that are made up of different words but always have a similar structure. For example:

2cm off ORDER AGAIN (191 1141)

I want to extract the substring that starts after the second space and ends at the space before the opening parenthesis. So in this example I want to extract ORDER AGAIN.

Is this possible?

You can try the following:

r"2cm off ORDER AGAIN (191 1141)".split(r"(")[0].split(" ", maxsplit=2)[-1].strip()
#Out[3]: 'ORDER AGAIN'
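The one-liner above works on a single string. Since the question is about a DataFrame column, the same chain can be sketched with the pandas `.str` accessor. This is a minimal sketch, not the answerer's code: the sample rows and the column names `col` and `out` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column name "col" is an assumption.
df = pd.DataFrame({"col": ["2cm off ORDER AGAIN (191 1141)",
                           "5mm on SHIP TODAY (22 7)"]})

df["out"] = (
    df["col"]
    .str.split("(").str[0]        # keep the text before the first "("
    .str.split(" ", n=2).str[-1]  # drop the first two space-separated words
    .str.strip()                  # remove the trailing space before "("
)
print(df["out"].tolist())  # ['ORDER AGAIN', 'SHIP TODAY']
```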

You could use str.extract here:

df["out"] = df["col"].str.extract(r'^\w+ \w+ (.*?)(?: \(|$)')

Note that this answer still works if the string doesn't have a (...) term at the end: in that case the capture group runs to the end of the string.

Here is a demo showing that the regex logic is working.
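The same pattern can also be checked directly in pandas. The following is a minimal sketch on a hypothetical two-row DataFrame (the rows and the column names `col` and `out` are assumptions), with one row that has the trailing (...) term and one that does not.

```python
import pandas as pd

# Hypothetical rows: one with a trailing "(...)" term, one without.
df = pd.DataFrame({"col": ["2cm off ORDER AGAIN (191 1141)",
                           "2cm off ORDER AGAIN"]})

# expand=False returns a Series when the pattern has a single capture group.
df["out"] = df["col"].str.extract(r'^\w+ \w+ (.*?)(?: \(|$)', expand=False)
print(df["out"].tolist())  # ['ORDER AGAIN', 'ORDER AGAIN']
```

Rows that do not match the pattern at all (for example, a string with fewer than two leading words) come back as NaN.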

If your data follows the same pattern as the example you posted, the code snippet below should work for you:

import re

data = "2cm off ORDER AGAIN (191 1141)"

# Skip the first two whitespace-separated words, then capture everything up to " ("
extr = re.match(r".*?\s.*?\s(.*)\s\(.*", data)
if extr:
    print(extr.group(1))  # ORDER AGAIN
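Because that pattern requires a literal " (" in the string, `re.match` returns `None` for a string without the parenthesized part. One possible variant, shown here as a sketch rather than part of the original answer, makes that tail optional. The name `extract_middle` is hypothetical.

```python
import re

# Same idea as the pattern above, but the " (...)" tail is optional.
PATTERN = re.compile(r"\S+\s+\S+\s+(.*?)(?:\s+\(.*)?$")

def extract_middle(text):
    """Return the text after the second word and before ' (', or None."""
    match = PATTERN.match(text)
    return match.group(1) if match else None

print(extract_middle("2cm off ORDER AGAIN (191 1141)"))  # ORDER AGAIN
print(extract_middle("2cm off ORDER AGAIN"))             # ORDER AGAIN
print(extract_middle("2cm off"))                         # None
```

A function like this can be applied to a column with `df["col"].map(extract_middle)`, assuming the column holds only strings.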

You can try the following code:

s = '2cm off ORDER AGAIN (191 1141)'
second_space = s.find(' ', s.find(' ') + 1)
openparenthesis = s.find('(')
# Start one character past the second space and strip the space before "("
substring = s[second_space + 1 : openparenthesis].strip()
print(substring)  # ORDER AGAIN
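One caveat with this approach: `str.find` returns -1 when nothing is found, so a slice built from its result can silently drop or misplace characters when a string has no "(" or fewer than two spaces. A defensive version might look like the sketch below. The function name `between_second_space_and_paren` is hypothetical, and returning `None` for unusable input is an assumption about the desired behavior.

```python
def between_second_space_and_paren(s):
    """Slice s from just after its second space up to the first '(' after it."""
    first_space = s.find(" ")
    second_space = s.find(" ", first_space + 1) if first_space != -1 else -1
    if second_space == -1:
        return None  # fewer than two spaces: nothing to extract
    paren = s.find("(", second_space)
    end = paren if paren != -1 else len(s)  # no "(": take the rest of the string
    return s[second_space + 1:end].strip()

print(between_second_space_and_paren("2cm off ORDER AGAIN (191 1141)"))  # ORDER AGAIN
print(between_second_space_and_paren("2cm off ORDER AGAIN"))             # ORDER AGAIN
print(between_second_space_and_paren("2cm"))                             # None
```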
