Extracting a substring between two characters in a string in a Pandas DataFrame

Question

I have a column containing strings that are composed of different words but always have a similar structure. For example:

2cm off ORDER AGAIN (191 1141)

I want to extract the substring that starts after the second space and ends at the space before the opening parenthesis. So in this example I want to extract ORDER AGAIN.

Is this possible?

Answer 1

You can try the following:

r"2cm off ORDER AGAIN (191 1141)".split(r"(")[0].split(" ", maxsplit=2)[-1].strip()
#Out[3]: 'ORDER AGAIN'
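
The one-liner above works on a single Python string. A minimal sketch of the same chain applied to a whole column through the `.str` accessor might look like the following; the sample rows and the column names `col` and `out` are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; the column name "col" is an assumption.
df = pd.DataFrame(
    {"col": ["2cm off ORDER AGAIN (191 1141)", "5cm off CHECK STOCK (22 7)"]}
)

df["out"] = (
    df["col"]
    .str.split("(").str[0]        # keep everything before the first "("
    .str.split(" ", n=2).str[-1]  # drop the first two space-separated words
    .str.strip()                  # remove the space left before "("
)
print(df["out"].tolist())
```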

Answer 2

You could use str.extract here:

df["out"] = df["col"].str.extract(r'^\w+ \w+ (.*?)(?: \(|$)')

Note that this answer is robust even if the string doesn't have a (...) term at the end.

Here is a demo showing that the regex logic is working.
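
As a small, self-contained check of the pattern, the sketch below runs it on two rows, one with a trailing (...) term and one without. The sample rows are made up for illustration, and `expand=False` is added here so that `str.extract` returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical sample data: the second row has no "(...)" term at the end.
df = pd.DataFrame(
    {"col": ["2cm off ORDER AGAIN (191 1141)", "3cm off RETURN ITEM"]}
)

# Two leading words, then a lazy capture that stops at " (" or at end of string.
pattern = r'^\w+ \w+ (.*?)(?: \(|$)'
df["out"] = df["col"].str.extract(pattern, expand=False)
print(df)
```

Rows that do not start with two space-separated words would not match at all, and `str.extract` would leave a missing value in `out` for them.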

Answer 3

If the pattern of your data is similar to what you have posted, then I think the code snippet below should work for you:

import re
data = "2cm off ORDER AGAIN (191 1141)"

# Skip the first two whitespace-separated words, then capture up to the space before "("
extr = re.match(r".*?\s.*?\s(.*)\s\(.*", data)
if extr:
    print(extr.group(1))
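
To use this pattern on a column, one option is to wrap it in a small helper and map it over the Series. This is only a sketch: the helper name `extract_middle` and the sample values are hypothetical. Unlike the `str.extract` pattern in Answer 2, this regex requires a " (" in the string, so rows without one yield a missing value.

```python
import re
import pandas as pd

PATTERN = re.compile(r".*?\s.*?\s(.*)\s\(.*")

def extract_middle(text):
    """Return the text between the second space and ' (', or None if no match."""
    match = PATTERN.match(text)
    return match.group(1) if match else None

# Hypothetical sample data: the second value has no opening parenthesis.
s = pd.Series(["2cm off ORDER AGAIN (191 1141)", "no bracket here"])
out = s.map(extract_middle)
print(out)
```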

Answer 4

You can try the following code:

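s = '2cm off ORDER AGAIN (191 1141)'
second_space = s.find(' ', s.find(' ') + 1)  # index of the second space
open_parenthesis = s.find('(')  # index of the opening parenthesis
# Start one character past the second space and strip the space before "("
substring = s[second_space + 1 : open_parenthesis].strip()
print(substring)  # ORDER AGAIN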
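
Note that `str.find` returns -1 when the character is missing, which would silently produce a wrong slice. A hedged variant that guards against that case could look like this; the function name `between_second_space_and_paren` is made up for this sketch.

```python
def between_second_space_and_paren(s):
    """Return the text after the second space and before '(', or None."""
    first = s.find(" ")
    second = s.find(" ", first + 1) if first != -1 else -1
    paren = s.find("(")
    # Bail out if either delimiter is missing or they appear in the wrong order.
    if second == -1 or paren == -1 or paren < second:
        return None
    return s[second + 1:paren].strip()

print(between_second_space_and_paren("2cm off ORDER AGAIN (191 1141)"))
print(between_second_space_and_paren("abc"))
```

Such a function can be passed to `Series.map` or `Series.apply` to process a DataFrame column.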

Answer 1: score 1, 2021-05-21 10:50:47

Answer 2: score 1, accepted, 2021-05-21 10:54:24

Answer 3: score 0, 2021-05-21 11:04:35

Answer 4: score 0, 2021-05-21 11:16:11
