Extracting Sub-string Between Two Characters in String in Pandas Dataframe
I have a column containing strings that are composed of different words but always have a similar structure. E.g.:
2cm off ORDER AGAIN (191 1141)
I want to extract the sub-string that starts after the second space and ends at the space before the opening parenthesis. So in this example I want to extract ORDER AGAIN.
Is this possible?
You can try the following:
r"2cm off ORDER AGAIN (191 1141)".split(r"(")[0].split(" ", maxsplit=2)[-1].strip()
#Out[3]: 'ORDER AGAIN'
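The one-liner above works on a single string. As a rough sketch, the same chain of splits could be applied to a whole column through the pandas `.str` accessor. The DataFrame and the column names `col` and `out` here are made up for illustration, and the second sample row is invented.

```python
import pandas as pd

# Hypothetical sample data; only the first row comes from the question.
df = pd.DataFrame({"col": ["2cm off ORDER AGAIN (191 1141)",
                           "5cm off FIRST ORDER (22 3)"]})

df["out"] = (
    df["col"]
    .str.split("(", n=1).str[0]   # keep everything before the first "("
    .str.split(" ", n=2).str[-1]  # drop the first two space-separated words
    .str.strip()                  # remove the trailing space before "("
)
print(df["out"].tolist())
```

A single-character pattern such as "(" is treated as a literal by `Series.str.split`, so it does not need escaping here.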
You could use str.extract here:
df["out"] = df["col"].str.extract(r'^\w+ \w+ (.*?)(?: \(|$)')
Note that this answer is robust even if the string doesn't have a (...) term at the end.
Here is a demo showing that the regex logic is working.
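To check the pattern locally, here is a minimal sketch that uses only Python's `re` module as a stand-in for the pandas call. The second sample string, without a parenthesised term, is an invented test case for the robustness claim above.

```python
import re

# Same pattern as in the str.extract answer.
pattern = re.compile(r'^\w+ \w+ (.*?)(?: \(|$)')

samples = [
    "2cm off ORDER AGAIN (191 1141)",  # string from the question
    "2cm off ORDER AGAIN",             # hypothetical: no "(...)" term at the end
]

# The lazy group stops at the first " (" or, failing that, at end of string.
results = [pattern.search(s).group(1) for s in samples]
print(results)
```

Both samples should yield the same captured text, since the non-capturing group accepts either " (" or the end of the string.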
If your data follows the same pattern as what you posted, then I think the code snippet below should work for you:
import re
data = "2cm off ORDER AGAIN (191 1141)"
# Skip the first two space-delimited tokens, then capture up to the space before "("
extr = re.match(r".*?\s.*?\s(.*)\s\(.*", data)
if extr:
    print(extr.group(1))  # ORDER AGAIN
You can try the following code:
s = '2cm off ORDER AGAIN (191 1141)'
second_space = s.find(' ', s.find(' ') + 1)
openparenthesis = s.find('(')
# Start one character past the second space; strip the space before '('
substring = s[second_space + 1 : openparenthesis].strip()
print(substring)  # ORDER AGAIN
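The `find`-based approach assumes every string contains two spaces and an opening parenthesis. `str.find` returns -1 when nothing is found, which would silently give the wrong slice. One possible way to guard against that is sketched below. The function name and the choice to return `None` for short strings are illustrative assumptions, not part of the original answer.

```python
def between_second_space_and_paren(s):
    """Return the text after the second space and before the first '(' that follows it."""
    first = s.find(" ")
    second = s.find(" ", first + 1) if first != -1 else -1
    if second == -1:
        return None  # assumption: fewer than two spaces means nothing to extract
    paren = s.find("(", second + 1)
    end = paren if paren != -1 else len(s)  # no "(": take the rest of the string
    return s[second + 1:end].strip()

print(between_second_space_and_paren("2cm off ORDER AGAIN (191 1141)"))
```

For a DataFrame, a function like this could be passed to `df["col"].apply(...)`, though the vectorised `str.extract` approach is usually more concise.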