Python - Extracting text in a string between two other specific characters?

I have various strings of text that contain a user's name, their business name and their phone number. They all look similar to the following:

FirstName LastName (Some Business Name / phoneNumber)
FirstName LastName (Business Name / phoneNumber)
FirstName LastName (BusinessName / differentphoneNumber)
FirstName LastName (Short Name / somephoneNumber)
FirstName LastName (Very Long Business Name / otherphoneNumber)

Real-world examples could look like this:

David Smith (Best Pool and Spa Supplies / 07438473784)
Bessy McCarthur Jone (Dog Supplies / 0438-343522)

I have used this code to extract the first name (as I needed this earlier) and it works well:

import re
# input_data and the top-level return are supplied by the environment this snippet runs in
details = re.findall(r'^[\w+]+', input_data['stripeDescription'])
return {
    'firstName': details[0] if details else None
}

How can I find the text between the opening bracket "(" and the forward slash "/" in order to retrieve the business name?

This may not be a perfect solution, but it works fine :)

s1='David Smith (Best Pool and Spa Supplies / 07438473784)'
sp1=s1.split('(')
sp2=sp1[1].split('/')
print(sp2)

Output: ['Best Pool and Spa Supplies ', ' 07438473784)']
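
Note that the first element still carries a trailing space, and indexing sp1[1] raises IndexError when the string has no "(". A minimal sketch of the same splitting idea that handles both cases is below. The function name business_name is made up for illustration, and it uses str.partition instead of str.split so that a missing delimiter can be detected rather than raising:

```python
def business_name(description):
    """Return the text between the first "(" and the next "/", or None."""
    # str.partition never raises; the separator slot is "" when it is not found.
    _, paren, rest = description.partition("(")
    if not paren:
        return None
    business, slash, _ = rest.partition("/")
    if not slash:
        return None
    # Drop the whitespace that surrounds the business name.
    return business.strip()


print(business_name("David Smith (Best Pool and Spa Supplies / 07438473784)"))
print(business_name("No brackets here"))
```

Like the original snippet, this stops at the first "/", so a business name that itself contains a slash would be cut short.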

Use parentheses to group the part of the pattern you want to capture in the regex you pass to re.findall:

s = '''David Smith (Best Pool and Spa Supplies / 07438473784)
Bessy McCarthur Jone (Dog Supplies / 0438-343522)'''
import re
print(re.findall(r'\(([^/]+?) */', s))

This outputs:

['Best Pool and Spa Supplies', 'Dog Supplies']
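
If the name and phone number are needed as well, the same capturing-group idea can be extended with named groups. The following is only a sketch: the pattern RECORD and the helper parse_record are hypothetical names, and the pattern assumes one record per line, with the phone number as the last "/"-separated field inside the final pair of brackets.

```python
import re

# Hypothetical pattern: "<name> (<business> / <phone>)" on a single line.
RECORD = re.compile(
    r'^(?P<name>[^(]+?)\s*'          # everything before the first "("
    r'\((?P<business>.+?)\s*/\s*'    # text up to the "/" that precedes the phone
    r'(?P<phone>[^/()]+?)\s*\)\s*$'  # phone: no "/" or brackets, then the closing ")"
)


def parse_record(line):
    """Return a dict with name, business and phone, or None if the line does not match."""
    match = RECORD.match(line)
    return match.groupdict() if match else None


print(parse_record('David Smith (Best Pool and Spa Supplies / 07438473784)'))
print(parse_record('Bessy McCarthur Jone (Dog Supplies / 0438-343522)'))
```

Because the phone group excludes "/", a slash inside the business name stays with the business. A name that itself contains brackets would still be split at its first "(".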

This is fairly robust, but it will not handle a name with parentheses in it, i.e. it expects the first ( to mark the end of the name. However, you may be able to tell that something is wrong by noting that the business then matches the pattern \).*\( (a closing bracket followed later by an opening one).

data = """
David Smith (Best Pool and Spa Supplies / 07438473784)
David Smith2 (Best Pool/Spa Supplies / 07438473784)
Bessy McCarthur Jone (Dog Supplies / 0438-343522)
Bessy McCarthur Jone2 (Dog (and cat) Supplies / 0438-343522)
Bessy (Bess, fails) McCarthur Jone3 (Dog Supplies / 0438-343522)
"""

# Keep only the non-empty lines
lines = [line.strip() for line in data.splitlines() if line.strip()]

for line in lines:
    # Everything before the first "(" is the name
    name, rest = line.split("(", 1)
    name = name.strip()
    # Split on the last "/" only, so a "/" inside the business name is kept
    phone = rest.rsplit("/", 1)[1].replace(")", "").strip()
    biz = rest.rsplit("/", 1)[0].strip()
    print("\n " + line)
    print(" =>name:%s: phone:%s:biz:%s:" % (name, phone, biz))

Output:

 David Smith (Best Pool and Spa Supplies / 07438473784)
 =>name:David Smith: phone:07438473784:biz:Best Pool and Spa Supplies:

 David Smith2 (Best Pool/Spa Supplies / 07438473784)
 =>name:David Smith2: phone:07438473784:biz:Best Pool/Spa Supplies:

 Bessy McCarthur Jone (Dog Supplies / 0438-343522)
 =>name:Bessy McCarthur Jone: phone:0438-343522:biz:Dog Supplies:

 Bessy McCarthur Jone2 (Dog (and cat) Supplies / 0438-343522)
 =>name:Bessy McCarthur Jone2: phone:0438-343522:biz:Dog (and cat) Supplies:

 Bessy (Bess, fails) McCarthur Jone3 (Dog Supplies / 0438-343522)
 =>name:Bessy: phone:0438-343522:biz:Bess, fails) McCarthur Jone3 (Dog Supplies:
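
The sanity check mentioned above, looking for \).*\( in the extracted business, could be sketched roughly as follows. The helper name looks_misparsed is an assumption made for this example; it only flags a suspicious value and does not attempt to repair it.

```python
import re

# A ")" followed later by "(" inside the business text suggests that the
# first "(" in the line belonged to the person's name, not the business.
SUSPICIOUS = re.compile(r'\).*\(')


def looks_misparsed(biz):
    """Return True when the extracted business text looks like a bad split."""
    return bool(SUSPICIOUS.search(biz))


print(looks_misparsed("Dog (and cat) Supplies"))
print(looks_misparsed("Bess, fails) McCarthur Jone3 (Dog Supplies"))
```

A business such as "Dog (and cat) Supplies" is not flagged, because its brackets open before they close.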
