I have tried to extract only the address from this string, but I'm having trouble with it. This is what the string looks like:
1040 S. Vintage Ave.
Building A Ontario, CA 91761
United States Phone: 9099725134 Fax: 9099065401
Web: http://www.aareninc.com
I want to extract only the text that comes before the word 'Phone', so only the address.
I tried strip('Phone') and then took the first element of the result, but that gives me only the first character of the string:
address = contacts.strip('Phone')
print(address[0])
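Use the split function, not strip. strip only removes the given characters from the two ends of the string; it does not cut the string at a word.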
address = contacts.split('Phone')
print(address[0])
This should work.
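To make the difference concrete, here is a minimal sketch using the string from the question (the variable names stripped and parts are only illustrative). It shows that strip('Phone') leaves this string unchanged, which is why indexing with [0] returned a single character, while split('Phone') returns a list whose first element is the address.

```python
contacts = (
    "1040 S. Vintage Ave.\n"
    "Building A Ontario, CA 91761\n"
    "United States Phone: 9099725134 Fax: 9099065401\n"
    "Web: http://www.aareninc.com"
)

# strip("Phone") treats its argument as a set of characters (P, h, o, n, e)
# and removes them only from the two ends of the string.
# This string starts with "1" and ends with "m", so nothing is removed.
stripped = contacts.strip("Phone")
print(stripped[0])  # indexing a string gives one character

# split("Phone") cuts the string at the substring and returns a list of pieces.
parts = contacts.split("Phone")
address = parts[0].strip()  # drop the trailing space before "Phone"
print(address)
```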
Suppose you have a string like this:
st = '1040 S. Vintage Ave.Building A Ontario, CA 91761 United States Phone: 9099725134 Fax: 9099065401 Web: http://www.aareninc.com'
v = st.split("Phone")
print(v[0])
This works in Python 3. In Python 2 you can omit the parentheses around the argument of the print statement.
As @JonClements said, the solution is:
contacts.partition('Phone')[0]
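As a rough illustration of what partition returns, the sketch below wraps it in a small helper. The name address_before and its marker parameter are hypothetical, not part of the original answer. partition always returns a 3-tuple, and when the separator is missing the first element is the whole string, so indexing with [0] never fails.

```python
def address_before(text, marker="Phone"):
    """Return the text before the first occurrence of marker (hypothetical helper)."""
    # partition splits at the first occurrence only and returns
    # (text_before, separator, text_after).
    before, _sep, _after = text.partition(marker)
    return before.strip()


example = "United States Phone: 9099725134 Fax: 9099065401"
pieces = example.partition("Phone")
print(pieces)
print(address_before(example))

# If the marker is absent, partition returns (text, "", ""),
# so the helper simply returns the whole (stripped) text.
print(address_before("United States"))
```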
For this task you might use a so-called zero-length assertion (a positive lookahead in this case):
import re
text = '''1040 S. Vintage Ave.
Building A Ontario, CA 91761
United States Phone: 9099725134 Fax: 9099065401
Web: http://www.aareninc.com'''
address = re.findall(r'.*(?=Phone)', text, re.DOTALL)[0]
print(address)
output
1040 S. Vintage Ave.
Building A Ontario, CA 91761
United States
Note that this raises an IndexError if text does not contain the substring Phone. Also note the re.DOTALL flag, which makes . match the newline character (\n) as well; without that flag the output would be United States.
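One way to avoid the error when Phone is missing is to use re.search and check for None before reading the match. The following is only a sketch under that assumption; the helper name extract_address and its marker parameter are made up for illustration. It also uses a lazy .*? and puts \s* inside the lookahead so the trailing whitespace before the marker is not included.

```python
import re


def extract_address(text, marker="Phone"):
    """Return the text before marker, or None if marker is absent (hypothetical helper)."""
    # .*? matches as little as possible; the lookahead requires optional
    # whitespace followed by the marker, without consuming either.
    pattern = r".*?(?=\s*" + re.escape(marker) + r")"
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        return None
    return match.group(0)


text = '''1040 S. Vintage Ave.
Building A Ontario, CA 91761
United States Phone: 9099725134 Fax: 9099065401
Web: http://www.aareninc.com'''

print(extract_address(text))
print(extract_address("no marker in this text"))
```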
I hope this works.
Tested on python 2.7
import re
string = r"1040 S. Vintage Ave. Building A Ontario, CA 91761 United States Phone: 9099725134 Fax: 9099065401 Web: http://www.aareninc.com"
f = re.split(' (?=Phone:)', string)
print 'String before Phone:', f[0]
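For readers on Python 3, the same lookahead split might look like the sketch below. Passing maxsplit=1 is an addition here, not part of the original answer; it stops after the first match so the result always has at most two pieces.

```python
import re

string = (
    "1040 S. Vintage Ave. Building A Ontario, CA 91761 United States "
    "Phone: 9099725134 Fax: 9099065401 Web: http://www.aareninc.com"
)

# Split on the single space that is immediately followed by "Phone:".
# The lookahead is zero-width, so "Phone:" itself stays in the second piece.
# Note: the unpacking below assumes "Phone:" is present in the string.
before, rest = re.split(r" (?=Phone:)", string, maxsplit=1)
print("String before Phone:", before)
```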
Using regular expressions (strng holds the contact string):
import re
re.split('(Phone)', strng)
['1040 S. Vintage Ave. Building A Ontario, CA 91761 United States ',
'Phone',
': 9099725134 Fax: 9099065401 Web: http://www.aareninc.com']
Suppose your string is defined as:
contacts = """1040 S. Vintage Ave.
Building A Ontario, CA 91761
United States Phone: 9099725134 Fax: 9099065401
Web: http://www.aareninc.com"""
Then contacts.split('Phone')[0] or contacts.partition('Phone')[0] will give you the same result.
You can first split to get a list of the strings on either side of "Phone". Then use strip to remove leading and trailing whitespace from the first one.
contacts.split('Phone')[0].strip()
This works.
You can use re.search():
import re
address = re.search(r'^(.+?)\sPhone', s, flags=re.MULTILINE | re.DOTALL)
print(address.group(1))
# 1040 S. Vintage Ave.
# Building A Ontario, CA 91761
# United States