I have a string of locations:
locations = 'Los Angeles California ,"Heliopolis, Central, Cairo, Egypt",Berlin Germany, Paris France," Cairo, Egypt " , "Dokki, Giza, Egypt " , Singapore'
Note that the location names are separated by commas, but any name that itself contains commas is enclosed in double quotation marks. There is also leading and trailing whitespace that needs to be stripped.
After extracting the names into a list, the result should be:
['Los Angeles California', 'Heliopolis, Central, Cairo, Egypt', 'Berlin Germany', 'Paris France', 'Cairo, Egypt', 'Dokki, Giza, Egypt', 'Singapore']
I have tried the following and it produces the right names, but I'm laughing at my own work because it looks so cumbersome:
import re
locations = 'Los Angeles California ,"Heliopolis, Central, Cairo, Egypt",Berlin Germany, Paris France," Cairo, Egypt " , "Dokki, Giza, Egypt " , Singapore'
lis1 = [e.strip() for e in re.findall('"(.*?)"', locations)]
temp = []
for strg in lis1:
    temp.extend([x.strip() for x in strg.split(',')])
lis2 = [e.strip() for e in locations.split(',')]
for strg in lis2:
    if strg.strip('"').strip() not in temp:
        lis1.append(strg)
print(lis1)
So I'm reaching out to the community... Is there a better solution using Regex or any other methods?
locations = 'Los Angeles California ,"Heliopolis, Central, Cairo, Egypt",Berlin Germany, Paris France," Cairo, Egypt " , "Dokki, Giza, Egypt " , Singapore'
locations = locations.strip(',')
locations = locations.split('"')
result = []
for i in locations:
    i = i.strip()
    i = i.rstrip(',')
    i = i.lstrip(',')
    if i == "":
        continue
    else:
        result.append(i)
print([e.strip() for e in result])
Output
['Los Angeles California',
'Heliopolis, Central, Cairo, Egypt',
'Berlin Germany, Paris France',
'Cairo, Egypt',
'Dokki, Giza, Egypt',
'Singapore']
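Note that the output above keeps 'Berlin Germany, Paris France' as a single entry, because splitting on the double quote never separates unquoted neighbours. A minimal sketch of one way around that, assuming the quotes are always balanced and never nested: after the split, pieces at odd positions were inside quotes and are kept whole, while pieces at even positions are split again on commas. The name split_locations is only illustrative, and empty fields are dropped.

```python
def split_locations(text):
    """Split on commas but keep double-quoted segments whole (assumes balanced quotes)."""
    result = []
    for index, piece in enumerate(text.split('"')):
        if index % 2 == 1:
            # Odd pieces sat between a pair of quotes: keep their commas.
            candidates = [piece]
        else:
            # Even pieces sat outside quotes: commas are separators here.
            candidates = piece.split(',')
        # Strip whitespace and drop the empty leftovers around separators.
        result.extend(c.strip() for c in candidates if c.strip())
    return result


locations = 'Los Angeles California ,"Heliopolis, Central, Cairo, Egypt",Berlin Germany, Paris France," Cairo, Egypt " , "Dokki, Giza, Egypt " , Singapore'
print(split_locations(locations))
```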
[l.strip() for l in locations.split(",")]
Try this (this doesn't use regex)
locations = 'Los Angeles California ,"Heliopolis, Central, Cairo, Egypt",Berlin Germany, " Cairo, Egypt " , "Dokki, Giza, Egypt " , Singapore'
in_string = False
out = ['']
for char in locations:
    if char == '"':
        in_string = not in_string
        continue
    if char == ',':
        if not in_string:
            out.append('')
            continue
    out[-1] += char
print([x.strip() for x in out])
Output:
['Los Angeles California', 'Heliopolis, Central, Cairo, Egypt', 'Berlin Germany', 'Cairo, Egypt', 'Dokki, Giza, Egypt', 'Singapore']
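The quote-tracking loop above is essentially what a CSV parser does, so the standard-library csv module may be worth a try. The sketch below assumes the string behaves like a single CSV row with the default dialect (comma delimiter, double-quote quoting); skipinitialspace=True is needed so that a quote preceded by spaces still opens a quoted field. It uses the full string from the question.

```python
import csv

locations = 'Los Angeles California ,"Heliopolis, Central, Cairo, Egypt",Berlin Germany, Paris France," Cairo, Egypt " , "Dokki, Giza, Egypt " , Singapore'

# csv.reader expects an iterable of lines, so wrap the single string in a list.
# skipinitialspace=True ignores spaces right after a delimiter, so ' "Dokki, ...'
# is still recognised as a quoted field.
row = next(csv.reader([locations], skipinitialspace=True))

# The parser keeps trailing spaces (and spaces inside the quotes), so strip each field.
result = [field.strip() for field in row]
print(result)
```

Unlike the manual loop, this also copes with doubled quotes inside a quoted name, which may or may not matter for your data.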
I tried to solve this in JavaScript. Here is another possible solution:
JavaScript:
locations = 'Los Angeles California ,"Heliopolis, Cairo, Egypt",Berlin Germany, " Cairo, Egypt " , "Dokki, Giza, Egypt " , Singapore'
locations.match(/\"?([\w, ]+\"?)/gi).map(x => x = x.replace(/\"/gi,'').trim().replace(/(^\,|\,$)/g, '').replace(/\s+/g, ' ').trim()).filter(x => x)
Output:
[
'Los Angeles California',
'Heliopolis, Cairo, Egypt',
'Berlin Germany',
'Cairo, Egypt',
'Dokki, Giza, Egypt',
'Singapore'
]
In Python:
import re
locations = 'Los Angeles California ,"Heliopolis, Central, Cairo, Egypt",Berlin Germany, Paris France," Cairo, Egypt " , "Dokki, Giza, Egypt " , Singapore'
# A raw string avoids the invalid-escape warning that "\w" triggers in a normal string
x = re.findall(r'"?([\w, ]+)"?', locations)
print([e.strip().strip(',').strip() for e in x if len(e) > 5])
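Output:
['Los Angeles California', 'Heliopolis, Central, Cairo, Egypt', 'Berlin Germany, Paris France', 'Cairo, Egypt', 'Dokki, Giza, Egypt', 'Singapore']
Note that 'Berlin Germany, Paris France' stays as one entry, because this pattern only breaks the string at double quotes, and the len(e) > 5 filter would also drop any legitimate name of five characters or fewer.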
Here's another way to solve it
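import re
locations = 'Los Angeles California ,"Heliopolis, Central, Cairo, Egypt",Berlin Germany, Paris France," Cairo, Egypt " , "Dokki, Giza, Egypt " , Singapore'
# Names inside double quotes
lis1 = [e.strip() for e in re.findall('"(.*?)"', locations)]
# Remove the quoted parts, then split what is left on commas
templis = ''.join(re.split('".*?"', locations))
lis2 = [e.strip() for e in templis.split(',') if len(e.strip()) > 0]
print(lis1 + lis2)
Output (quoted names come first, so the original order is not preserved):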
['Heliopolis, Central, Cairo, Egypt',
'Cairo, Egypt',
'Dokki, Giza, Egypt',
'Los Angeles California',
'Berlin Germany',
'Paris France',
'Singapore']
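If the original order matters, a single re.findall with two alternatives can collect quoted and unquoted names in one pass. This is only a sketch, assuming balanced quotes and no escaped quotes inside a name; the variable names are illustrative.

```python
import re

locations = 'Los Angeles California ,"Heliopolis, Central, Cairo, Egypt",Berlin Germany, Paris France," Cairo, Egypt " , "Dokki, Giza, Egypt " , Singapore'

# Alternative 1 captures the inside of a quoted name; alternative 2 captures
# a run of characters that contains neither a comma nor a double quote.
pattern = re.compile(r'"([^"]*)"|([^",]+)')

result = []
for quoted, bare in pattern.findall(locations):
    name = (quoted or bare).strip()
    if name:  # skip whitespace-only gaps, e.g. between a closing quote and a comma
        result.append(name)
print(result)
```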
Today I tried again and finally got an answer in a single line.
In JavaScript:
locations = `Los Angeles California ,"Heliopolis, Central, Cairo, Egypt",Berlin Germany, Paris France," Cairo, Egypt " , "Dokki, Giza, Egypt " , Singapore, "Kolkata, India", Nepal, Bhutan`;
locations.replace(/\"[\w\s, ]+\"/gi, x => x.replace(/,/g, '\\').replace(/\"/g, '').trim()).split(',').map(x => x.replace(/\\/g, ',').trim())
Output:
[
"Los Angeles California",
"Heliopolis, Central, Cairo, Egypt",
"Berlin Germany",
"Paris France",
"Cairo, Egypt",
"Dokki, Giza, Egypt",
"Singapore",
"Kolkata, India",
"Nepal",
"Bhutan"
]
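Explanation:
1. Find each substring enclosed in \" (double inverted commas).
2. Inside it, replace the commas (,) with a backslash (\). I am using a backslash because it is generally not used in location names.
3. Remove the \" (double inverted commas).
4. Split the whole string by comma (,) and, in each part, replace the backslash (\) with a comma (,).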
I am not able to write that in Python, because I don't know how to express a replacement that takes an inner function, like this one:
str.replace(find_st, x => x.replace(find_st1, rep_st))
Can anyone help me write the above regular expression in Python in a single line?
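In Python, re.sub accepts a function as its replacement argument, which plays the role of the inner arrow function in the JavaScript version. Below is a sketch of the same idea, assuming (as the JavaScript does) that a backslash never occurs in a location name. The pattern "[^"]*" is a simplification I swapped in for the original character class, and protect_commas is just an illustrative name.

```python
import re

locations = 'Los Angeles California ,"Heliopolis, Central, Cairo, Egypt",Berlin Germany, Paris France," Cairo, Egypt " , "Dokki, Giza, Egypt " , Singapore, "Kolkata, India", Nepal, Bhutan'


def protect_commas(match):
    # Inside a quoted segment: swap commas for a placeholder and drop the quotes.
    return match.group(0).replace(',', '\\').replace('"', '')


# re.sub calls protect_commas once per quoted segment.
masked = re.sub(r'"[^"]*"', protect_commas, locations)
# Now every remaining comma is a real separator; restore the placeholders afterwards.
result = [part.replace('\\', ',').strip() for part in masked.split(',')]
print(result)

# The same thing on one line, with a lambda as the inner function:
one_liner = [p.replace('\\', ',').strip() for p in re.sub(r'"[^"]*"', lambda m: m.group(0).replace(',', '\\').replace('"', ''), locations).split(',')]
```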