
Python RegEx extract text between two patterns

I am trying to pull out the values of lat and lng from the following:

coordinates = 
[<div class="store-map">\n<div id="map" style="width: 100%; height: 400px;"></div>\n<script>\r\n                function initMap() {\r\n                    var myLatLng = {\r\n                        lat: 42.050994,\r\n                        lng: -88.077711                    };\r\n\r\n     

However, when I apply this regex -

found = re.search('lat:(.*),', coordinates,).group(1)  

Everything after "lat:" is returned.
However, the desired result is just the number, stopping as soon as it reaches the comma. This is odd to me, because even Rubular shows that the regex should work. Any ideas on what I could be doing wrong here?

PS: I have spent a bit of time on this and looked at all the related solutions on Stack Overflow, but no dice.

The right way is to use the re.findall function:

import re

coordinates = '[<div class="store-map">\n<div id="map" style="width: 100%; height: 400px;"></div>\n<script>\r\n                function initMap() {\r\n                    var myLatLng = {\r\n                        lat: 42.050994,\r\n                        lng: -88.077711                    };\r\n\r\n '
result = re.findall(r'\b(?:lat|lng): -?\d+\.\d+', coordinates)

print(result)

The output:

['lat: 42.050994', 'lng: -88.077711']

Use the following to extract the two values:

import re

text = """[<div class="store-map">\n<div id="map" style="width: 100%; height: 400px;"></div>\n<script>\r\n                function initMap() {\r\n                    var myLatLng = {\r\n                        lat: 42.050994,\r\n                        lng: -88.077711                    };\r\n\r\n     """

lat, lng = map(float, re.findall(r'(?:lat|lng):\s+([0-9.-]*?)[, ]', text))
print(lat, lng)

This gives you two floats:

42.050994 -88.077711
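
The unpacking above relies on lat appearing before lng in the text. As a minimal sketch of an order-independent variant, the name can be captured together with the number and the pairs collected into a dict. The shortened sample string and the names pairs and coords are assumptions made for illustration, not part of the original answer:

```python
import re

# Shortened, hypothetical stand-in for the scraped <script> text.
text = "lat: 42.050994,\r\n   lng: -88.077711     };"

# Two capture groups: the key name and a signed decimal number.
pairs = re.findall(r"\b(lat|lng):\s*(-?\d+(?:\.\d+)?)", text)

# Build a mapping so the result does not depend on the order of the keys.
coords = {name: float(value) for name, value in pairs}
print(coords)
```

With two capture groups, re.findall returns a list of (name, value) tuples, which is what makes the dict comprehension possible.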

This is because .* is greedy: it matches as much as it can, so the capture runs up to the last comma rather than the first. Change it to the lazy form .*? :

lat:(.*?),
       ^
   add this
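
To make the difference concrete, here is a small sketch comparing the greedy pattern, the lazy pattern, and a negated character class as a third option. The one-line sample string is an assumption chosen so that more than one comma follows "lat:"; it is not the asker's exact data:

```python
import re

# Hypothetical one-line sample with several commas after "lat:".
sample = "var myLatLng = { lat: 42.050994, lng: -88.077711 }; var zoom = [1, 2];"

# Greedy: .* grabs as much as possible, so it stops at the LAST comma.
greedy = re.search(r"lat:(.*),", sample).group(1)

# Lazy: .*? grabs as little as possible, so it stops at the FIRST comma.
lazy = re.search(r"lat:(.*?),", sample).group(1)

# Alternative: match only non-comma characters, skipping leading whitespace.
negated = re.search(r"lat:\s*([^,]*),", sample).group(1)

print(repr(greedy))
print(repr(lazy))
print(repr(negated))
```

Note that the lazy capture still includes the space after the colon, so it may need .strip() (float() tolerates the surrounding whitespace either way).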
