How to extract information from a page using a regular expression

Question

I'm having trouble capturing the contents of "name": it often appears after "pluralName", and other times before it. Is there a better way of doing this (the best way in terms of performance)? Thank you for your help!

Note: I am using Python.

The chunk of the page that has the information I need:

{"count":0,"items":[]},"shortUrl":"http:\/\/4sq.com\/11nP13T","likes":{"count":22,"groups":[{"type":"others","count":22,"items":[]}],"summary":"22 Likes"},"ratingColor":"FF9600","id":"5172311be4b0ecc0a12a9953","canonicalPath":"\/v\/kee-hiong-klang-bak-kut-teh\/5172311be4b0ecc0a12a9953","canonicalUrl":"https:\/\/foursquare.com\/v\/kee-hiong-klang-bak-kut-teh\/5172311be4b0ecc0a12a9953","rating":5.3,"categories":[**{"pluralName":"Chinese Restaurants","name":"Chinese Restaurant",**"icon":{"prefix":"https:\/\/ss3.4sqi.net\/img\/categories_v2\/food\/asian_","mapPrefix":"https:\/\/ss3.4sqi.net\/img\/categories_map\/food\/chinese","suffix":".png"},"id":"4bf58dd8d48988d145941735","shortName":"Chinese","primary":true},{"pluralName":"Asian Restaurants","name":"Asian Restaurant","icon":{"prefix":"https:\/\/ss3.4sqi.net\/img\/categories_v2\/food\/asian_","mapPrefix":"https:\/\/ss3.4sqi.net\/img\/categories_map\/food\/asian","suffix":".png"},"id":"4bf58dd8d48988d142941735","shortName":"Asian"}],"createdAt":1366438171,"tips":{"count":25,"groups":[{"count":25,"items":[{"logView":true,"text":"Portion is quite small and expensive. Service attitude is so so. The BKT taste is not my preference.One of the up car restaurants in SS2 which I'll never go back again. ðŸ‘Ž","likes":{"count":1,"groups":[{"type":"others","count":1,"items":[{"photo":{"prefix":"https:\/\/irs0.4sqi.net\/img\/user\/","suffix":"\/43964080-5LYADRF2EEP2RWPL.jpg"},"lastName":".w","firstName":"Jackie","id":"43964080","canonicalPath":"\/user\/43964080","canonicalUrl":"https:\/\/foursquare.com\/user\/43964080","gender":"female"}]}],"summary":"1 like"},"id":"541c2b73498eb0cfe1f76b9e","canonicalPath":"\/item\/541c2b73498eb0cfe1f76b9e","canonicalUrl":"https:\/\/foursquare.com\/item\/541c2b73498eb0cfe1f76b9e","createdAt":1.411132275E9,"todo":{"count":0},"user":{"photo":{"prefix":"https:\/\/irs1.4sqi.net\/img\/user\/","suffix":"\/5765949-NW4BAJWFBCVLRR1M.jpg"}

Answer 1

(?:"pluralName":"[^"]*","name":"([^"]*))|(?:"name":"([^"]*)","pluralName")

Try this with re.findall. See the demo.

https://regex101.com/r/hR7tH4/4

import re
# test_str holds the page text; each match is a tuple with one filled group
print(re.findall(r'(?:"pluralName":"[^"]*","name":"([^"]*))|(?:"name":"([^"]*)","pluralName")', test_str))
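Because the pattern has two capture groups, re.findall returns a list of 2-tuples in which only one group is filled for each match. The following is a minimal sketch of flattening that result into a plain list of names. The sample string is a shortened stand-in for the real page chunk, and the names PATTERN and extract_names are made up for illustration.

```python
import re

# Shortened stand-in for the page chunk (an assumption, not the real page).
test_str = (
    '"categories":[{"pluralName":"Chinese Restaurants","name":"Chinese Restaurant"},'
    '{"name":"Asian Restaurant","pluralName":"Asian Restaurants"}]'
)

# Same pattern as in the answer, compiled once so it can be reused cheaply.
PATTERN = re.compile(
    r'(?:"pluralName":"[^"]*","name":"([^"]*))'
    r'|(?:"name":"([^"]*)","pluralName")'
)

def extract_names(text):
    """Return the captured "name" values, whichever alternative matched."""
    # findall yields (group1, group2); the group that did not match is ''.
    return [first or second for first, second in PATTERN.findall(text)]

names = extract_names(test_str)
print(names)
```

Compiling the pattern once is only a small convenience when the same expression is applied to many pages; the matching cost itself is unchanged.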

Answer 2

Don't use a regexp at all.

Instead, use a JSON parser and access the resulting object. That is much more robust.

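import json  # part of the Python standard library
# text is the complete JSON document as a string (avoid naming it str, which shadows the built-in)
o = json.loads(text)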
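As a rough illustration of accessing the parsed object, the sketch below pulls every category name regardless of key order. Note that the chunk quoted in the question is only a fragment (it starts and ends mid-object), so json.loads would need the complete JSON document from the page. The sample document and the variable names here are assumptions that merely mirror the shape of that chunk.

```python
import json

# Hypothetical, complete JSON document shaped like the chunk in the question.
raw = '''
{"rating": 5.3,
 "categories": [
   {"pluralName": "Chinese Restaurants", "name": "Chinese Restaurant", "primary": true},
   {"name": "Asian Restaurant", "pluralName": "Asian Restaurants"}
 ]}
'''

venue = json.loads(raw)

# Key order inside each category object no longer matters.
category_names = [category["name"] for category in venue["categories"]]
print(category_names)
```

With this approach the position of "name" relative to "pluralName" is irrelevant, which is the case the regex had to handle with two alternatives.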


Answer 1: 1 2015-07-10 04:55:37

Answer 2: 1 2015-07-10 05:31:43
