Efficient HTML regex parsing

Question

I have a piece of Python code that scrapes data point values from what seems to be a JavaScript graph on a webpage. The data looks like:

...html/javascript...
{'y':765000,...,'x':1248040800000,...},
{'y':1020000,...,'x':1279144800000,...},
{'y':1105000,...,'x':1312754400000,...}
...html/javascript...

where the dots stand for plotting data I skipped.

To scrape the useful information (the x/y coordinates of the data points), I used regex:

import re

# first get the raw 'x':<digits> matches
xData = re.findall(r"'x':\d+", htmlContent)
# then pull the number out of each match
xData = [int(re.findall(r"\d+", x)[0]) for x in xData]

The same goes for the y values. I don't know if this is terribly inefficient, but it does not look pretty or very smart, as I have many redundant calls to re.findall. Is there a way to do it in one pass? One pass for x and one pass for y?

Answer 1

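You can do it a little more simply. Put a capturing group around the digits, so that re.findall returns only the captured number instead of the whole match: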

import re

htmlContent = """
...html/javascript...
{'y':765000,...,'x':1248040800000,...},
{'y':1020000,...,'x':1279144800000,...},
{'y':1105000,...,'x':1312754400000,...}
...html/javascript...
"""
# Get the numbers: with one capturing group, findall returns just the digits
xData = [int(n) for n in re.findall(r"'x':(\d+)", htmlContent)]
print(xData)
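
If a single scan for both coordinates is preferred over one pass per key, the same capturing-group idea can be extended. The following is only a sketch: it assumes the keys always appear as single-quoted 'x' and 'y' followed by an unsigned integer, and that nothing else on the page matches that shape. The names extract_xy and POINT_FIELD are made up for this illustration.

```python
import re
from collections import defaultdict

# Sample input modeled on the question; the literal "..." stands in for skipped fields.
html_content = """
...html/javascript...
{'y':765000,...,'x':1248040800000,...},
{'y':1020000,...,'x':1279144800000,...},
{'y':1105000,...,'x':1312754400000,...}
...html/javascript...
"""

# One compiled pattern with two groups: the key name (x or y) and its digits.
POINT_FIELD = re.compile(r"'([xy])':(\d+)")


def extract_xy(text):
    """Scan text once and return (x_values, y_values) as lists of ints."""
    values = defaultdict(list)
    # With two groups, findall yields (key, digits) tuples in document order.
    for key, number in POINT_FIELD.findall(text):
        values[key].append(int(number))
    return values["x"], values["y"]


x_data, y_data = extract_xy(html_content)
print(x_data)
print(y_data)
```

Whether this is measurably faster than two separate findall calls depends on the page size; the main gain is that the text is traversed once and the x and y lists are built together.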

Answer 1
1 accepted 2016-07-18 15:19:25
