Problems parsing HTML

Question

I looked around and found a few examples of how to split text in Python, but I'm having problems with my example. Here's what I want to parse:

<img alt="" src="http://example.com/servlet/charting?base_color=grey&amp;chart_width=288&amp;chart_height=160&amp;chart_type=png&amp;chart_style=manufund_pie&amp;3DSet=true&amp;chart_size=small&amp;leg_on=left&amp;static_xvalues=10.21,12.12,43.12,12.10,&amp;static_labels=blue,red,green,purple">

Here's what I tried:

dict(kvpair.split('=') for kvpair in variableIwantToParse.split('&'))

I get the error "ValueError: dictionary update sequence element #0 has length 5; 2 is required".

I also tried variableIwantToParse.strip('&'), but when I printed variableIwantToParse it only displayed one letter at a time.

I'm sure this is easy, but I can't seem to figure out how to parse it. I basically want 10.21,12.12,43.12,12.10 to be associated with blue,red,green,purple (in the order displayed).

Thanks very much for your help (and sorry if this is too easy; I just can't for the life of me figure out the command to parse this) :-)

Answer 1

Use the built-in urlparse module; do not do these splits yourself.

>>> import urlparse
>>> url_to_parse = "http://example.com/servlet/charting?base_color=grey&amp;chart_width=288&amp;chart_height=160&amp;chart_type=png&amp;chart_style=manufund_pie&amp;3DSet=true&amp;chart_size=small&amp;leg_on=left&amp;static_xvalues=10.21,12.12,43.12,12.10,&amp;static_labels=blue,red,green,purple"
>>> parsed_url = urlparse.urlparse(url_to_parse)
>>> query_as_dict = urlparse.parse_qs(parsed_url.query)
>>> print query_as_dict
{'chart_size': ['small'], 'base_color': ['grey'], 'chart_style': ['manufund_pie'], 'chart_height': ['160'], 'static_xvalues': ['10.21,12.12,43.12,12.10,'], 'chart_width': ['288'], 'static_labels': ['blue,red,green,purple'], 'leg_on': ['left'], 'chart_type': ['png'], '3DSet': ['true']}

If you're using a Python version older than 2.6, parse_qs is not available in urlparse, so you have to import the cgi module. Do this instead:

>>> import urlparse
>>> import cgi
>>> parsed_url = urlparse.urlparse(url_to_parse)
>>> query_as_dict = cgi.parse_qs(parsed_url.query)
>>> print query_as_dict
{'chart_size': ['small'], 'base_color': ['grey'], 'chart_style': ['manufund_pie'], 'chart_height': ['160'], 'static_xvalues': ['10.21,12.12,43.12,12.10,'], 'chart_width': ['288'], 'static_labels': ['blue,red,green,purple'], 'leg_on': ['left'], 'chart_type': ['png'], '3DSet': ['true']}

Then, to associate the labels with the values in a dictionary, use the built-in dict constructor together with zip.

>>> print dict(zip( query_as_dict['static_labels'][0].split(','), query_as_dict['static_xvalues'][0].split(',')))
{'blue': '10.21', 'purple': '12.10', 'green': '43.12', 'red': '12.12'}
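
For readers on Python 3, where the urlparse module was merged into urllib.parse, a rough equivalent might look like the sketch below. It assumes the src attribute still contains HTML-escaped &amp; separators, so it unescapes them first. The helper name labels_to_values is made up for illustration, and the URL is a shortened version of the one in the question.

```python
from html import unescape
from urllib.parse import urlsplit, parse_qs


def labels_to_values(src):
    """Map each static_labels entry to the static_xvalues entry at the same position."""
    # Turn the HTML-escaped "&amp;" separators back into plain "&".
    query = urlsplit(unescape(src)).query
    params = parse_qs(query)  # every value is a list of strings
    labels = params["static_labels"][0].split(",")
    # The trailing comma in static_xvalues yields an empty string; skip it.
    values = [v for v in params["static_xvalues"][0].split(",") if v]
    return dict(zip(labels, values))


# Shortened sample URL (hypothetical subset of the original parameters).
src = (
    "http://example.com/servlet/charting?base_color=grey&amp;chart_width=288"
    "&amp;static_xvalues=10.21,12.12,43.12,12.10,"
    "&amp;static_labels=blue,red,green,purple"
)
print(labels_to_values(src))
```

The values stay strings here; wrap them in float() if numbers are needed.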

Answer 2

Use square brackets:

dict([kvpair.split('=') for kvpair in variableIwantToParse.split('&')])

Also, replacing &amp; with & could help.
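
One way to read this suggestion, sketched for Python 3: unescape the &amp; entities first, keep only the part after the ?, and split each pair on the first = only. The function name query_to_dict and the sample string are illustrative and not part of the original answer.

```python
from html import unescape


def query_to_dict(url):
    """Split the query string of `url` into a plain key -> value dict."""
    # Replace "&amp;" with "&" and keep only what follows the "?".
    query = unescape(url).partition("?")[2]
    # The "=" check plus maxsplit=1 means every pair has exactly two items,
    # which is what dict() requires.
    pairs = (kv.split("=", 1) for kv in query.split("&") if "=" in kv)
    return dict(pairs)


# Hypothetical shortened URL for demonstration.
url = "http://example.com/servlet/charting?leg_on=left&amp;static_labels=blue,red"
print(query_to_dict(url))
```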

Answer 3

This will get you what you want:

d = dict(kv.split('=') for kv in string_to_parse.split('?')[1][:-2].split('&amp;'))
labels_and_values = zip(d['static_labels'].split(','), d['static_xvalues'].split(','))

It can be really useful to break things down at the interactive prompt when you run into trouble. For example:

>>> for kv in s.split('&'):
...     print kv.split('=')

If you check it out, you'll see that splitting on & was causing you issues (feeding dict too many values for one item in the list).
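
The same check can be written as a small helper that lists every piece that does not split into exactly one key and one value. This is a Python 3 sketch; the name bad_pairs and the shortened tag string are assumptions for illustration.

```python
def bad_pairs(text, sep="&"):
    """Return the pieces of `text` that do not split into exactly [key, value]."""
    return [piece for piece in text.split(sep) if len(piece.split("=")) != 2]


# Hypothetical shortened version of the <img> tag from the question.
tag = '<img alt="" src="http://example.com/servlet/charting?base_color=grey&amp;leg_on=left">'

# The first piece still carries the tag's own attributes, so it contains
# several "=" signs and cannot be used as a single key/value pair.
for piece in bad_pairs(tag):
    print(repr(piece), "->", piece.split("="))
```

Any piece reported here is one that dict() would reject with the "2 is required" error.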
