Having trouble parsing HTML

I looked around and found a few examples of how to split text in Python, but I'm having problems with my example. Here's what I want to parse:

<img alt="" src="http://example.com/servlet/charting?base_color=grey&amp;chart_width=288&amp;chart_height=160&amp;chart_type=png&amp;chart_style=manufund_pie&amp;3DSet=true&amp;chart_size=small&amp;leg_on=left&amp;static_xvalues=10.21,12.12,43.12,12.10,&amp;static_labels=blue,red,green,purple">

Here's what I tried:

dict(kvpair.split('=') for kvpair in variableIwantToParse.split('&'))

I get the error "ValueError: dictionary update sequence element #0 has length 5; 2 is required".

I also tried variableIwantToParse.strip('&'), but when I tried to print variableIwantToParse it only displayed one letter at a time.

I'm sure this is easy, but I can't seem to figure out how to parse it. I basically want 10.21,12.12,43.12,12.10 to be associated with blue,red,green,purple (in the order displayed).

Thanks very much for your help (and sorry if this is too easy; I just can't for the life of me figure out the command to parse this) :-)

Use the built-in urlparse module (Python 2; it became urllib.parse in Python 3) and do not do these splits yourself.

>>> import urlparse
>>> url_to_parse = "http://example.com/servlet/charting?base_color=grey&amp;chart_width=288&amp;chart_height=160&amp;chart_type=png&amp;chart_style=manufund_pie&amp;3DSet=true&amp;chart_size=small&amp;leg_on=left&amp;static_xvalues=10.21,12.12,43.12,12.10,&amp;static_labels=blue,red,green,purple"
>>> parsed_url = urlparse.urlparse(url_to_parse)
>>> query_as_dict = urlparse.parse_qs(parsed_url.query)
>>> print query_as_dict
{'chart_size': ['small'], 'base_color': ['grey'], 'chart_style': ['manufund_pie'], 'chart_height': ['160'], 'static_xvalues': ['10.21,12.12,43.12,12.10,'], 'chart_width': ['288'], 'static_labels': ['blue,red,green,purple'], 'leg_on': ['left'], 'chart_type': ['png'], '3DSet': ['true']}

If you're using a Python version earlier than 2.6, you have to import the cgi module. Do this instead:

>>> import urlparse
>>> import cgi
>>> parsed_url = urlparse.urlparse(url_to_parse)
>>> query_as_dict = cgi.parse_qs(parsed_url.query)
>>> print query_as_dict
{'chart_size': ['small'], 'base_color': ['grey'], 'chart_style': ['manufund_pie'], 'chart_height': ['160'], 'static_xvalues': ['10.21,12.12,43.12,12.10,'], 'chart_width': ['288'], 'static_labels': ['blue,red,green,purple'], 'leg_on': ['left'], 'chart_type': ['png'], '3DSet': ['true']}

Then, to associate them in a dictionary, use the built-in dict constructor together with zip.

>>> print dict(zip( query_as_dict['static_labels'][0].split(','), query_as_dict['static_xvalues'][0].split(',')))
{'blue': '10.21', 'purple': '12.10', 'green': '43.12', 'red': '12.12'}
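For readers on Python 3, here is a minimal sketch of the same approach. It is not part of the original answer, and it makes two assumptions: the urlparse functions are taken from urllib.parse, and the HTML entity &amp; is decoded with html.unescape before parsing, since recent Python versions split query strings on & only. The names src, params and label_to_value are illustrative.

```python
from html import unescape
from urllib.parse import urlparse, parse_qs

# The src attribute exactly as it appears in the HTML (entities still encoded).
src = ("http://example.com/servlet/charting?base_color=grey&amp;chart_width=288"
       "&amp;chart_height=160&amp;chart_type=png&amp;chart_style=manufund_pie"
       "&amp;3DSet=true&amp;chart_size=small&amp;leg_on=left"
       "&amp;static_xvalues=10.21,12.12,43.12,12.10,"
       "&amp;static_labels=blue,red,green,purple")

# Assumption: decode "&amp;" to "&" first, then let the standard library parse the query.
params = parse_qs(urlparse(unescape(src)).query)

labels = params["static_labels"][0].split(",")
# The values string ends with a comma, so drop the empty trailing piece.
values = [v for v in params["static_xvalues"][0].split(",") if v]

label_to_value = dict(zip(labels, values))
print(label_to_value)
```

The values stay as strings here; converting them with float is a separate step if numbers are needed.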

Use square brackets:

dict([kvpair.split('=') for kvpair in variableIwantToParse.split('&')])

Also, replacing & with &amp; could help.

This will get you what you want:

d = dict(kv.split('=') for kv in string_to_parse.split('?')[1][:-2].split('&amp;'))
labels_and_values = zip(d['static_labels'].split(','), d['static_xvalues'].split(','))

It can be really useful to break things down at the command prompt when you run into trouble. For example:

>>> for kv in s.split('&'):
...     print kv.split('=')

If you check it out, you'll see that splitting on & was causing your issues (feeding dict too many values for one item in the list).
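The diagnosis above can be reproduced with a small sketch in Python 3. The shortened tag string and the names tag, first, query and pairs are assumptions for illustration; the exact number of pieces depends on the string the asker actually had.

```python
# A shortened, hypothetical version of the tag from the question.
tag = ('<img alt="" src="http://example.com/servlet/charting?'
       'base_color=grey&amp;chart_width=288">')

# Splitting the whole tag on '&' leaves the tag prefix in the first piece,
# and that piece contains several '=' signs.
first = tag.split('&')[0].split('=')
print(len(first), first)  # more than two items, which dict() rejects

# Isolating the query string first avoids the problem.
query = tag.split('?', 1)[1].rsplit('"', 1)[0]
pairs = dict(kv.split('=', 1) for kv in query.split('&amp;'))
print(pairs)
```

Passing 1 as the maxsplit argument to split('=') keeps each pair at exactly two items even if a value itself contains an equals sign.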
