Having trouble parsing HTML
I looked around and found a few examples of how to split text in Python, but I'm having problems with my example. Here's what I want to parse:
<img alt="" src="http://example.com/servlet/charting?base_color=grey&chart_width=288&chart_height=160&chart_type=png&chart_style=manufund_pie&3DSet=true&chart_size=small&leg_on=left&static_xvalues=10.21,12.12,43.12,12.10,&static_labels=blue,red,green,purple">
Here's what I tried:
dict(kvpair.split('=') for kvpair in variableIwantToParse.split('&'))
I get the error "ValueError: dictionary update sequence element #0 has length 5; 2 is required".
I also tried variableIwantToParse.strip('&'), but when I tried to print variableIwantToParse it only displayed one letter at a time.
I'm sure this is easy, but I can't seem to figure out how to parse it. I basically want 10.21,12.12,43.12,12.10 to be associated with blue,red,green,purple (in the order displayed).
Thanks very much for your help (and sorry if this is too easy; I just can't for the life of me figure out the command to parse this) :-)
Use the built-in urlparse module; do not do these splits yourself.
>>> import urlparse
>>> url_to_parse = "http://example.com/servlet/charting?base_color=grey&chart_width=288&chart_height=160&chart_type=png&chart_style=manufund_pie&3DSet=true&chart_size=small&leg_on=left&static_xvalues=10.21,12.12,43.12,12.10,&static_labels=blue,red,green,purple"
>>> parsed_url = urlparse.urlparse(url_to_parse)
>>> query_as_dict = urlparse.parse_qs(parsed_url.query)
>>> print query_as_dict
{'chart_size': ['small'], 'base_color': ['grey'], 'chart_style': ['manufund_pie'], 'chart_height': ['160'], 'static_xvalues': ['10.21,12.12,43.12,12.10,'], 'chart_width': ['288'], 'static_labels': ['blue,red,green,purple'], 'leg_on': ['left'], 'chart_type': ['png'], '3DSet': ['true']}
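For readers on Python 3, where the urlparse module was folded into urllib.parse, the same steps might look like the sketch below. The variable names are carried over from the session above; the Python 3 import path is the only change assumed here.

```python
# Python 3 sketch: urlparse and parse_qs live in urllib.parse.
from urllib.parse import urlparse, parse_qs

url_to_parse = (
    "http://example.com/servlet/charting?base_color=grey&chart_width=288"
    "&chart_height=160&chart_type=png&chart_style=manufund_pie&3DSet=true"
    "&chart_size=small&leg_on=left&static_xvalues=10.21,12.12,43.12,12.10,"
    "&static_labels=blue,red,green,purple"
)

# Split the URL into components, then parse only the query string.
parsed_url = urlparse(url_to_parse)
query_as_dict = parse_qs(parsed_url.query)

# Every value is a list, because a key may repeat in a query string.
print(query_as_dict["static_labels"])   # ['blue,red,green,purple']
print(query_as_dict["static_xvalues"])  # ['10.21,12.12,43.12,12.10,']
```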
If you're using a Python version older than 2.6, you have to import the cgi module. Do this instead:
>>> import urlparse
>>> import cgi
>>> parsed_url = urlparse.urlparse(url_to_parse)
>>> query_as_dict = cgi.parse_qs(parsed_url.query)
>>> print query_as_dict
{'chart_size': ['small'], 'base_color': ['grey'], 'chart_style': ['manufund_pie'], 'chart_height': ['160'], 'static_xvalues': ['10.21,12.12,43.12,12.10,'], 'chart_width': ['288'], 'static_labels': ['blue,red,green,purple'], 'leg_on': ['left'], 'chart_type': ['png'], '3DSet': ['true']}
Then, to associate them in a dictionary, use the built-in dict constructor alongside zip.
>>> print dict(zip( query_as_dict['static_labels'][0].split(','), query_as_dict['static_xvalues'][0].split(',')))
{'blue': '10.21', 'purple': '12.10', 'green': '43.12', 'red': '12.12'}
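Note that static_xvalues ends with a trailing comma, so splitting it yields an extra empty string; zip hides this because it stops at the shorter sequence. A small Python 3 sketch that makes the cleanup explicit and converts the values to floats follows. The helper name pair_labels is hypothetical, and the two input strings are the raw values shown in the output above.

```python
def pair_labels(labels_csv, values_csv):
    """Pair comma-separated labels with comma-separated numeric values."""
    # Drop empty items such as the one produced by a trailing comma.
    labels = [item for item in labels_csv.split(",") if item]
    values = [float(item) for item in values_csv.split(",") if item]
    if len(labels) != len(values):
        raise ValueError("labels and values differ in length")
    return dict(zip(labels, values))


result = pair_labels("blue,red,green,purple", "10.21,12.12,43.12,12.10,")
print(result)
```

Raising on a length mismatch is a design choice for this sketch; plain zip would silently truncate instead.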
Use square brackets:
dict([kvpair.split('=') for kvpair in variableIwantToParse.split('&')])
Also, replacing & with &amp; could help.
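If the URL was copied from raw HTML source, the separators may appear as the entity &amp; rather than a bare &. One way to handle that, sketched here in Python 3 with the standard html module, is to decode entities before splitting. The sample string raw_src is made up for illustration.

```python
import html

# Hypothetical query string as it might appear inside raw HTML source.
raw_src = "base_color=grey&amp;chart_width=288&amp;static_labels=blue,red"

# Turn "&amp;" back into "&" so that the separator is unambiguous.
decoded = html.unescape(raw_src)

# Split each pair on the first "=" only, so values containing "=" survive.
pairs = dict(kvpair.split("=", 1) for kvpair in decoded.split("&"))
print(pairs)
```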
This will get you what you want:
d = dict(kv.split('=') for kv in string_to_parse.split('?')[1][:-2].split('&'))
labels_and_values = zip(d['static_labels'].split(','), d['static_xvalues'].split(','))
It can be really useful to break things down at the command prompt when you run into trouble. For example:
10 > for kv in s.split('&'):
...:     print kv.split('=')
If you check it out, you'll see that splitting on & was causing your issues (feeding dict too many values for one item in the list).
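The same inspection can be written so that it flags only the pieces that would break dict. This is a Python 3 sketch; it assumes s holds the whole tag text as in the question, shortened here to two parameters.

```python
# Shortened stand-in for the tag text from the question (an assumption).
s = '<img alt="" src="http://example.com/servlet/charting?base_color=grey&chart_width=288">'

bad_pieces = []
for kv in s.split("&"):
    parts = kv.split("=")
    # dict() needs exactly two items per element: a key and a value.
    if len(parts) != 2:
        bad_pieces.append(kv)
    print(len(parts), parts)

print(bad_pieces)
```

Here the first piece still carries the tag's own alt= and src= attributes, so it splits into more than two parts.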