Beautiful Soup Nested Tables

I am struggling to reach a nested table in the source of this URL:

view-source: http://taxweb.co.guilford.nc.us/CamaPublicAccess/PropertySummary.aspx?REID=0180721

Specifically, I want the data stored under "Owner's Mailing Address", where the new table starts on line 370 of the page source.

owner_fields = soup.find(text="Owner's Mailing Address").find('table'),
owner_address = owner_fields.find('td').get_text(),
owner_city = owner_fields.find('td')[2].get_text(),
owner_state_zip = owner_fields.find('td')[3].get_text(),

Am I way off here?

soup.findAll(attrs={"id":"ctl00_ContentPlaceHolder1_table3"})[0] locates and returns the table.

The additional .findAll('b') call locates the b elements that hold the address values.

The map() call goes over the .findAll('b') elements and returns the .contents list of each one: its unicode strings, plus any child tags it contains.

address_contents = map(lambda value: value.contents, soup.findAll(attrs={"id":"ctl00_ContentPlaceHolder1_table3"})[0].findAll('b'))

In [56]: address_contents 
Out[56]: 
 [[u'101 OAKHURST AVE'],
 [u' '],
 [u'HIGH POINT'],
 [u'\n', <span id="ctl00_ContentPlaceHolder1_DetailsView4_Label1"></span>],
 [u'NC'],
 [u'27262']]
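
For anyone on Python 3 with the bs4 package, roughly the same lookup can be sketched as below. The SAMPLE_HTML fragment and the extract_bold_text helper are made up for illustration; the real page markup may differ, and only the table id comes from the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the relevant part of the page (assumption:
# the address values sit in <b> tags inside the table with this id).
SAMPLE_HTML = """
<table id="ctl00_ContentPlaceHolder1_table3">
  <tr><td>Owner's Mailing Address</td></tr>
  <tr><td>
    <table>
      <tr><td><b>101 OAKHURST AVE</b></td></tr>
      <tr><td><b> </b></td></tr>
      <tr>
        <td><b>HIGH POINT</b></td>
        <td><b>
<span id="ctl00_ContentPlaceHolder1_DetailsView4_Label1"></span></b></td>
        <td><b>NC</b></td>
        <td><b>27262</b></td>
      </tr>
    </table>
  </td></tr>
</table>
"""


def extract_bold_text(html, table_id="ctl00_ContentPlaceHolder1_table3"):
    """Return the non-empty text of every <b> element inside the given table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(id=table_id)
    if table is None:
        # The id was not found, so there is nothing to extract.
        return []
    # get_text(strip=True) drops surrounding whitespace; empty results are skipped.
    texts = (b.get_text(strip=True) for b in table.find_all("b"))
    return [text for text in texts if text]


address_parts = extract_bold_text(SAMPLE_HTML)
print(address_parts)
```

Using get_text(strip=True) instead of .contents means the blank entry and the empty span drop out, leaving only the values that carry text.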

I will leave the assignment of the list values up to you.
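
One possible way to do that assignment, as a sketch only: it assumes Python 3 (where bs4 strings are str subclasses) and that the list always yields exactly four non-blank values, as in the output above. The flatten_address helper and the owner dict keys are hypothetical names, and object() stands in for the empty span tag.

```python
def flatten_address(contents):
    """Collect the non-blank strings from a list of .contents lists."""
    parts = []
    for items in contents:
        for item in items:
            # Skip child tags and whitespace-only strings.
            if isinstance(item, str) and item.strip():
                parts.append(item.strip())
    return parts


# Stand-in for the Out[56] list; object() plays the role of the empty <span> tag.
address_contents = [
    ["101 OAKHURST AVE"],
    [" "],
    ["HIGH POINT"],
    ["\n", object()],
    ["NC"],
    ["27262"],
]

# Unpacking raises ValueError if the page yields more or fewer than four values.
street, city, state, zip_code = flatten_address(address_contents)
owner = {"address": street, "city": city, "state": state, "zip": zip_code}
print(owner)
```

Unpacking into four names is deliberate: if a record has an extra address line or a missing field, the code fails loudly instead of silently shifting the values.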
