I want to ask for help with parsing a table using Scrapy in Python 2. Here is my table: link to table. I need to get the values of the <td> tags. I tried the following Python code:
rows = resp.xpath('//*[@id="Vorlage_Infobox_Unternehmen"]')
if not rows:
    rows = resp.xpath('.//*[@id="Vorlage_Infobox_Unternehmen"]//table')
try:
    if rows:
        extract = lambda row, path: row.xpath(path).extract_first().strip()
        if '<th>' in str(rows):
            infobox = {extract(row, 'string(./th)'): extract(row, 'string(./td)') for row in rows}
        elif '<tr>' in str(rows):
            infobox = {extract(row, 'string(./td[1])'): extract(row, 'string(./td[2])') for row in rows}
        elif '<table>' in str(rows):
            infobox = {extract(row, 'string(./th)'): extract(row, 'string(./td)') for row in rows}
        else:
            infobox = {extract(row, 'string(./table/tbody/tr[1])'): extract(row, 'string(./td[1])') for row in rows}
except Exception:
    # fall back to an empty result if extraction fails
    infobox = {}
But I'm doing something wrong and can't get what I want. Please help me understand my mistake.
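If you want to get the values of the <td> elements inside the <table>, you can do this with your XPath:
table = resp.xpath('//table[@id="Vorlage_Infobox_Unternehmen"]')
if table:
    all_table_data = table.xpath('.//td')
When you call table.xpath('some_xpath'), the expression is evaluated relative to the element that was already selected only if it is a relative path, i.e. it starts with '.' as in './/td'. A path that starts with '//' (such as '//td') searches the whole document again, not just the table. You could also skip the if test and select the cells directly: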
all_table_data = resp.xpath('//table[@id="Vorlage_Infobox_Unternehmen"]//td')