Extract from table (Scrapy)

Question

I would like to ask for help with parsing a table using Scrapy in Python 2. Here is my table: link to table. I need to get the values of the <td> tags. I tried the following Python code:

rows = resp.xpath('//*[@id="Vorlage_Infobox_Unternehmen"]')
if not rows:
    rows = resp.xpath('.//*[@id="Vorlage_Infobox_Unternehmen"]//table')
try:
    if rows:
        extract = lambda row, path: row.xpath(path).extract_first().strip()
        if '<th>' in str(rows):
            infobox = {extract(row, 'string(./th)'): extract(row, 'string(./td)') for row in rows}
        elif '<tr>' in str(rows):
            infobox = {extract(row, 'string(./td[1])'): extract(row, 'string(./td[2])') for row in rows}
        elif '<table>' in str(rows):
            infobox = {extract(row, 'string(./th)'): extract(row, 'string(./td)') for row in rows}
        else:
            infobox = {extract(row, 'string(./table/tbody/tr[1])'): extract(row, 'string(./td[1])') for row in rows}

But I am doing something wrong and cannot get what I want. Please help me understand my mistake.

Answer 1

If you want to get the values of the <td> elements inside the <table>, you could use an XPath like this:

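    table = resp.xpath('//table[@id="Vorlage_Infobox_Unternehmen"]')
    if table:
        # the leading dot keeps the search inside the selected table
        all_table_data = table.xpath('.//td')

When you call table.xpath('some_xpath') with a relative path (one that starts with a dot), it is applied to the element that was already selected. A path that starts with // still searches the whole document. You could also skip that test and do it directly: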

    all_table_data = resp.xpath('//table[@id="Vorlage_Infobox_Unternehmen"]//td')

solution1
0 ACCEPTED 2018-08-03 12:40:22
