How to extract webpage source code using Scrapy's xpath?

Question

I have written the following code:

from scrapy import Selector
html = '''
<html><head></head><body><table>

<tr> <td>a1</td> <td>b1</td> </tr>
<tr> <td>a2</td> <td>b2</td> </tr>

</table></body></html>
'''

selector = Selector(text=html)
temp = selector.xpath("//td").extract()
print(temp)

and I expected to get the following result:

[
'<td>a1</td>',
'<td>b1</td>',
'<td>a2</td>',
'<td>b2</td>'
]

But I actually got this:

[
'<td>a1</td> <td>b1</td> </tr>\n<tr> <td>a2</td> <td>b2</td> </tr>\n</table>\n</body>\n</html>\n', 
'<td>b1</td> </tr>\n<tr> <td>a2</td> <td>b2</td> </tr>\n</table>\n</body>\n</html>\n', 
'<td>a2</td> <td>b2</td> </tr>\n</table>\n</body>\n</html>\n', 
'<td>b2</td> </tr>\n</table>\n</body>\n</html>\n'
]

However, with '/text()' appended to the XPath:

temp = selector.xpath("//td/text()").extract()

the result is correct:

['a1', 'b1', 'a2', 'b2']

This might be a simple question, but I haven't been able to find the cause.

I tried 'extract', 'extract_first', 'get', and 'getall'; they all have the same problem.

I don't know what's wrong. Please help.

Answer 1

After I uninstalled Anaconda and installed a plain Python instead, the problem went away... which is strange.
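
Since reinstalling Python made the problem disappear, the XPath itself was probably fine and the cause may have been somewhere in the installed environment. Scrapy's selectors are built on parsel and lxml, so one way to compare a broken environment with a working one is to print the versions of those packages. The following is only a minimal sketch under that assumption; `collect_versions` is a hypothetical helper name, and the answer does not confirm which package, if any, was at fault.

```python
import importlib
import sys


def collect_versions(names=("scrapy", "parsel", "lxml")):
    """Return a dict mapping each package name to its version string.

    A package that cannot be imported maps to None. `collect_versions`
    is a hypothetical helper for illustration, not part of Scrapy.
    """
    versions = {"python": sys.version.split()[0]}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Not every package exposes __version__, so fall back to "unknown".
        versions[name] = getattr(module, "__version__", "unknown")

    # lxml also reports the libxml2 version it runs against and the one
    # it was compiled against; a mismatch can be worth noting.
    try:
        from lxml import etree
    except ImportError:
        pass
    else:
        versions["libxml2 (runtime)"] = ".".join(map(str, etree.LIBXML_VERSION))
        versions["libxml2 (compiled)"] = ".".join(
            map(str, etree.LIBXML_COMPILED_VERSION)
        )
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Running this in both the Anaconda environment and the plain Python install, then comparing the output, would show whether the two setups use different library versions.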

solution1
0 ACCEPTED 2022-06-03 12:22:23
