
Web scraping the console of a web page using Python

I have this piece of code which scrapes a certain site and prints out what it finds on that webpage. I'm pretty new to this, so my question is: how can I collect only the data from the console, like what is seen in the picture?

[Image: inspect console]

Here is the code so far. Thanks for the help.

    import requests

    url = 'url goes here'

    r = requests.get(url)

    print(r.text)

Here are some ways to collect the output:

  1. If the data is small and well formatted, for example just one line per URL, you can simply copy the output from the console.

  2. If the data is very large, which I assume is your situation, you can write the output to a file.

    import requests

    url = 'url goes here'

    r = requests.get(url)

    print(r.text)

    # Write the response body itself, not the literal string 'r.text'.
    with open('/path/to/file.txt', 'w', encoding='utf-8') as f:
        f.write(r.text)

  3. If you have thousands of URLs and need to write thousands of files, add a for loop over the URLs and write each output to a different file.
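The loop over many URLs could be sketched roughly like this. The names `fetch_text`, `save_pages`, and the `page_0001.txt` naming scheme are made up for illustration, and the `fetch` parameter is only there so the file-writing part can be tried without hitting a real site.

```python
from pathlib import Path


def fetch_text(url):
    """Download one page and return its body as text (hypothetical helper)."""
    # Imported here so the file-writing logic below can run without requests installed.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    return response.text


def save_pages(urls, out_dir, fetch=fetch_text):
    """Write the body of each URL to its own numbered .txt file in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, url in enumerate(urls, start=1):
        # Assumed naming scheme: page_0001.txt, page_0002.txt, ...
        path = out_dir / f"page_{index:04d}.txt"
        path.write_text(fetch(url), encoding="utf-8")
        written.append(path)
    return written
```

Calling `save_pages(list_of_urls, 'output')` would then leave one file per URL in the `output` directory. For thousands of URLs you would probably also want error handling per URL and a pause between requests, which this sketch leaves out.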

The example above uses a .txt file, but you can also write the output to an .xml or .html file, or to any format that is more convenient for you to reuse, such as docx, Excel, CSV, or JSON.
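As a rough illustration of the other formats, here is a sketch that writes the collected pages as JSON or CSV using only the standard library. The row layout (a dict with `url` and `text` keys) and the function names `write_json` and `write_csv` are assumptions for the example, not something the original code defines.

```python
import csv
import json


def write_json(rows, path):
    """Save a list of {'url': ..., 'text': ...} dicts as one JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII page text readable in the file.
        json.dump(rows, f, ensure_ascii=False, indent=2)


def write_csv(rows, path):
    """Save the same rows as CSV, one line per URL."""
    # newline="" is what the csv module expects when it manages line endings.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["url", "text"])
        writer.writeheader()
        writer.writerows(rows)
```

With the question's code, a row could be built as `{'url': url, 'text': r.text}`. JSON is usually the easier choice when the page text contains many line breaks; CSV works well when each URL yields a short value.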
