How to format web scraping output as a table in a CSV?

I am extracting some data from a website where the data is, in theory, a table.

import requests
from bs4 import BeautifulSoup

cookies = {
    'SISWEB-PUBLIC': 'ORA_WWV-RMvAbLGLSxXJOqOTipG30k1M',
    '_ga': 'GA1.3.825042167.1579292801',
    '_pk_id.11.6e3e': '31091343e8e5c6a9.1579292805.14.1605535420.1584973016.',
    '_pk_ref.11.6e3e': '%5B%22%22%2C%22%22%2C1605535420%2C%22https%3A%2F%2Fwww.google.com%2F%22%5D',
    '_gid': 'GA1.3.532866579.1610911359',
    '_gat_gtag_UA_139253076_4': '1',
}

headers = {
    'Connection': 'keep-alive',
    'Accept': 'text/html, */*; q=0.01',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Origin': 'https://sisweb.tesouro.gov.br',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Dest': 'empty',
    'Referer': 'https://sisweb.tesouro.gov.br/apex/f?p=2691:2&minimal=full&font=opensans',
    'Accept-Language': 'en-US,en;q=0.9,pt-BR;q=0.8,pt;q=0.7',
}

data = {
  'p_json': '{"salt":"284140192213841769741724635899547408701","pageItems":{"itemsToSubmit":[{"n":"P2_TIPO_LEILAO","v":"1"},{"n":"P2_TIPO_TITULO","v":"1"},{"n":"P2_PESQUISAR","v":"S","ck":"PCb5bs5LDIDvee0z7u0Uj6YkpPyJBARj2dYQ4WkxnaxN599CNVbrf6gulSAHSU5lQmuIPDpNOaTQUQaUXgpU5Q"},{"n":"P2_DATA_INICIAL","v":"14/01/2021"},{"n":"P2_DATA_FINAL","v":"18/01/2021"}],"protected":"U3PMYyQfm1IU1I_Cn_7v3g","rowVersion":""}}',
  'p_flow_id': '2691',
  'p_flow_step_id': '2',
  'p_instance': '16388465980453',
  'p_page_submission_id': '284140192213841769741724635899547408701',
  'p_request': 'PESQUISAR',
  'p_reload_on_submit': 'A'
}

response = requests.post('https://sisweb.tesouro.gov.br/apex/wwv_flow.accept', headers=headers, cookies=cookies, data=data)

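I would like to know how to get the output (the response) into a CSV file formatted as a table, or into something else that lets me view this output as a table. Thanks!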

Given JSON like the one in your example, it is easy to convert it to a Python dictionary with the json.load() function from the json package (or json.loads() if the JSON is already in a string).

import json

# filename is the path to a file containing the JSON text
with open(filename) as json_file:
    data = json.load(json_file)

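You can then access the data as a standard Python dictionary.

Alternatively, if you build a dictionary directly while scraping, you can write it straight to CSV.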
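As a rough sketch of the scraping route, assuming the response body is HTML containing an ordinary <table> whose first row holds the column names. The question does not show what the server actually returns, so the sample markup, the column names, and the helper name table_to_dicts are all invented for illustration:

```python
from bs4 import BeautifulSoup

# Invented sample markup; in the question this would be response.text.
SAMPLE_HTML = """
<table>
  <tr><th>date</th><th>title</th><th>rate</th></tr>
  <tr><td>14/01/2021</td><td>A</td><td>3.10</td></tr>
  <tr><td>15/01/2021</td><td>B</td><td>6.25</td></tr>
</table>
"""


def table_to_dicts(html):
    """Return one dict per data row of the first <table> found in html."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    # Treat the first row as the header, whether it uses <th> or <td> cells.
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        values = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if values:  # skip rows that contain no data cells
            records.append(dict(zip(header, values)))
    return records


records = table_to_dicts(SAMPLE_HTML)
print(records)
```

Each entry in records is a plain dictionary keyed by column name, which is the shape csv.DictWriter expects. If the real page nests tables or spreads the header over several rows, the header detection would need adjusting.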

You can then use the csv module to write the data you are interested in to a CSV file. If your data is in a dictionary, it's recommended to use the csv.DictWriter class, which maps each dictionary key to the column of the same name in the header.

Usage example:

import csv

with open('names.csv', 'w', newline='') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
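Putting the two steps together might look like the sketch below. It assumes the data is a JSON array of flat objects that all share the same keys; the JSON text and the helper name records_to_csv are made up for illustration, and an in-memory buffer stands in for the output file:

```python
import csv
import io
import json

# Made-up JSON text standing in for whatever data you end up with.
JSON_TEXT = (
    '[{"first_name": "Baked", "last_name": "Beans"},'
    ' {"first_name": "Lovely", "last_name": "Spam"}]'
)


def records_to_csv(records, out):
    """Write a list of dicts to the open text file `out` as CSV."""
    if not records:
        return
    # Use the keys of the first record as the column names.
    fieldnames = list(records[0].keys())
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


records = json.loads(JSON_TEXT)
buffer = io.StringIO()  # stands in for open('names.csv', 'w', newline='')
records_to_csv(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce a real file, pass the object returned by open('names.csv', 'w', newline='') instead of the buffer. If some records carry keys that are missing from the first one, DictWriter raises a ValueError by default, so the column list would need to be collected from all records first.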

Hope this is the hint you were looking for.

