
Parsing through URLs in a CSV file - Python

I have a CSV file of URLs, and I'm attempting to write code that loops through the URLs and appends specific variables from each to a dictionary. Unfortunately, whenever I attempt this using Beautiful Soup, the program either does not separate the URLs or only processes the first one. I understand this is likely a simple issue, but I have been unable to resolve it using solutions from similar questions. Below is an excerpt of the code. Thank you for any guidance.

csv_data:
'https://www.sec.gov/Archives/edgar/data/78003/000007800313000017,https://www.sec.gov/Archives/edgar/data/78003/000115752312004450,https://www.sec.gov/Archives/edgar/data/78003/000115752312002789,https://www.sec.gov/Archives/edgar/data/78003/000007800313000013,https://www.sec.gov/Archives/edgar/data/78003/000007800313000029,https://www.sec.gov/Archives/edgar/data/78003/000007800312000008,https://www.sec.gov/Archives/edgar/data/78003/000007800314000046'


content = requests.get(csv_data[1]).content
soup = BeautifulSoup(content, 'lxml')

reports = soup.find('myreports')

master_reports = []

for report in reports.find_all('report')[:-1]:

    report_dict = {}
    report_dict['name_short'] = report.shortname.text
    report_dict['category'] = report.menucategory.text
    report_dict['url'] = base_url + report.htmlfilename.text

    master_reports.append(report_dict)

    print(base_url + report.htmlfilename.text)
    print(report.shortname.text)
    print(report.menucategory.text)

Is this what you are looking for: splitting the URL list and looping over it? If so, you'll have to collect the output for each loop, which isn't coded here.
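To see why splitting matters: indexing a comma-separated string returns a single character, while `split(',')` yields whole URLs. A minimal sketch using shortened placeholder URLs (not the real SEC paths):

```python
# A comma-separated string of URLs, like csv_data in the question
# (shortened placeholders, not the real SEC paths).
csv_data = 'https://example.com/a,https://example.com/b'

# Indexing a string returns one character, not one URL.
print(csv_data[1])                # 't'

# Splitting on the comma yields whole URLs.
csv_url_list = csv_data.split(',')
print(csv_url_list[1])            # 'https://example.com/b'
```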

csv_data = 'https://www.sec.gov/Archives/edgar/data/78003/000007800313000017,https://www.sec.gov/Archives/edgar/data/78003/000115752312004450,https://www.sec.gov/Archives/edgar/data/78003/000115752312002789,https://www.sec.gov/Archives/edgar/data/78003/000007800313000013,https://www.sec.gov/Archives/edgar/data/78003/000007800313000029,https://www.sec.gov/Archives/edgar/data/78003/000007800312000008,https://www.sec.gov/Archives/edgar/data/78003/000007800314000046'
csv_url_list = csv_data.split(',')
for url in csv_url_list:
    content = requests.get(url).content
    soup = BeautifulSoup(content, 'lxml')
    reports = soup.find('myreports')

    master_reports = []

    for report in reports.find_all('report')[:-1]:

        report_dict = {}
        report_dict['name_short'] = report.shortname.text
        report_dict['category'] = report.menucategory.text
        report_dict['url'] = base_url + report.htmlfilename.text

        master_reports.append(report_dict)

        print(base_url + report.htmlfilename.text)
        print(report.shortname.text)
        print(report.menucategory.text)
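To collect the output across iterations, the list can be created once, outside the loop, and the per-page parsing pulled into a helper. This is only a sketch under the question's assumptions: `base_url` is the site root prepended elsewhere in the asker's code, and each page contains the same `<MyReports>`/`<Report>` structure as an SEC FilingSummary:

```python
import requests
from bs4 import BeautifulSoup

# Assumption: base_url is the site root that the question's code
# prepends to each report's relative file name.
base_url = 'https://www.sec.gov'

def parse_reports(soup, base_url):
    """Pull the report entries out of one parsed FilingSummary page."""
    reports = soup.find('myreports')
    if reports is None:                 # page without a <MyReports> section
        return []
    return [
        {
            'name_short': report.shortname.text,
            'category': report.menucategory.text,
            'url': base_url + report.htmlfilename.text,
        }
        for report in reports.find_all('report')[:-1]
    ]

def collect_reports(csv_data, base_url):
    """Fetch every URL in the comma-separated string and accumulate
    all report dicts into a single list."""
    master_reports = []                 # created once, outside the loop
    for url in csv_data.split(','):
        soup = BeautifulSoup(requests.get(url).content, 'lxml')
        master_reports.extend(parse_reports(soup, base_url))
    return master_reports
```

Splitting the parsing into `parse_reports` also makes it easy to test against a small XML snippet without hitting the network.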
