

Python not creating csv file after parsing - using selenium and beautifulsoup

My code works in two parts:

  1. Opens a browser with selenium and fills in the details to get results from a page.

  2. Parses the html of the results page and writes it to a csv file.

Problem: the second part works only if I download the page and manually add the local url (on my computer). If I add the first part of the code, selenium opens the browser but no csv file is exported.

Environment: Ubuntu Mate 18.04, PyCharm editor, Firefox browser.

I have printed at every level of the code and got the right output. However, the output stops after the for loop.

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import pandas as pd
import csv
import os
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

os.environ["PATH"] += os.pathsep + r'/home/pierre/PycharmProjects/scraping/venv'
browser = webdriver.Firefox()
browser.get('http://karresults.nic.in/indexPUC_2019.asp')

reg = browser.find_element_by_id('reg')
reg.send_keys('738286')

sub = browser.find_element_by_class_name('btn-default')

sub.click()

url = browser.current_url

my_url = url

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")

results = []

for record in page_soup.findAll('tr'):
    for data in record.findAll('td'):
        results = results + [data.text.replace(u'\xa0', u'').strip()]

        print(results)

        with open('myfile.csv', 'w') as f:
            for item in results:
                f.write(item + ',')

No errors in the PyCharm console.

There is no need to re-request the updated URL; you just have to wait for a couple of seconds using the time module.

from bs4 import BeautifulSoup
from selenium import webdriver
import os
import time

os.environ["PATH"] += os.pathsep + r'/home/pierre/PycharmProjects/scraping/venv'
browser = webdriver.Firefox()
browser.get('http://karresults.nic.in/indexPUC_2019.asp')

reg = browser.find_element_by_id('reg')
reg.send_keys('738286')

sub = browser.find_element_by_class_name('btn-default')
sub.click()

# Give the results page a moment to render before reading its source
time.sleep(3)

soup = BeautifulSoup(browser.page_source, 'lxml')
results = []
for record in soup.find_all('tr'):
    for data in record.find_all('td'):
        results.append(data.text.replace(u'\xa0', u'').strip())

# Open the file once, after all cells have been collected
with open('myfile.csv', 'w') as f:
    for item in results:
        f.write(item + ',')
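The cell-cleaning step in the loop above (stripping the non-breaking space `\xa0` that HTML `&nbsp;` entities become) can be checked in isolation, without launching a browser. A minimal sketch, with a hypothetical `html` string standing in for one row of the results table:

```python
from bs4 import BeautifulSoup

# Tiny stand-in for one row of the results table; '&nbsp;' parses to u'\xa0'
html = '<table><tr><td>Name</td><td>&nbsp;ANIKET&nbsp;</td></tr></table>'

soup = BeautifulSoup(html, 'html.parser')
cells = [td.text.replace(u'\xa0', u'').strip() for td in soup.find_all('td')]
print(cells)  # ['Name', 'ANIKET']
```

The built-in `html.parser` is used here so the sketch needs no lxml install; the same list comprehension drops into the scraping loop unchanged.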

csv file output:

Name,ANIKET ANIL BALEKUNDRI,Reg. No.,738286,ENGLISH,76,,76P,HINDI,76,,76P,Part A - TOTAL,152,PHYSICS,44,30,74P,CHEMISTRY,46,30,76P,MATHEMATICS,73,,73P,BIOLOGY,55,29,84P,Part B - TOTAL,307,GRAND TOTAL MARKS,459,FINAL RESULT,First Class,
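The output above is one long comma-joined line, because every cell is written with a trailing `','` onto a single row. If each `<tr>` should become its own csv line, the standard csv module handles separators and quoting for you. A sketch, with a hypothetical `rows` list standing in for the parsed table:

```python
import csv

# Hypothetical rows, one list of cell texts per <tr>
rows = [
    ['Name', 'ANIKET ANIL BALEKUNDRI'],
    ['Reg. No.', '738286'],
    ['ENGLISH', '76', '', '76P'],
]

# In the scraper, rows could be built per <tr>:
#   rows = [[td.text.replace(u'\xa0', u'').strip() for td in tr.find_all('td')]
#           for tr in soup.find_all('tr')]

# newline='' is the documented way to open csv files for writing on all platforms
with open('myfile.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)
```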
