How to write all scraped data to csv?
I have a Python script that scrapes data from a website and writes it to a CSV file. But after running my code, only the last row (joblink) shows up in my CSV when I open it in Excel; the other rows are empty apart from the header.
How can I fix this? My code block is below.
import csv

import requests
from bs4 import BeautifulSoup

for x in range(1, 210):
    html_text = requests.get(f'https://www.timesjobs.com/candidate/job-search.html?from=submit&actualTxtKeywords=Python&searchBy=0&rdoOperator=OR&searchType=personalizedSearch&luceneResultSize=25&postWeek=60&txtKeywords=Python&pDate=I&sequence={x}&startPage=1').text
    soup = BeautifulSoup(html_text, 'lxml')
    jobs = soup.find_all('li', class_='clearfix job-bx wht-shd-bx')
    with open('jobberman.csv', 'w+', newline='', encoding='utf-8') as f:
        header = ['Company Name', 'Keyskill', 'Joblink']
        writer = csv.writer(f, delimiter=',')
        writer.writerow(header)
        for job in jobs:
            company_name = job.find('h3', class_='joblist-comp-name').text.replace(' ', '')
            keyskill = job.find('span', class_='srp-skills').text.replace(' ', '')
            joblink = job.header.h2.a['href']
            print(f"Company Name: {company_name.strip()}")
            print(f"Required Skills: {keyskill.strip()}")
            print(f"Joblink: {joblink}")
            print('')
            joblist = [company_name, keyskill, joblink]
            writer.writerow(joblist)
The main problem is that you overwrite your file on every iteration, because opening it with mode 'w+' truncates it each time. Open the file once and run the outer for-loop while the file is open:
...
with open('jobberman.csv', 'w+', newline='', encoding='utf-8') as f:
    header = ['Company Name', 'Keyskill', 'Joblink']
    writer = csv.writer(f)
    writer.writerow(header)
    for x in range(1, 120):
        html_text = requests.get(f'https://www.timesjobs.com/candidate/job-search.html?from=submit&actualTxtKeywords=Python&searchBy=0&rdoOperator=OR&searchType=personalizedSearch&luceneResultSize=25&postWeek=60&txtKeywords=Python&pDate=I&sequence={x}&startPage=1').text
        soup = BeautifulSoup(html_text, 'lxml')
        jobs = soup.find_all('li', class_='clearfix job-bx wht-shd-bx')
        for job in jobs:
            company_name = job.find('h3', class_='joblist-comp-name').get_text(strip=True)
            keyskill = job.find('span', class_='srp-skills').get_text(strip=True)
            joblink = job.header.h2.a['href']
            joblist = [company_name, keyskill, joblink]
            writer.writerow(joblist)
Tested the writing logic with placeholder rows in place of the scraped data:

import csv

with open('jobberman.csv', 'w+', newline='', encoding='utf-8') as f:
    header = ['Company Name', 'Keyskill', 'Joblink']
    writer = csv.writer(f)
    writer.writerow(header)
    for x in range(1, 120):
        # requesting and scraping info would go here
        joblist = ['Company Name' + str(x), 'Keyskill' + str(x), 'Joblink' + str(x)]
        writer.writerow(joblist)

This produces:
Company Name,Keyskill,Joblink
Company Name1,Keyskill1,Joblink1
Company Name2,Keyskill2,Joblink2
Company Name3,Keyskill3,Joblink3
Company Name4,Keyskill4,Joblink4
Company Name5,Keyskill5,Joblink5
Company Name6,Keyskill6,Joblink6
Company Name7,Keyskill7,Joblink7
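The truncation behaviour is easy to reproduce without the scraper. A minimal sketch (using made-up rows and a throwaway demo.csv file, both hypothetical) shows why opening with mode 'w' inside the loop loses everything but the last write, while writing the header once and then appending with mode 'a' keeps all rows:

```python
import csv
import os

# Hypothetical stand-in rows; in the real script each would come from one scraped page.
rows_per_page = [['A', '1'], ['B', '2'], ['C', '3']]

# Opening with 'w' inside the loop truncates the file on every pass,
# so only the last write survives:
for row in rows_per_page:
    with open('demo.csv', 'w', newline='') as f:
        csv.writer(f).writerow(row)
with open('demo.csv', newline='') as f:
    truncated = list(csv.reader(f))
print(truncated)   # [['C', '3']] -- everything before the last row is lost

# Writing the header once, then appending ('a') inside the loop, keeps every row:
with open('demo.csv', 'w', newline='') as f:
    csv.writer(f).writerow(['Name', 'Value'])
for row in rows_per_page:
    with open('demo.csv', 'a', newline='') as f:
        csv.writer(f).writerow(row)
with open('demo.csv', newline='') as f:
    appended = list(csv.reader(f))
print(appended)    # header plus all three rows

os.remove('demo.csv')  # clean up the demo file
```

This is exactly the difference between the question's code (file opened in 'w+' inside the page loop) and the fix above (file opened once, before the loop).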
Same problem here: you reopen and truncate the file on every pass. I could not access the website to test, but give this a try:
import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.DataFrame([], columns=['Company Name', 'Keyskill', 'Joblink'])
df.to_csv('jobberman.csv', index=False)

for x in range(1, 210):
    html_text = requests.get(f'https://www.timesjobs.com/candidate/job-search.html?from=submit&actualTxtKeywords=Python&searchBy=0&rdoOperator=OR&searchType=personalizedSearch&luceneResultSize=25&postWeek=60&txtKeywords=Python&pDate=I&sequence={x}&startPage=1').text
    soup = BeautifulSoup(html_text, 'lxml')
    jobs = soup.find_all('li', class_='clearfix job-bx wht-shd-bx')
    rows = []
    for job in jobs:
        company_name = job.find('h3', class_='joblist-comp-name').text.replace(' ', '')
        keyskill = job.find('span', class_='srp-skills').text.replace(' ', '')
        joblink = job.header.h2.a['href']
        row = {'Company Name': company_name.strip(),
               'Keyskill': keyskill.strip(),
               'Joblink': joblink}
        rows.append(row)
        print(f"Company Name: {company_name.strip()}")
        print(f"Required Skills: {keyskill.strip()}")
        print(f"Joblink: {joblink}")
        print('')
    df = pd.DataFrame(rows)
    df.to_csv('jobberman.csv', mode='a', header=False, index=False)
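An alternative to appending after every page is to accumulate all rows in a list and write the CSV once at the end, which avoids the initial empty-DataFrame write and the repeated file opens. A sketch under that design, with made-up placeholder rows standing in for the scraped pages (the column names mirror the code above; the loop bound and links here are hypothetical):

```python
import pandas as pd

all_rows = []
for x in range(1, 4):  # placeholder for the ~210 result pages
    # ... the requests/BeautifulSoup scraping from above would go here ...
    all_rows.append({'Company Name': f'Company{x}',          # made-up row
                     'Keyskill': f'Skill{x}',
                     'Joblink': f'https://example.com/job/{x}'})

# One DataFrame, one write: the header is emitted exactly once.
df = pd.DataFrame(all_rows, columns=['Company Name', 'Keyskill', 'Joblink'])
df.to_csv('jobberman.csv', index=False)
```

The trade-off is memory: appending page by page keeps partial results if the scrape fails midway, while writing once keeps the code simpler.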