Python--Web scraping a table and writing only specific columns into a CSV file

I'm having a few issues. First off, when I try to write a CSV file from a web scrape, nothing is written. The file does save, but it's completely blank. Ultimately, I'm hoping to open it and use the water temperature column to calculate an average.

My other issue is that I only want a few of the table's columns in my CSV file. Can someone verify that what I did is correct? I only want the first 3 columns, and then the 14th column.

Thank you!

import sys
import urllib2
import csv
import requests 
from bs4 import BeautifulSoup

r_temp1 = requests.get('http://www.ndbc.noaa.gov/data/realtime2/BZBM3.txt')
html_temp1 = r_temp1.text
soup = BeautifulSoup(html_temp1, "html.parser")
table_temp1 = soup.find('table')
rows_temp1 = table.findAll('tr')
rows_temp1 = rows_temp1[1:]

#writing to a csv file
csvfile_temp1 = open("temp1.csv","wb")
output_temp1 = csv.writer(csvfile_temp1, delimiter=',',quotechar='"',quoting=csv.QUOTE_MINIMAL)
for row in rows_temp1:
    Year = cells[0].text.strip()
    Month = cells[1].text.strip()
    Day = cells[2].text.strip()
    W_temp = cells[14].text.strip()
    output.writerow([Year,Month,Day,W_temp])
csvfile_temp1.close()

Running your code gives:

File "hh.py", line 11, in <module>
rows_temp1 = table.findAll('tr')

NameError: name 'table' is not defined

And indeed, on line 10 you define table_temp1, not table. I don't know whether you have other issues, but start by reading the errors you get.

You're not seeing anything in the file because there are no rows in rows_temp1. That list is empty because there are no table rows in a text file. It looks like you are expecting an HTML file with a table, but the file is just a plain text file.
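One quick way to confirm this before choosing a parser is to check whether the response body contains any table markup at all. The sketch below is only an illustration: looks_like_html_table is a hypothetical helper name, and both sample bodies are made up rather than taken from the real NDBC file.

```python
def looks_like_html_table(text):
    """Return True if the body appears to contain an HTML <table> element."""
    return "<table" in text.lower()


# Made-up sample bodies, for illustration only.
plain_text_body = "#YY MM DD hh mm\n2024 01 01 00 00\n"
html_body = "<html><body><table><tr><td>1</td></tr></table></body></html>"

print(looks_like_html_table(plain_text_body))  # False
print(looks_like_html_table(html_body))        # True
```

With the real request you would pass r_temp1.text to the helper. If it returns False, soup.find('table') has nothing to find and splitting the text into lines is the simpler route.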

Here is a version that does what you want:

import csv
import requests

# The NDBC file is plain text with whitespace-separated columns, not HTML.
r_temp1 = requests.get('http://www.ndbc.noaa.gov/data/realtime2/BZBM3.txt')
rows_temp1 = r_temp1.text.split('\n')

# Write the selected columns to a CSV file.
# Python 3: open in text mode with newline="". On Python 2, use open("temp1.csv", "wb") instead.
with open("temp1.csv", "w", newline="") as csvfile_temp1:
    output_temp1 = csv.writer(csvfile_temp1, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for row in rows_temp1:
        if not row.strip():
            continue  # skip blank lines
        cells = row.split()  # split() already strips surrounding whitespace
        Year = cells[0]
        Month = cells[1]
        Day = cells[2]
        W_temp = cells[14]
        output_temp1.writerow([Year, Month, Day, W_temp])
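Since the end goal in the question is an average of the water temperature column, here is a minimal sketch of that step. It makes a few assumptions that are not in the original answer: header lines start with "#", missing readings appear as a non-numeric token such as "MM", and the water temperature sits at index 14 as in the code above. average_water_temp is a hypothetical helper name, and the sample rows are made up so the example runs without a network connection.

```python
def average_water_temp(text, column=14):
    """Average the numeric values found at `column` in whitespace-separated rows.

    Blank lines, header lines starting with '#', and rows whose value is
    missing or non-numeric are skipped. Returns None if nothing is usable.
    """
    readings = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cells = line.split()
        try:
            readings.append(float(cells[column]))
        except (IndexError, ValueError):
            continue  # short row or non-numeric placeholder (assumed "MM")
    if not readings:
        return None
    return sum(readings) / len(readings)


# Made-up rows that only mimic the column layout; not real buoy data.
sample = """#YY MM DD hh mm WDIR WSPD GST WVHT DPD APD MWD PRES ATMP WTMP
2024 01 01 00 00 180 5.0 6.0 MM MM MM MM 1015.0 4.0 6.0
2024 01 01 00 06 180 5.0 6.0 MM MM MM MM 1015.0 4.0 MM
2024 01 01 00 12 180 5.0 6.0 MM MM MM MM 1015.0 4.0 8.0
"""

print(average_water_temp(sample))  # 7.0
```

To use it on the live data, call average_water_temp(r_temp1.text). Averaging straight from the downloaded text avoids re-reading temp1.csv, although the same loop would work on the CSV rows if you prefer to go through the file.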
