Why can't I crawl the page?
I am trying to scrape a table from a website and then convert it to CSV. Despite my code, nothing shows up. Can you tell me what is going wrong?

URL: http://www.multiclick.co.kr/sub/gamepatch/gamerank.html

Don't worry about the language. Just set the calendar to a date one or two days before today and click the magnifying glass; you will then see a table.
# Load the required modules
import urllib.request
from bs4 import BeautifulSoup
import pandas as pd
# Open up the page
url = "http://www.multiclick.co.kr/sub/gamepatch/gamerank.html"
web_page = urllib.request.Request(
    url,
    data=None,
    headers={'User-Agent': ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) "
                            "AppleWebKit/537.36 (KHTML, like Gecko) "
                            "Chrome/35.0.1916.47 Safari/537.36")})
web_page = urllib.request.urlopen(web_page)
# Parse the page
soup = BeautifulSoup(web_page, "html.parser")
print(soup)
# Get the table
# Get the columns
# Get the rows
# Stack them altogether
# Save it as a csv form
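For reference, once you have HTML that actually contains the table (the page above builds it with JavaScript, so the raw HTML you download won't have it), the commented steps could be sketched like this. The snippet below uses made-up markup standing in for the real page, so the tags and ids are assumptions:

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a page that really contains the rank table;
# the live site's markup will differ.
html = """
<table id="rank">
  <tr><th>gameRank</th><th>gameName</th></tr>
  <tr><td>1</td><td>League of Legends</td></tr>
  <tr><td>2</td><td>Overwatch</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")                                   # Get the table
rows = table.find_all("tr")
columns = [th.get_text() for th in rows[0].find_all("th")]   # Get the columns
data = [[td.get_text() for td in tr.find_all("td")]
        for tr in rows[1:]]                                  # Get the rows
print(columns)  # ['gameRank', 'gameName']
print(data)     # [['1', 'League of Legends'], ['2', 'Overwatch']]
```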
As @mx0 said, instead of fetching the main page, fetch the AJAX call, for example:
import csv
import requests
link = "http://ws.api.thelog.co.kr/service/info/rank/2018-10-18"
req = requests.get(link)
content = req.json()
with open('ranks.csv', 'w', newline='') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    # write column titles
    csv_writer.writerow(['gameRank', 'gameName', 'gameTypeName', 'gameShares', 'publisher', 'gameRankUpDown'])
    # write values
    for row in content["list"]:
        csv_writer.writerow(list(row.values()))
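One caveat: `list(row.values())` assumes the JSON keys always arrive in the same order as the hard-coded header. A slightly more robust variant picks each field by name with `csv.DictWriter`. The sample payload below is hypothetical, shaped like the answer's `content["list"]`; the real values from the endpoint will differ:

```python
import csv

# Hypothetical sample payload, shaped like the answer's content["list"];
# real field values from the endpoint will differ.
content = {
    "list": [
        {"gameRank": 1, "gameName": "League of Legends", "gameTypeName": "RTS",
         "gameShares": 40.1, "publisher": "Riot Games", "gameRankUpDown": 0},
        {"gameRank": 2, "gameName": "Overwatch", "gameTypeName": "FPS",
         "gameShares": 15.3, "publisher": "Blizzard", "gameRankUpDown": 1},
    ]
}

fields = ['gameRank', 'gameName', 'gameTypeName',
          'gameShares', 'publisher', 'gameRankUpDown']

with open('ranks.csv', 'w', newline='') as csvfile:
    # DictWriter looks each field up by name, so the key order in the JSON
    # no longer matters; extrasaction='ignore' skips any unexpected keys.
    writer = csv.DictWriter(csvfile, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(content["list"])
```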