How can I use BeautifulSoup4 in Python to get frequently updated PHP text from a website?

Question

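I want to write an automated script that downloads the text served by a .php page that is updated frequently. My program uses requests to fetch the page.

Code: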

import os, pathlib, subprocess,requests, time, sys



url = 'http://metar.vatsim.net/metar.php?id=all'

current_dir = pathlib.Path(__file__).parent
os.chdir(current_dir)




icao = sys.argv[1]
fp = requests.get(url)
mybytes = fp.read()

mystr = mybytes.decode("utf8")
fp.close()

dict = {}

fls = str.splitlines(mystr)
for x in range(len(fls)):
    cur = str.split(fls[x])
    dict[cur[0]] = " ".join(cur)
    
try:
    print(dict[icao])
except:
    print('INCORRECT FORMAT OR AIRPORT ID\n')

When I try to read from fp, I get this error:

mybytes = fp.read()
AttributeError: 'Response' object has no attribute 'read'

Is there a better way to do this? I'm a bit stuck.

Answer 1

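You are looking for urllib.request, not requests: read() is a method of the object returned by urllib.request.urlopen(), not of a requests Response.

Perhaps this will work: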

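import urllib.request

url = 'http://metar.vatsim.net/metar.php?id=all'

# urlopen() returns a file-like object, so read() is available
fp = urllib.request.urlopen(url)
mybytes = fp.read()

mystr = mybytes.decode("utf8")
fp.close()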

This reads the text at http://metar.vatsim.net/metar.php?id=all.
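
A slightly more defensive variant of the same idea, as a sketch only: it wraps urlopen() in a with block so the connection is closed even if read() raises, and it uses the charset declared in the response headers when there is one. The names fetch_text and decode_body, the 10-second timeout, and the UTF-8 fallback are assumptions made for illustration; they do not come from the question.

```python
import urllib.request

URL = "http://metar.vatsim.net/metar.php?id=all"


def decode_body(raw, charset=None):
    """Decode response bytes, falling back to UTF-8 if no charset is given."""
    # errors="replace" keeps the script running if a stray byte is not decodable.
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_text(url=URL, timeout=10.0):
    """Download url and return its body as a str (hypothetical helper)."""
    # The with block closes the connection even if read() raises.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # get_content_charset() returns None when the server declares no charset.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

Calling fetch_text() performs the download and returns the whole page as one string, which can then be split into lines as in the question.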

Answer 2

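You can definitely use requests. The decoded body is then available as the response's .text attribute.

Also, don't shadow the built-in dict the way you are doing.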

import requests

url = 'http://metar.vatsim.net/metar.php?id=all'
fp = requests.get(url)
mystr = fp.text  # response body, already decoded to str
a_dict = {}

for line in mystr.splitlines():
    cur = line.split()
    if not cur:  # skip blank lines, where cur[0] would raise IndexError
        continue
    # key each report by its first token (the station identifier)
    a_dict[cur[0]] = " ".join(cur)

print(a_dict)
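
To get back to the single-airport lookup from the question, the parsing step can be separated from the download. The following is a minimal sketch under stated assumptions: parse_metars and lookup are hypothetical helper names, the SAMPLE text is invented purely for illustration (it is not real feed data), and upper-casing the identifier assumes the feed keys its lines by upper-case station codes. In the real script the text would come from requests.get(url).text.

```python
def parse_metars(text):
    """Map the first token of each non-blank line to the normalized line."""
    reports = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:  # blank line: nothing to index
            continue
        reports[tokens[0]] = " ".join(tokens)
    return reports


def lookup(reports, icao):
    """Return the report for icao, or None if the identifier is unknown."""
    # Assumption: identifiers in the feed are upper case.
    return reports.get(icao.upper())


# Invented sample input, standing in for requests.get(url).text
SAMPLE = "AAAA 010000Z  00000KT\n\nBBBB 010000Z 18005KT\n"

reports = parse_metars(SAMPLE)
report = lookup(reports, "aaaa")
print(report if report is not None else "INCORRECT FORMAT OR AIRPORT ID")
```

Using dict.get() instead of a bare except means only a missing identifier triggers the error message, while unrelated failures still surface as exceptions.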
