
soup.select() returns [] when scraping data with bs4

I am trying to scrape data from a website, but so far I have been unsuccessful. I have tried a couple of approaches; the most promising one is shown below. I am trying to get the yearBuild value from the page. Can someone help me out? Any leads would be highly appreciated.

import bs4 as bs
from selenium import webdriver

wd = webdriver.Chrome()
url = "https://www.marinetraffic.com/en/ais/details/ships/mmsi:255805792"
wd.get(url)
html_source = wd.page_source  # snapshot of the DOM at this moment
wd.quit()

# Name the parser explicitly; otherwise bs4 warns and picks whichever one is installed.
soup = bs.BeautifulSoup(html_source, "html.parser")
elems = soup.select('#yearBuild > b')
print(elems)
print(soup.prettify())

Here, elems is returned as an empty list.

You can request the ship's details as JSON from the site's vesselInfo endpoint instead of parsing the rendered HTML.

For example:

import re
import json
import requests


url = 'https://www.marinetraffic.com/en/ais/details/ships/mmsi:255805792'

ship_info_url = 'https://www.marinetraffic.com/en/vesselDetails/vesselInfo/shipid:{ship_id}'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

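# The final URL (after any redirect) is expected to contain the site's internal ship id.
r = requests.get(url, headers=headers)
ship_id = re.search(r'shipid:(\d+)', r.url)[1]
# Use that id to fetch the vessel details as JSON.
data = requests.get(ship_info_url.format(ship_id=ship_id), headers=headers).json()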

print(json.dumps(data, indent=4))
print('Year Built = ', data['yearBuilt'])

Prints:

{
    "name": "LAILA",
    "nameAis": "LAILA",
    "imo": 9377559,
    "eni": null,
    "mmsi": 255805792,
    "callsign": "CQDP",
    "country": "Portugal",
    "countryCode": "PT",
    "type": "Cargo - Hazard A (Major)",
    "typeSpecific": "Container Ship",
    "typeColor": "7",
    "grossTonnage": 28048,
    "deadweight": 38080,
    "teu": 2700,
    "liquidGas": null,
    "length": 215.5,
    "breadth": 29.87,
    "yearBuilt": 2008,
    "status": "Active",
    "isNavigationalAid": false,
    "correspondingRoamingStationId": null,
    "homePort": null
}
Year Built =  2008
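
One caveat about the snippet above: re.search returns None when the pattern is not found, so indexing the result with [1] raises a TypeError if the final URL does not contain "shipid:". Here is a minimal sketch of a safer extraction step. The helper name extract_ship_id and the ship id 123456 in the sample URL are made up for illustration; they do not come from the site.

```python
import re
from typing import Optional

SHIP_ID_PATTERN = re.compile(r"shipid:(\d+)")


def extract_ship_id(final_url: str) -> Optional[str]:
    """Return the digits after 'shipid:' in the URL, or None if absent."""
    match = SHIP_ID_PATTERN.search(final_url)
    return match.group(1) if match else None


# Hypothetical URL; the id 123456 is invented for this example.
redirected = "https://www.marinetraffic.com/en/ais/details/ships/shipid:123456/mmsi:255805792"
print(extract_ship_id(redirected))  # 123456

# The original MMSI-only URL has no ship id, so the helper returns None.
print(extract_ship_id("https://www.marinetraffic.com/en/ais/details/ships/mmsi:255805792"))  # None
```

In the real script you would pass r.url to the helper and stop with a clear error message when it returns None, rather than letting the JSON request fail later.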

Could I suggest using VesselFinder instead of MarineTraffic? It lists the same vessel, but MarineTraffic is hard to scrape because the page is rendered with JavaScript, while VesselFinder puts the data in the HTML it serves, so a plain HTTP request plus an HTML parser such as BeautifulSoup is enough.

VesselFinder also shows the data in tables, so it's easy to parse with pandas.

Here's the code:

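from io import StringIO

import pandas as pd
import requests

r = requests.get('https://www.vesselfinder.com/vessels/LAILA-IMO-9377559-MMSI-255805792', headers={'User-Agent': 'iPhone'})

# read_html returns a list of DataFrames, one per <table> on the page.
# StringIO avoids the deprecation warning newer pandas versions raise for literal HTML strings.
tables = pd.read_html(StringIO(r.text))
# Tables 2 and 3 hold the label/value rows; merge them into a single dict.
ship = pd.concat([tables[2], tables[3]], ignore_index=True).set_index(0).to_dict()[1]

print(ship['Year of Built'])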
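
To see what the concat / set_index / to_dict chain does without hitting the site, here is a small offline sketch. The two DataFrames are hand-built stand-ins for the label/value tables that read_html would return. The values come from the JSON shown in the other answer, but the row labels other than "Year of Built" are assumptions, not copied from VesselFinder.

```python
import pandas as pd

# Stand-ins for two label/value tables (column 0 = label, column 1 = value).
# Labels other than "Year of Built" are made up for illustration.
top = pd.DataFrame([["IMO number", "9377559"], ["Vessel Name", "LAILA"]])
bottom = pd.DataFrame([["Year of Built", "2008"], ["Flag", "Portugal"]])

# Stack the rows, use the label column as the index, then take the
# remaining value column (named 1) as a plain {label: value} dict.
combined = pd.concat([top, bottom], ignore_index=True)
ship = combined.set_index(0).to_dict()[1]

print(ship["Year of Built"])
# .get() avoids a KeyError if the site renames or drops a label.
print(ship.get("Gross Tonnage", "not listed"))
```

Because the lookup depends on the exact label text and on the table positions, it is worth printing the dict's keys once against the live page before relying on a specific key.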
