
How to web scrape data from only specific cells in Python?

I am trying to web scrape some data from https://il.water.usgs.gov/gmaps/precip/. I only want specific cells from the row called "RAIN GAGE AT PING TOM PARK AT CHICAGO, IL": only the cells containing the 1, 3, and 12 hour predictions for rain. What should I fix?

    import pandas as pd

    url = "https://il.water.usgs.gov/gmaps/precip/"
    df = pd.read_html(url, flavor="bs4")[0]
    print(df.loc[df[0] == "RAIN GAGE AT PING TOM PARK AT CHICAGO, IL"])

The data is retrieved dynamically from another endpoint that returns JSON, so it is not part of the page's initial HTML. You could write a function that calls that endpoint and takes the location and the desired hours as arguments:

    import requests

    # JSON endpoint that the page loads its rainfall data from
    URL = "https://il.water.usgs.gov/gmaps/precip/data/rainfall_outIL_WSr2.json"


    def get_precipitation(location: str, hrs: list):
        # Fetch the rainfall data and parse the JSON response
        r = requests.get(URL).json()
        # Keep the first item whose title matches the requested location
        data = [i for i in r['value']['items'] if i['title'] == location][0]

        # Print only the requested hour fields
        for k, v in data.items():
            if k in hrs:
                print(f'{k}={v}')


    if __name__ == "__main__":
        location = "RAIN GAGE AT PING TOM PARK AT CHICAGO, IL"
        hrs = ['precip1hrvalue', 'precip3hrvalue', 'precip12hrvalue']

        get_precipitation(location, hrs)
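
If you would rather get the values back than print them, the lookup can be separated from the download. The following is only a sketch: `extract_precipitation` is a hypothetical helper name, and the `sample` payload is made up to mirror the `value` / `items` / `title` layout used above, with placeholder numbers rather than real readings.

```python
from typing import Any


def extract_precipitation(
    payload: dict[str, Any], location: str, hours: list[str]
) -> dict[str, Any]:
    """Return the requested precipitation fields for the gauge titled `location`."""
    for item in payload["value"]["items"]:
        if item.get("title") == location:
            # Keep only the requested keys that this item actually has
            return {key: item[key] for key in hours if key in item}
    raise KeyError(f"No gauge titled {location!r} in the payload")


# Made-up sample shaped like the endpoint's response; the values are placeholders.
sample = {
    "value": {
        "items": [
            {
                "title": "RAIN GAGE AT PING TOM PARK AT CHICAGO, IL",
                "precip1hrvalue": "0.00",
                "precip3hrvalue": "0.01",
                "precip12hrvalue": "0.25",
                "precip24hrvalue": "0.40",
            }
        ]
    }
}

result = extract_precipitation(
    sample,
    "RAIN GAGE AT PING TOM PARK AT CHICAGO, IL",
    ["precip1hrvalue", "precip3hrvalue", "precip12hrvalue"],
)
print(result)
```

With live data you would pass in the parsed JSON from `requests.get(...).json()` instead of `sample`. Raising `KeyError` for an unknown title gives a clearer message than the `IndexError` that indexing an empty list with `[0]` would produce.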
