
I am trying to download the yearly data from this website using Python, but I am not sure how to approach it.

I want to learn how to download the CSV files for the last ten years using Python.

https://www.usgovernmentspending.com/compare_state_debt

My attempts involve requests and pandas.

This is a multipart problem, so I'm going to outline the steps I think you should follow.

  • The first part is simply downloading the webpage. I would suggest using something like requests to fetch it.
  • Once you have the page, you can use Beautiful Soup to parse it.
  • I took a look at the website and there appear to be a number of ways to get the data. I think the best way is to extract all the text from this particular part of the page.
  • Once you have done that, you will probably need to clean up the data. I suggest using pandas for that.
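The steps above could be sketched roughly as follows. This is only a minimal sketch: it assumes the figures sit in an ordinary HTML `<table>` whose first row is the header, which I have not verified for this site. The helper names (`fetch_html`, `table_to_dataframe`, `clean_numeric`) and the column name "Debt" in the usage note are made up for illustration.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

PAGE_URL = "https://www.usgovernmentspending.com/compare_state_debt"


def fetch_html(url, timeout=30):
    """Step 1: download the page and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def table_to_dataframe(html, table_index=0):
    """Steps 2-3: parse the HTML and pull one table out as raw strings.

    Assumes the first non-empty row of the table is the header row.
    """
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")
    if not tables:
        raise ValueError("no <table> found in the page")
    rows = []
    for tr in tables[table_index].find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    header, *body = rows
    # Keep only rows that line up with the header.
    body = [row for row in body if len(row) == len(header)]
    return pd.DataFrame(body, columns=header)


def clean_numeric(df, columns):
    """Step 4: strip "$" and "," and convert the given columns to numbers."""
    cleaned = df.copy()
    for col in columns:
        stripped = cleaned[col].str.replace(r"[$,]", "", regex=True)
        cleaned[col] = pd.to_numeric(stripped, errors="coerce")
    return cleaned
```

Usage would look something like `df = clean_numeric(table_to_dataframe(fetch_html(PAGE_URL)), ["Debt"])` followed by `df.to_csv("state_debt.csv", index=False)`. You will likely have to adjust `table_index` and the column names after inspecting the real page.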

People on here aren't going to solve the whole problem for you. That said, if you get stuck along the way and have a specific question, StackOverflow can probably help at that point.

Issue resolved. I managed to solve it using Selenium.

By doing the following:

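from selenium import webdriver  # launches and controls the browser
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

driver_option = webdriver.ChromeOptions()
# Uncomment to open the browser in incognito mode.
# driver_option.add_argument("--incognito")
chromedriver_path = '# Write your path here'  # Change this to your own chromedriver path!

# Creating a webdriver (Selenium 4 style: the driver path goes through Service).
def create_webdriver():
    return webdriver.Chrome(service=Service(chromedriver_path), options=driver_option)

url = ""  # URL of the page to download from

browser = create_webdriver()
browser.get(url)
# Find the "download file" link on the page.
elem1 = browser.find_element(By.LINK_TEXT, "download file")
# Click it to start the download.
elem1.click()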

I put the previous code in a loop over all the years up to 2020 and got all the files in CSV format.
