I want to extract the FIPS code for each county in Louisiana from this website using Beautiful Soup and create a pandas DataFrame: https://www.nrcs.usda.gov/wps/portal/nrcs/detail/la/technical/cp/?cid=nrcs143_013697
The columns would be FIPS, Name, and State. When I inspect the page I've tried searching for tr, td, and table elements, but I don't know how to single out just the main data and then put it into a pandas DataFrame. Once I find the specific table, it should be easy to do something like:
if state == 'LA':
    # put data into a dataframe
import requests
from bs4 import BeautifulSoup
url = "https://www.nrcs.usda.gov/wps/portal/nrcs/detail/la/technical/cp/?cid=nrcs143_013697"
html_text = requests.get(url).text
soup = BeautifulSoup(html_text, 'html.parser')
# print(soup)
for county in soup.find_all('table'):
    print(county.text)
You can select the <table> with class="data" and then use pd.read_html. For example:
import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup
url = "https://www.nrcs.usda.gov/wps/portal/nrcs/detail/la/technical/cp/?cid=nrcs143_013697"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
# wrap the HTML in StringIO: recent pandas versions deprecate passing a literal HTML string
df = pd.read_html(StringIO(str(soup.select_one(".data"))))[0]
# filter State == 'LA'
print(df[df.State == "LA"].head())
Prints:
       FIPS        Name State
1109  22001      Acadia    LA
1110  22003       Allen    LA
1111  22005   Ascension    LA
1112  22007  Assumption    LA
1113  22009   Avoyelles    LA
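One caveat with the pd.read_html approach: it infers numeric types, so FIPS ends up as an integer column. That is harmless for Louisiana (codes start with 22), but codes that begin with 0 would lose the leading zero. Below is a minimal sketch of keeping FIPS as text via the converters argument. SAMPLE_HTML and read_fips_table are made up for illustration; the sample table only assumes the same three columns as the real page.

```python
from io import StringIO

import pandas as pd

# Hand-written sample standing in for the page's <table class="data">.
# It is an assumption that the real table has this shape.
SAMPLE_HTML = """
<table class="data">
  <tr><th>FIPS</th><th>Name</th><th>State</th></tr>
  <tr><td>01001</td><td>Autauga</td><td>AL</td></tr>
  <tr><td>22001</td><td>Acadia</td><td>LA</td></tr>
  <tr><td>22003</td><td>Allen</td><td>LA</td></tr>
</table>
"""


def read_fips_table(html: str) -> pd.DataFrame:
    """Parse the first table with class="data", keeping FIPS as strings."""
    tables = pd.read_html(
        StringIO(html),
        attrs={"class": "data"},   # only match tables with this class
        header=0,                  # first row holds the column names
        converters={"FIPS": str},  # keep leading zeros such as "01001"
    )
    return tables[0]


df = read_fips_table(SAMPLE_HTML)
la = df[df["State"] == "LA"].reset_index(drop=True)
print(la)
```

With the real page, the same arguments could be passed to pd.read_html in the answer above instead of the sample string.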
There is one table, so you can iterate over the <tr> elements in that table.
If you want the data frame to include only one particular state, you can either filter the rows before adding them to the data frame, or build a data frame of all the data and filter it to get a subset.
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = "https://www.nrcs.usda.gov/wps/portal/nrcs/detail/la/technical/cp/?cid=nrcs143_013697"
html_text = requests.get(url).text
soup = BeautifulSoup(html_text, 'html.parser')
data = []
for tr in soup.find('table', class_='data').find_all('tr'):
    row = [td.text for td in tr.find_all('td')]
    # To keep only the LA rows, filter here before appending
    if len(row) == 3 and row[2] == 'LA':
        data.append(row)
df = pd.DataFrame(data, columns=['FIPS', 'Name', 'State'])
print(df)
Output:
     FIPS        Name State
0   22001      Acadia    LA
1   22003       Allen    LA
2   22005   Ascension    LA
3   22007  Assumption    LA
4   22009   Avoyelles    LA
..    ...         ...   ...
63  22127        Winn    LA
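The other option mentioned above, building a data frame of all rows first and filtering it afterwards, might look like the following sketch. It runs against a small hand-written table (SAMPLE_HTML is an assumption standing in for the real page) so it works without a network request. Because the cells are read as text, FIPS stays a string column.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hand-written sample standing in for the page's <table class="data">.
SAMPLE_HTML = """
<table class="data">
  <tr><th>FIPS</th><th>Name</th><th>State</th></tr>
  <tr><td>01001</td><td>Autauga</td><td>AL</td></tr>
  <tr><td>22001</td><td>Acadia</td><td>LA</td></tr>
  <tr><td>22003</td><td>Allen</td><td>LA</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
for tr in soup.find("table", class_="data").find_all("tr"):
    # get_text(strip=True) drops surrounding whitespace in each cell
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    # The header row uses <th>, so it yields no <td> cells and is skipped
    if len(cells) == 3:
        rows.append(cells)

# Data frame of every county, then a filtered copy for one state
all_counties = pd.DataFrame(rows, columns=["FIPS", "Name", "State"])
la_counties = all_counties[all_counties["State"] == "LA"].reset_index(drop=True)
print(la_counties)
```

Keeping the full data frame around makes it cheap to pull out another state later without parsing the page again.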