I'm trying to get hold of the data in the second column of the table on the Shakemap site using Python, specifically the values that begin with the code "CATAC2021", where "aaaa" stands for the four letters that follow (e.g. aaaa, aaab, etc.). These are the IDs of the events.
I have tried the code below to access the second column of the table and retrieve the ID from each row, but I have had no success so far. Does anyone know where I have gone wrong and how to correct it?
from bs4 import BeautifulSoup
from urllib import request

page = request.urlopen('http://shakemapcam.ethz.ch/archive/').read()
soup = BeautifulSoup(page)
desired_table = soup.findAll('table')[2]

# Find the columns you want data from
headers = desired_table.findAll('th')
desired_columns = []
for th in headers:
    if 'CATAC2021' in th.string:
        desired_columns.append([headers.index(th), th.getText()])

# Iterate through each row grabbing the data from the desired columns
rows = desired_table.findAll('tr')
for row in rows[1:]:
    cells = row.findAll('td')
    row_name = row.findNext('th').getText()
    for column in desired_columns:
        print(cells[column[0]].text, row_name, column[1])
I'd use pandas here to grab the table, then use a regex to pull out the pattern (the characters following the four-digit year and before the first /). Note, though, that the table also has an Event ID column, so just be sure you know the difference. I named the new column eventId.
import pandas as pd

url = 'http://shakemapcam.ethz.ch/archive/'
# read_html returns a list of DataFrames; the archive table is the last one
df = pd.read_html(url, header=0)[-1]
# Group 0 is the text before the four-digit year, group 1 the letters after it
parts = df['Name/Epicenter'].str.extract(r'(.*)\d{4}(.*)(\s//?.*)(//?.*)')
df['eventId'] = parts[1]
df['prefix'] = parts[0]
Output:
print(df[['Name/Epicenter','prefix','eventId']])
      Name/Epicenter                                  prefix     eventId
0     CATAC2021efod / 6.354496002 / -76.18144226      CATAC      efod
1     CATAC2021edxe / 15.67289066 / -93.40866852      CATAC      edxe
2     CATAC2021ebzg / 9.406171799 / -84.55581665      CATAC      ebzg
3     CATAC2021eayx / 14.03658199 / -92.30122375      CATAC      eayx
4     CATAC2021eayx / 14.03546429 / -92.30183411      CATAC      eayx
...   ...                                             ...        ...
1574  ineterloc2018acor / 12.21397209 / -86.7282486   ineterloc  acor
1575  ineterloc2018acor / 12.21113586 / -86.73029327  ineterloc  acor
1576  ineterloc2018acor / 12.20839691 / -86.73122406  ineterloc  acor
1577  ineterloc2018aatd / 16.59416389 / -86.35289764  ineterloc  aatd
1578  ineterloc2018aatd / 16.64553833 / -86.26078796  ineterloc  aatd
[1579 rows x 3 columns]
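If the greedy (.*) groups feel hard to follow, a stricter pattern with named groups may be easier to check. This is only a sketch using the standard re module on two rows copied from the output above. It assumes every cell looks like letters, then a four-digit year, then lowercase letters, then " / ". The names NAME_RE and split_name are made up for this example and do not come from the site or from pandas.

```python
import re

# Assumed cell layout: <prefix letters><4-digit year><id letters> / <lat> / <lon>
NAME_RE = re.compile(r'^(?P<prefix>[A-Za-z]+)(?P<year>\d{4})(?P<event_id>[a-z]+)\s*/')

def split_name(cell):
    """Return (prefix, year, event_id) for one cell, or None if it does not match."""
    m = NAME_RE.match(cell)
    if m is None:
        return None
    return m.group('prefix'), m.group('year'), m.group('event_id')

# Two sample rows taken from the output shown above
samples = [
    'CATAC2021efod / 6.354496002 / -76.18144226',
    'ineterloc2018aatd / 16.59416389 / -86.35289764',
]
for cell in samples:
    print(split_name(cell))
```

The same pattern string should also work with df['Name/Epicenter'].str.extract(...), in which case pandas names the resulting columns after the named groups. Rows that do not fit the assumed layout come back as None here (NaN in pandas), which makes unexpected formats easy to spot.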