I am using the code below to scrape a website and then store that data in a SQLite table. My issue is with the regex after `for n in str(shark):` — for some reason `place, date, article = groups[1], groups[2], groups[3]` does not store any data, and thus nothing gets inserted into my DB. The thing is, when I run the following code in my REPL, `group = re.match(r'(.*?)\W+—?\W+On\W+(.*?\d{4})\W*(.*)', str(shark[1]), flags=re.DOTALL)`, I am able to get the parsed-out text from my shark list. Any idea why?
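For reference, here is a minimal reproduction of that REPL experiment, using a made-up sample string in place of `str(shark[1])` (the entry text is invented for illustration):

```python
import re

# Hypothetical entry resembling one item of the shark list
sample = 'Malibu — On August 21, 2017 a surfer reported a shark.'

# The question's regex: location, then a date ending in a 4-digit year,
# then the rest of the report
m = re.match(r'(.*?)\W+—?\W+On\W+(.*?\d{4})\W*(.*)', sample, flags=re.DOTALL)
print(m.group(1))  # 'Malibu'
print(m.group(2))  # 'August 21, 2017'
print(m.group(3))  # 'a surfer reported a shark.'
```

So the pattern itself works against a single whole entry; the failure in the full script comes from what the loop feeds it.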
import pandas as pd
import re  # added
import bs4
import sqlite3
import requests
import textwrap

'''
Let's pull some fresh shark data!
'''
res = requests.get('http://www.sharkresearchcommittee.com/pacific_coast_shark_news.htm')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')

shark = []
for i in range(1, 100):  # attempting to grab the most recently added paragraphs
    elems = soup.select('body > div > div > center > table > tr > td:nth-of-type(2) > p:nth-of-type({})'.format(i))
    for i in elems:
        # print("—" in str(i))
        if '—' in str(i):
            text = bs4.BeautifulSoup(str(i), 'html.parser')
            shark.append(text)
            # print(text)

c = sqlite3.connect('shark.db')
try:
    c.execute('''CREATE TABLE
                 mytable (Location STRING,
                          Date STRING,
                          Description STRING)''')
except sqlite3.OperationalError:  # i.e. table exists already
    pass

for n in str(shark):
    groups = re.match(r'(.*?)\W+—?\W+On\W+(.*?\d{4})\W*(.*)', n, flags=re.DOTALL)
    if not groups:
        continue
    place, date, article = groups[1], groups[2], groups[3]
    print(place)
    c.execute('''INSERT INTO mytable(Location, Date, Description) VALUES(?,?,?)''',
              (place, date, article))
    c.commit()

'''
Read into python
'''
df = pd.read_sql_query("select * from mytable;", c)
print(df)
The problem is the `str()` in

for n in str(shark):

It converts the entire list `shark` into one single string, so the loop iterates over it one character at a time — and `re.match` never finds the pattern in a single character. You have to convert each element `n` separately:

for n in shark:
    n = str(n)