I'm trying to scrape multiple pages of a URL by appending each page number to the URL and storing the resulting URLs in a list. When I run the loop, only the content from the first page is scraped, not the rest. Where is the fault?
import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.DataFrame()
list_of_links = []
url = 'https://marknadssok.fi.se/publiceringsklient?Page='

for link in range(1, 10):
    urls = url + str(link)
    list_of_links.append(urls)

# Establish connection
for i in list_of_links:
    r = requests.get(i)
    html = BeautifulSoup(r.content, "html.parser")

    # Append each column to its attribute
    table_body = html.find('tbody')
    rows = table_body.find_all('tr')
    data = []
    for row in rows:
        cols = row.find_all('td')
        cols = [x.text.strip() for x in cols]
        data.append(cols)

df = pd.DataFrame(data, columns=['Publiceringsdatum', 'utgivare', 'person', 'befattning',
                                 'Närstående', 'karaktär', 'Instrumentnamn', 'ISIN',
                                 'transaktionsdatum', 'volym', 'volymsenhet', 'pris',
                                 'valuta', 'handelsplats', 'status', 'detaljer'])
The problem was that the `data` variable, which accumulated the scraped rows, was initialised inside the for-loop, so it was reset to an empty list on every iteration and only one page's rows survived. I solved it by moving `data = []` out of the for-loop.
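The effect is easy to reproduce without hitting the live site. The sketch below uses hypothetical in-memory "pages" in place of the scraped tables, and contrasts the buggy pattern (accumulator reset inside the loop) with the fixed one (accumulator initialised once, DataFrame built after the loop):

```python
import pandas as pd

# Hypothetical stand-ins for the rows scraped from each page.
pages = [
    [['2019-01-01', 'A']],  # rows from page 1
    [['2019-01-02', 'B']],  # rows from page 2
]

# Buggy pattern: 'data' is re-created on every iteration,
# so only the rows from the last page survive the loop.
for page in pages:
    data = []
    for row in page:
        data.append(row)
buggy_df = pd.DataFrame(data, columns=['Publiceringsdatum', 'utgivare'])

# Fixed pattern: initialise 'data' once before the loop and
# build the DataFrame after all pages have been collected.
data = []
for page in pages:
    for row in page:
        data.append(row)
fixed_df = pd.DataFrame(data, columns=['Publiceringsdatum', 'utgivare'])

print(len(buggy_df))  # 1 - only the last page
print(len(fixed_df))  # 2 - all pages
```

The same restructuring applied to the original script (move `data = []` above the `for i in list_of_links:` loop and construct `df` after it) collects the rows from all nine pages into one DataFrame.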