I am trying to scrape https://m.the-numbers.com/market/2018/top-grossing-movies, specifically the table, into a CSV file. I am using Python and Beautiful Soup, but I am very new to this and would love any tips or solutions. What are some simple ways to tackle this?
Thank you
This is my latest experiment below...
from bs4 import BeautifulSoup
import requests
import csv
source = requests.get('https://m.the-numbers.com/market/2018/top-grossing-movies').text
soup = BeautifulSoup(source, 'lxml')
csv_file = open('cms_scrape.csv', 'w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['filmTitle', 'releaseDate', 'distributor', 'genre', 'gross', 'ticketsSold'])
for tbody in soup.find_all('a', class_='table-responsive'):
    filmTitle = tbody.tr.td.b.a.text
    print(filmTitle)
    csv_writer.writerow([filmTitle])
csv_file.close()
Assuming you already have the value of source, you could do this:
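from io import StringIO
import pandas as pd
# Newer pandas versions warn when given a raw HTML string, so wrap it in StringIO
df = pd.read_html(StringIO(source))[0]
df.to_csv('cms_scrape.csv', index=False)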
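One thing to watch for: read_html returns every row of the table, so if the table ends with summary rows (the other answer skips the last two), they will land in the CSV as well. Here is a minimal sketch of trimming them. The HTML is a made-up stand-in for the real page, and it assumes pandas has an HTML parser such as lxml available.

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for the downloaded page (not real data):
# a header row, two data rows and two trailing summary rows.
sample_html = """
<table>
  <tr><th>Rank</th><th>Movie</th><th>Gross</th></tr>
  <tr><td>1</td><td>Film A</td><td>$300</td></tr>
  <tr><td>2</td><td>Film B</td><td>$200</td></tr>
  <tr><td></td><td>Total Gross of All Movies</td><td>$500</td></tr>
  <tr><td></td><td>Total Tickets Sold</td><td>50</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> found
tables = pd.read_html(StringIO(sample_html))

# Keep the first table and drop its last two (summary) rows
df = tables[0].iloc[:-2]

# to_csv with no path returns the CSV text; pass a filename to write a file
csv_text = df.to_csv(index=False)
```

Whether the real page has exactly two summary rows is worth checking by printing df.tail() before trimming.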
Something like the code below would do the job.
import requests
from bs4 import BeautifulSoup
import csv
# Making get request
r = requests.get('https://m.the-numbers.com/market/2018/top-grossing-movies')
# Creating BeautifulSoup object
soup = BeautifulSoup(r.text, 'lxml')
# Locating the table in the BS object
table_soup = soup.find('div', id='page_filling_chart').find('div', class_='table-responsive').find('table')
# Iterating through all trs in the table except the first (header) and the last two (summary) rows
movies = []
for tr in table_soup.find_all('tr')[1:-2]:
    tds = tr.find_all('td')
    # Creating dict for each row and appending it to the movies list
    movies.append({
        'rank': tds[0].text.strip(),
        'movie': tds[1].text.strip(),
        'release_date': tds[2].text.strip(),
        'distributor': tds[3].text.strip(),
        'genre': tds[4].text.strip(),
        'gross': tds[5].text.strip(),
        'tickets_sold': tds[6].text.strip(),
    })
# Writing movies list of dicts to file using csv.DictWriter
# (newline='' is what the csv module docs recommend when opening the file)
with open('movies.csv', 'w', encoding='utf-8', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=movies[0].keys())
    writer.writeheader()
    writer.writerows(movies)
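If you want to try the row slicing and the DictWriter step without hitting the site, here is a small offline sketch of the same idea. The HTML snippet and its three columns are made up for illustration, it uses the built-in html.parser instead of lxml, and it writes to an in-memory buffer rather than a file.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up stand-in for the real page (not real data):
# a header row, two data rows and two trailing summary rows.
sample_html = """
<div class="table-responsive">
  <table>
    <tr><th>Rank</th><th>Movie</th><th>Gross</th></tr>
    <tr><td>1</td><td><b><a href="#">Film A</a></b></td><td>$300</td></tr>
    <tr><td>2</td><td><b><a href="#">Film B</a></b></td><td>$200</td></tr>
    <tr><td></td><td>Total Gross of All Movies</td><td>$500</td></tr>
    <tr><td></td><td>Total Tickets Sold</td><td>50</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
table = soup.find("div", class_="table-responsive").find("table")

# Declaring the column names up front means the writer does not
# depend on movies[0] existing when no rows were found.
fieldnames = ["rank", "movie", "gross"]

movies = []
# Skip the first (header) row and the last two (summary) rows
for tr in table.find_all("tr")[1:-2]:
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    movies.append(dict(zip(fieldnames, cells)))

# Write to an in-memory buffer; swap in open(..., newline='') for a real file
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(movies)
csv_text = buffer.getvalue()
```

Listing the field names explicitly is a design choice rather than a requirement: it keeps the header row stable even when the page layout changes and the loop collects nothing.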