I need to collect a lot of data about river cruising. I am working with Alteryx, and for the scraping I want to use Python from the command line. I need to write the output to a JSON or CSV file, but the output file is empty. The "#" characters in the code are separators for processing the output file in Alteryx, because the scraped text already contains ",". Ideally I would like to map the output to JSON. My code is as follows:
from mechanize import Browser
from bs4 import BeautifulSoup
import lxml
mech = Browser()
url = 'http://www.cruiseshipschedule.com/viking-river-cruises/viking-aegir-schedule/'
page = mech.open(url)
html = page.read()
html.replace('charset="ISO-8859-1"','charset=utf-8')
s = BeautifulSoup(html, "lxml")
content = s.findAll('div', id="content")
link = s.findAll("a")
h1 = s.findAll("h1")
table = s.findAll("table", border="1")
for link in s.findAll("a"):
    linktext = link.text
    linkhref = link.get("href")
for h1 in s.findAll("h1"):
    ship = h1.text
h2_1 = s.h2
h2_1.text
h2_2 = h2_1.find_next('h2')
itinerary_1 = h2_2.text
h2_3 = h2_2.find_next('h2')
itinerary_2 = h2_3.text
h2_4 = h2_3.find_next('h2')
itinerary_3 = h2_4.text
for table in content:
    table0 = s.findAll("table", border='0')
for tr in s.findAll("table", border='1'):
    trs1 = s.findAll("tr")
    table1 = tr.text.replace('\n','|')
    tds1 = s.findAll('td')
    uls1 = s.findAll('ul')
    lis1 = s.findAll('li')
for tr in s.findAll("table", border='0'):
    trs2 = s.findAll("tr")
    table2 = tr.text.replace('\n','|')
    tds2 = s.findAll('td')
    uls2 = s.findAll('ul')
    lis2 = s.findAll('li')
all_data=ship+"#"+table1+"#"+table2+"#"+itinerary_1+"#"+itinerary_2+"#"+itinerary_3
all_data = open("Z:/txt files/all_data.txt", "w")
print all_data >> "Z:/txt files/all_data.txt"
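The file is empty because the second-to-last line rebinds all_data to the file object, which discards the string you built and truncates the file, and the last line never writes anything: print all_data >> "..." is parsed as a shift expression, not as output redirection. To write the output to your file, replace the last two lines of your code with something like: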
with open('all_data.txt', 'w') as f:
    f.write(all_data.encode('utf8'))
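Since the question asks for JSON, here is a minimal sketch of that variant. It assumes the scraped strings are collected into a dict rather than joined with "#". The function names build_record and write_json, the dict keys, and the output file name all_data.json are illustrative choices, not part of the original script.

```python
import json


def build_record(ship, table1, table2, itineraries):
    """Collect the scraped strings into a dict (hypothetical key names)."""
    return {
        "ship": ship,
        "table1": table1,
        "table2": table2,
        "itineraries": list(itineraries),
    }


def write_json(record, path):
    # json.dumps quotes every string, so commas inside the scraped text
    # are safe and no "#" separator is needed. With the default
    # ensure_ascii=True the output is plain ASCII.
    with open(path, "w") as f:
        f.write(json.dumps(record, indent=2))


if __name__ == "__main__":
    # Placeholder values standing in for the variables built by the scraper.
    record = build_record(
        "Viking Aegir",
        "Date|Port",
        "Day 1|Amsterdam",
        ["Itinerary A", "Itinerary B", "Itinerary C"],
    )
    write_json(record, "all_data.json")
```

In the original script the call would look like build_record(ship, table1, table2, [itinerary_1, itinerary_2, itinerary_3]), and the resulting file can be parsed back into fields by any JSON reader.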