I need to remove the [u' prefix and '] suffix surrounding the values I care about. They are written to a database, and as far as I can see the extra characters get stored too. How can I remove them? I tried calling .replace on the variable, but that raises an error.
import urllib
import mechanize
from bs4 import BeautifulSoup
import requests
import re
import MySQLdb
import time
db = MySQLdb.connect(
host=" ",
user=" ",
passwd=" ",
db=" ")
inc = 0
# while inc != 3289:
c = db.cursor()
c.execute("""SELECT `symbol` FROM `stocks` LIMIT %s,1""", (inc,))
result = c.fetchall()
result = str(result)
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent',user_agent)]
term = result.replace('((','').replace(',)','').replace("'",'')
url = "http://www.marketwatch.com/investing/stock/"+term
soup = BeautifulSoup(requests.get(url).text)
search = soup.find('p', attrs = {'class':'data bgLast'})
cur = search.findAll(text = True)
search2 = soup.find('span', attrs = {'class':'bgChange'})
diff = search2.findAll(text = True)
print term
print cur
print diff
c.execute("""UPDATE stocks SET cur = %s WHERE symbol = %s""", (cur,term))
c.execute("""UPDATE stocks SET diff = %s WHERE symbol = %s""", (diff,term))
db.commit()
No thanks to you @jonrsharpe, I found the answer. In the original code, .findAll returned a result set rather than a string. All I had to do was convert it to a str, which let me call strip on it. The revised code is below:
import urllib
import mechanize
from bs4 import BeautifulSoup
import requests
import re
import MySQLdb
import time
db = MySQLdb.connect(
host=" ",
user=" ",
passwd=" ",
db=" ")
inc = 0
# while inc != 3289:
c = db.cursor()
c.execute("""SELECT `symbol` FROM `stocks` LIMIT %s,1""", (inc,))
result = c.fetchall()
result = str(result)
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent',user_agent)]
term = result.replace('((','').replace(',)','').replace("'",'')
url = "http://www.marketwatch.com/investing/stock/"+term
soup = BeautifulSoup(requests.get(url).text)
search = soup.find('p', attrs = {'class':'data bgLast'})
cur = str(search.findAll(text = True))
search2 = soup.find('span', attrs = {'class':'bgChange'})
diff = str(search2.findAll(text = True))
cur = cur.strip("'[]u")
diff = diff.strip("'[]u")
print term
print cur
print diff
c.execute("""UPDATE stocks SET cur = %s WHERE symbol = %s""", (cur,term))
c.execute("""UPDATE stocks SET diff = %s WHERE symbol = %s""", (diff,term))
db.commit()
result = str(result)
...
cur = str(search.findAll(text = True))
Stop doing this! There are datatypes other than strings!
result is a sequence of rows (MySQLdb returns a tuple of tuples); search.findAll gives you a list of text nodes. You can get, for example, the symbol value of the first row with result[0][0], and you can get the text of an element with just search.getText().
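As a minimal sketch of that advice: the rows tuple below stands in for what c.fetchall() returns, and the HTML snippet, the symbol and the prices are invented for illustration, not taken from the real page. It uses get_text(), the current spelling of the getText() method mentioned above.

```python
from bs4 import BeautifulSoup

# Stand-in for c.fetchall(): one row with one column (hypothetical symbol).
rows = (("AAPL",),)
term = rows[0][0]  # first column of the first row, already a plain string

# Stand-in for the downloaded page; this markup is an assumption.
html = """
<p class="data bgLast">101.25</p>
<span class="bgChange">+1.50</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Ask each element for its text instead of serialising a list of text nodes.
cur = soup.find("p", attrs={"class": "data bgLast"}).get_text().strip()
diff = soup.find("span", attrs={"class": "bgChange"}).get_text().strip()

print((term, cur, diff))
```

Values obtained this way carry no brackets, quotes or u prefix, so they can be passed to c.execute as parameters without any string clean-up.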
Serialising structured objects like lists into a flat string and then trying to pick the bits out of it is not a sensible approach.
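To illustrate why picking values out of the serialised string is fragile, here is a small sketch. The value "unch" is a hypothetical scraped text; the literal string mimics what str() produces for a one-element list of unicode text under Python 2.

```python
# What str([u'unch']) looks like under Python 2 (written as a literal here).
serialised = "[u'unch']"

# strip() treats its argument as a set of characters to remove from both
# ends, not as a prefix and suffix, so the value's own leading "u" goes too.
stripped = serialised.strip("'[]u")
print(stripped)  # prints nch

# Taking the element out of the list keeps the value intact.
nodes = [u"unch"]
value = nodes[0]
print(value)  # prints unch
```

Any value that happens to start or end with one of the stripped characters would be damaged in the same way.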