I need to remove excess characters from the string output of BeautifulSoup
I need to remove the [u' prefix and '] suffix that surround the data that's important to me. This will be put into a database, and from what I can see the extra characters are stored along with it. How can I remove them? I've tried calling .replace on the variable, but it returns an error.
import urllib
import mechanize
from bs4 import BeautifulSoup
import requests
import re
import MySQLdb
import time
db = MySQLdb.connect(
host=" ",
user=" ",
passwd=" ",
db=" ")
inc = 0
# while inc != 3289:
c = db.cursor()
c.execute("""SELECT `symbol` FROM `stocks` LIMIT %s,1""", (inc,))
result = c.fetchall()
result = str(result)
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent',user_agent)]
term = result.replace('((','').replace(',)','').replace("'",'')
url = "http://www.marketwatch.com/investing/stock/"+term
soup = BeautifulSoup(requests.get(url).text)
search = soup.find('p', attrs = {'class':'data bgLast'})
cur = search.findAll(text = True)
search2 = soup.find('span', attrs = {'class':'bgChange'})
diff = search2.findAll(text = True)
print term
print cur
print diff
c.execute("""UPDATE stocks SET cur = %s WHERE symbol = %s""", (cur,term))
c.execute("""UPDATE stocks SET diff = %s WHERE symbol = %s""", (diff,term))
db.commit()
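The [u'...'] wrapper is not part of the scraped data. It is what str() produces for a list, because turning a container into a string shows the repr() of each element, quotes and brackets included. The same thing happens to the fetchall() result, which is what the .replace chain applied to the symbol is undoing. A minimal sketch in Python 3 (where strings no longer show a u prefix; under Python 2, as used above, a unicode element's repr is u'...'), with hypothetical values standing in for the real query and scrape results:

```python
# Hypothetical stand-ins for what the code above produces:
# fetchall() returns rows as tuples, findAll(text=True) returns a list of strings.
rows = (("AAPL",),)
text_nodes = ["+1.25"]

# str() on a container shows the repr() of each element,
# including quotes, brackets and trailing commas.
print(str(rows))        # (('AAPL',),)
print(str(text_nodes))  # ['+1.25']

# The element itself carries no extra characters.
print(text_nodes[0])    # +1.25
```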
No thanks to you @jonrsharpe, I found the answer. In the original code, .findAll was returning a result set. All I had to do was convert it to a str, which allowed strip to be called on it. The revised code is below:
import urllib
import mechanize
from bs4 import BeautifulSoup
import requests
import re
import MySQLdb
import time
db = MySQLdb.connect(
host=" ",
user=" ",
passwd=" ",
db=" ")
inc = 0
# while inc != 3289:
c = db.cursor()
c.execute("""SELECT `symbol` FROM `stocks` LIMIT %s,1""", (inc,))
result = c.fetchall()
result = str(result)
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [('User-agent',user_agent)]
term = result.replace('((','').replace(',)','').replace("'",'')
url = "http://www.marketwatch.com/investing/stock/"+term
soup = BeautifulSoup(requests.get(url).text)
search = soup.find('p', attrs = {'class':'data bgLast'})
cur = str(search.findAll(text = True))
search2 = soup.find('span', attrs = {'class':'bgChange'})
diff = str(search2.findAll(text = True))
cur = cur.strip("'[]u")
diff = diff.strip("'[]u")
print term
print cur
print diff
c.execute("""UPDATE stocks SET cur = %s WHERE symbol = %s""", (cur,term))
c.execute("""UPDATE stocks SET diff = %s WHERE symbol = %s""", (diff,term))
db.commit()
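One caveat with the strip approach above: str.strip() treats its argument as a set of characters to remove from both ends, not as a literal prefix or suffix. Any legitimate leading or trailing u, quote or bracket is removed as well, and a result set holding more than one text node is left half-mangled. A small sketch in Python 3 with hypothetical values (strip_repr is a hypothetical helper that reproduces the revised code's two steps):

```python
def strip_repr(nodes):
    """Reproduce the revised code: stringify the list, then strip characters."""
    return str(nodes).strip("'[]u")


# One clean text node: happens to work.
print(strip_repr(["12.34"]))        # 12.34

# A value that starts with 'u': the 'u' is stripped too.
print(strip_repr(["up"]))           # p

# Several text nodes (e.g. surrounding whitespace): repr debris remains.
print(strip_repr(["\n", "12.34"]))  # \n', '12.34

# Joining the nodes operates on the data itself rather than on its repr.
print("".join(["\n", "12.34"]).strip())  # 12.34
```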
result = str(result)
...
cur = str(search.findAll(text = True))
Stop doing this! There are datatypes other than strings!
result is a sequence of rows (with MySQLdb's default cursor, a tuple of tuples); search.findAll gives you a list of text nodes. You can get to, for example, the symbol value of the first row by saying result[0][0]; you can get the text of an element by saying just search.getText().
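To illustrate the indexing point without a MySQL server, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for MySQLdb. Both follow the Python DB-API, so fetchall() returns a sequence of row tuples and fetchone() returns a single row; note that sqlite3 uses ? placeholders where MySQLdb uses %s. The table contents are hypothetical, and the ORDER BY and the combined UPDATE are choices made for this sketch, not part of the original code:

```python
import sqlite3

# In-memory stand-in for the question's `stocks` table (hypothetical data).
db = sqlite3.connect(":memory:")
c = db.cursor()
c.execute("CREATE TABLE stocks (symbol TEXT, cur TEXT, diff TEXT)")
c.executemany("INSERT INTO stocks (symbol) VALUES (?)", [("AAPL",), ("MSFT",)])

inc = 0
# ORDER BY makes the offset deterministic; "LIMIT offset, count" as in the question.
c.execute("SELECT symbol FROM stocks ORDER BY symbol LIMIT ?, 1", (inc,))
rows = c.fetchall()
print(rows)        # [('AAPL',)]

# Index into the structure instead of stringifying it and cleaning it up.
term = rows[0][0]
print(term)        # AAPL

# Pass values as query parameters; the driver handles quoting.
c.execute("UPDATE stocks SET cur = ?, diff = ? WHERE symbol = ?",
          ("12.34", "+0.56", term))
db.commit()

c.execute("SELECT cur, diff FROM stocks WHERE symbol = ?", (term,))
print(c.fetchone())  # ('12.34', '+0.56')
```

With LIMIT 1, c.fetchone()[0] would give the symbol directly, without the extra level of indexing.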
Serialising structured objects like lists into a flat string and then trying to pick the bits out of it is not a sensible approach.
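In the same spirit, the list of text nodes can be reduced to a value without going through its repr: join the nodes (roughly what getText() does for an element) and, if the column is numeric, parse the result into a number. A minimal sketch, assuming the price text looks like 1,234.56 with optional surrounding whitespace; the actual page format is not confirmed here, and parse_price is a hypothetical helper:

```python
from decimal import Decimal, InvalidOperation


def parse_price(text_nodes):
    """Join scraped text nodes and parse them as a decimal number.

    Hypothetical helper: assumes the text is a plain number that may
    contain thousands separators and surrounding whitespace.
    """
    text = "".join(text_nodes).strip().replace(",", "")
    try:
        return Decimal(text)
    except InvalidOperation:
        return None  # not a number; let the caller decide what to do


print(parse_price(["\n", "1,234.56", " "]))  # 1234.56
print(parse_price(["+0.56"]))                # 0.56
print(parse_price(["n/a"]))                  # None
```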