I need to remove excess characters from string output of BeautifulSoup

I need to remove the [u' prefix and '] suffix that surround the data I care about. The values are going into a database, and from what I can see the extra characters get stored along with them. How can I remove them? I've tried calling .replace on the variable, but it raises an error.

import urllib
import mechanize
from bs4 import BeautifulSoup
import requests
import re
import MySQLdb
import time

db = MySQLdb.connect(
  host=" ",
  user=" ",
  passwd=" ",
  db=" ")

inc = 0

# while inc != 3289:
c = db.cursor()
c.execute("""SELECT `symbol` FROM `stocks` LIMIT %s,1""", (inc,))
result = c.fetchall()
result = str(result)

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
br = mechanize.Browser()
br.set_handle_robots(False)
br.addHeaders = [('User-agent',user_agent)]

term = result.replace('((','').replace(',)','').replace("'",'')
url = "http://www.marketwatch.com/investing/stock/"+term
soup = BeautifulSoup(requests.get(url).text)
search = soup.find('p', attrs = {'class':'data bgLast'})
cur = search.findAll(text = True)
search2 = soup.find('span', attrs = {'class':'bgChange'})
diff = search2.findAll(text = True)
print term
print cur
print diff

c.execute("""UPDATE stocks SET cur = %s WHERE symbol = %s""", (cur,term))
c.execute("""UPDATE stocks SET diff = %s WHERE symbol = %s""", (diff,term))
db.commit()

No thanks to you @jonrsharpe, I found the answer. In the original code, .findAll was returning a result set. All I had to do was convert it to a str, which allowed the strip method to be called on it. The revised code is below:

import urllib
import mechanize
from bs4 import BeautifulSoup
import requests
import re
import MySQLdb
import time

db = MySQLdb.connect(
  host=" ",
  user=" ",
  passwd=" ",
  db=" ")

inc = 0

# while inc != 3289:
c = db.cursor()
c.execute("""SELECT `symbol` FROM `stocks` LIMIT %s,1""", (inc,))
result = c.fetchall()
result = str(result)

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
br = mechanize.Browser()
br.set_handle_robots(False)
br.addHeaders = [('User-agent',user_agent)]

term = result.replace('((','').replace(',)','').replace("'",'')
url = "http://www.marketwatch.com/investing/stock/"+term
soup = BeautifulSoup(requests.get(url).text)
search = soup.find('p', attrs = {'class':'data bgLast'})
cur = str(search.findAll(text = True))
search2 = soup.find('span', attrs = {'class':'bgChange'})
diff = str(search2.findAll(text = True))
cur = cur.strip("'[]u")
diff = diff.strip("'[]u")
print term
print cur
print diff

c.execute("""UPDATE stocks SET cur = %s WHERE symbol = %s""", (cur,term))
c.execute("""UPDATE stocks SET diff = %s WHERE symbol = %s""", (diff,term))
db.commit()

result = str(result)
...
cur = str(search.findAll(text = True))

Stop doing this! There are datatypes other than strings!

result is a sequence of rows, each of which is itself a sequence of column values; search.findAll gives you a list of text nodes. You can get to, for example, the symbol value of the first row by saying result[0][0]; you can get the text of an element by saying just search.getText().
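
As a rough illustration of that advice, the sketch below indexes into the rows and asks the element for its text instead of calling str() on either. It is written for Python 3 with bs4 installed. The HTML snippet, the AAPL symbol and the prices are made-up stand-ins for the real MarketWatch page and database rows, and get_text(strip=True) is used as the current spelling of getText().

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the MarketWatch page (not the real page).
html = """
<p class="data bgLast">123.45</p>
<span class="bgChange">+0.67</span>
"""

# Hypothetical rows, shaped the way a DB-API cursor's fetchall() returns them:
# a sequence of rows, each row a sequence of column values.
result = (("AAPL",),)

# Index into the structure instead of serialising it with str().
term = result[0][0]

soup = BeautifulSoup(html, "html.parser")

# Ask each element for its text directly; strip=True trims surrounding whitespace.
cur = soup.find("p", attrs={"class": "data bgLast"}).get_text(strip=True)
diff = soup.find("span", attrs={"class": "bgChange"}).get_text(strip=True)

print(term, cur, diff)
```

Values obtained this way are plain strings, so they could be passed as query parameters to c.execute without any .replace or .strip cleanup.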

Serialising structured objects like lists into a flat string and then trying to pick the bits back out of it is not a sensible approach.
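
One way to see why this is fragile: str.strip takes a set of characters to remove from both ends, not a literal prefix or suffix. The sketch below uses a made-up value, "unch", written out the way Python 2 would render a one-item list of unicode text, and shows the cleanup eating part of the data.

```python
# Hypothetical scraped value, shown as Python 2's str() would render a
# one-item list of unicode text.
serialised = "[u'unch']"

# strip() removes ANY of the characters ', [, ], u from both ends,
# so the leading "u" of the real data is removed along with the wrapper.
stripped = serialised.strip("'[]u")
print(stripped)

# Keeping the list as a list and indexing into it leaves the data intact.
nodes = [u"unch"]
value = nodes[0]
print(value)
```

Any value that happens to start or end with one of the stripped characters would be silently corrupted in the same way, which indexing into the original list avoids.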
