How can I get more than one digit using parentheses in regular expressions?
I was trying to extract values from HTML using urllib and regular expressions in Python 3. When I ran this code, it gave me only one digit of each number instead of both digits, even though I added a "+" sign, which means one or more times. What's wrong here?
import re
import urllib.error,urllib.parse,urllib.request
from bs4 import BeautifulSoup
finalnums=[]
sumn=0
urlfile = urllib.request.urlopen("http://py4e-data.dr-chuck.net/comments_42.html")
html=urlfile.read()
soup = BeautifulSoup( html,"html.parser" )
spantags = soup("span")
for span in spantags:
    span=span.decode()
    numlist=re.findall(".+([0-9].*)<",span)
    print(numlist)
    finalnums.extend(numlist)
for anum in finalnums:
    sumn=sumn+int(anum)
print("Sum = ",sumn)
This is an example of the string I'm trying to extract the number from:
<span class="comments">54</span>
Use numlist=re.findall(r"\d+",span) to search for all contiguous groups of digit characters.
\d is a character class that's equivalent to [0-9], so it would also work if you did numlist=re.findall("[0-9]+",span).
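To see why the original pattern returns only one digit, it may help to compare the patterns side by side on the sample string from the question. This is a minimal sketch: the third pattern, `>([0-9]+)<`, is not from the answers above and is only one possible way to keep the capture group while anchoring it to the tag boundaries.

```python
import re

# Sample string taken from the question.
span = '<span class="comments">54</span>'

# Original pattern: the greedy ".+" consumes as much as it can and only
# backs off far enough to leave a single digit for the capture group.
greedy = re.findall(r".+([0-9].*)<", span)

# Suggested pattern: match a run of one or more digits directly.
digits = re.findall(r"\d+", span)

# Assumed alternative: keep the parentheses, but capture the digits
# that sit between ">" and "<".
anchored = re.findall(r">([0-9]+)<", span)

print(greedy)    # ['4']
print(digits)    # ['54']
print(anchored)  # ['54']
```

The "+" in the original pattern applies to the "." before the parentheses, not to the digit inside them, which is why the group itself only ever needs one digit to match.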
Since there is only one number in each <span> tag:
sumn = 0
for span in spantags:
    sumn += int(re.search(r'\d+', span.decode()).group(0))
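The loop above can be tried without network access. The sketch below assumes a few hard-coded strings in place of the decoded tags; `sample_spans` and `sum_span_numbers` are hypothetical names introduced for illustration. It also skips tags with no digits, because `re.search` returns `None` when nothing matches and calling `.group(0)` on that would raise an `AttributeError`.

```python
import re

# Hypothetical sample data standing in for span.decode() of each <span> tag.
sample_spans = [
    '<span class="comments">54</span>',
    '<span class="comments">7</span>',
    '<span class="comments">100</span>',
]

def sum_span_numbers(spans):
    """Sum the first run of digits found in each string."""
    total = 0
    for text in spans:
        match = re.search(r"\d+", text)
        if match is not None:  # skip strings that contain no digits
            total += int(match.group(0))
    return total

print(sum_span_numbers(sample_spans))  # 161
```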