Getting integers from a list created with BeautifulSoup in Python

Question

I'm a beginner in Python and I need some help with this code:

from urllib.request import *
from bs4 import BeautifulSoup
import re

req = Request("https://adrianchifu.com/teachings/AMSE/MAG1/project/Xlrda/dsuR/2/J9ED27Y.html")
a = urlopen(req).read()
soup=BeautifulSoup(a,'html.parser')
nombres=[]
tout = (soup.find_all('td'))
str_tout=str(tout)     
tout = [float(s) for s in re.findall(r'\d+\.\d+', str_tout)]
nombres.append(tout)
print(nombres)

From a website, I need to get all the numeric values it contains (this is just one part of the whole code). I have succeeded in extracting the floats, but I can't get the integers. I have tried many things but haven't figured out how to do it. Thanks for your help.

EDIT: For this link (https://adrianchifu.com/teachings/AMSE/MAG1/project/Xlrda/dsuR/2/9GYIGO.html), the method given just below isn't working because the list contains integers and floats but also character strings. Some of those strings start with a digit, which complicates things. How can I catch the integers but not the strings that start with a digit?

Answer 1

Integers don't have the form \d+\.\d+, so let's make the decimal point and the digits after it optional with ^\d+(?:\.\d+)?$ (note the non-capturing group; it is important).

Then, I'd try to match each td.text by itself:
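As a quick check of the pattern above, here is a minimal sketch that runs it against a few sample strings. The sample values are hypothetical, modeled on the cell contents shown in this thread. The last lines illustrate one likely reason the non-capturing group matters: with re.findall, a capturing group would return only the group's contents instead of the whole number.

```python
import re

# Anchored pattern: digits, optionally followed by a decimal point and more digits.
NUMBER_RE = re.compile(r'^\d+(?:\.\d+)?$')

# Hypothetical cell texts (not fetched from the page).
samples = ['89.169', '6190', 'OIK3XF02PS', '9GYIGO', '2.248', 'E2RYAFAE']

# '9GYIGO' starts with a digit but is rejected because of the $ anchor.
matches = [s for s in samples if NUMBER_RE.match(s)]
print(matches)  # ['89.169', '6190', '2.248']

# With findall, a capturing group changes what is returned.
capturing = re.findall(r'\d+(\.\d+)?', '89.169 6190')
non_capturing = re.findall(r'\d+(?:\.\d+)?', '89.169 6190')
print(capturing)      # ['.169', '']
print(non_capturing)  # ['89.169', '6190']
```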

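from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
import re

req = Request("https://adrianchifu.com/teachings/AMSE/MAG1/project/Xlrda/dsuR/2/J9ED27Y.html")
a = urlopen(req).read()
soup = BeautifulSoup(a, 'html.parser')
nombres = []
tds = soup.find_all('td')
for td in tds:
    # Keep a cell only if its whole text is an integer or a decimal number
    if re.match(r'^\d+(?:\.\d+)?$', td.text):
        nombres.append(float(td.text))
print(nombres)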

This outputs:

[89.169, 54.893, 19.212, 87.045, 2.248, 99.947, 6190.0, 83.096]

As a last improvement, I'd use a list comprehension with a compiled regex to improve the performance a bit:

req = Request("https://adrianchifu.com/teachings/AMSE/MAG1/project/Xlrda/dsuR/2/J9ED27Y.html")
a = urlopen(req).read()
soup = BeautifulSoup(a,'html.parser')
tds = soup.find_all('td')
numbers_regex = re.compile(r'^\d+(?:\.\d+)?$')
nombres = [float(td.text) for td in tds if numbers_regex.match(td.text)]
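The code above converts every match to float, so 6190 comes out as 6190.0. If the integers should stay integers, one possible variation is sketched below. It is an assumption about what the asker wants, not part of the original answer. The helper name parse_numbers and the sample cell list are hypothetical; in the real script the texts would come from td.text.

```python
import re

NUMBER_RE = re.compile(r'^\d+(?:\.\d+)?$')

def parse_numbers(cell_texts):
    """Return numeric cells as int or float, skipping everything else."""
    numbers = []
    for text in cell_texts:
        text = text.strip()
        if not NUMBER_RE.match(text):
            continue  # not purely numeric, e.g. 'OIK3XF02PS' or '9GYIGO'
        # A decimal point means float; otherwise keep the value as an int.
        numbers.append(float(text) if '.' in text else int(text))
    return numbers

# Hypothetical cell texts standing in for [td.text for td in tds].
cells = ['89.169', 'OIK3XF02PS', '6190', '9GYIGO', '83.096']
print(parse_numbers(cells))  # [89.169, 6190, 83.096]
```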

Answer 2

You should keep doing it your own way; you can finish the job by using split.

from urllib.request import *
from bs4 import BeautifulSoup
import re

req = Request("https://adrianchifu.com/teachings/AMSE/MAG1/project/Xlrda/dsuR/2/J9ED27Y.html")
a = urlopen(req).read()
soup = BeautifulSoup(a,'html.parser')
nombres = []
tout = [ele.text for ele in soup.find_all('td')]
tout = [text if not re.findall(r"^\d+\.\d+",text) else int(text.split(".")[0]) for text in tout]
print(tout)
# [89, 54, 19, 'OIK3XF02PS', 87, 2, 99, '6190', 83, 'E2RYAFAE']

Answer 3

If you are looking for a regex to match integers:

^[1-9][0-9]{0,2}$

This matches all positive, non-zero integers between 1 and 999. You can adjust the upper range of this expression by changing the second number (i.e. 2) in the {0,2} part of the expression.

Courtesy: http://regexlib.com
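To illustrate the range limit of Answer 3's pattern, here is a small sketch comparing {0,2} with {0,3} on a few sample strings. The sample strings and the variable names are hypothetical. Note that neither pattern accepts decimal values, so it only covers the integer part of the question.

```python
import re

up_to_999 = re.compile(r'^[1-9][0-9]{0,2}$')   # 1 to 999
up_to_9999 = re.compile(r'^[1-9][0-9]{0,3}$')  # 1 to 9999

results = {}
for text in ['7', '999', '6190', '0', '012', '9GYIGO']:
    results[text] = (bool(up_to_999.match(text)), bool(up_to_9999.match(text)))
    print(text, results[text])

# '6190' is rejected by the original pattern but accepted once the
# upper bound is raised; '0', '012' and '9GYIGO' are rejected by both.
```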

Answer 1: score 0, 2018-12-02 20:35:42
Answer 2: score 0, 2018-12-03 05:02:33
Answer 3: score -2, 2018-12-02 20:30:43
