
Why am I getting a list of tags instead of numbers?

import pandas as pd
from bs4 import BeautifulSoup
import numpy as np
import requests
from time import sleep
from random import randint
import re

towns = pd.DataFrame()

url = f"https://www.city-data.com/city/Adak-Alaska.html"
page = requests.get(url).text
doc = BeautifulSoup(page, "html.parser")

sex_population = str(doc.find(id="population-by-sex"))
(males, females) = [float(x) for x in re.findall(r"(?<=\()[0-9]+\.[0-9]+(?=\%\))", sex_population)]
print(males, females)

religion_population = str(doc.find(id="religion"))
atheist = float(re.findall("(?<=None<\/td><td>)[0-9,]*(?=<\/td><td>)", 
religion_population)[0].replace(",", ""))
print(atheist)

total_population = str(doc.find(id="city-population"))
residents = float(re.findall("(?<=</b> )[0-9]*", total_population)[0].replace(",", ""))
print(residents)

believers = re.findall("(<?<td>)[0-9,]*", religion_population)
for x in believers:
    x.replace(",", "")
print(believers)

Instead of printing a list of numbers as I expect, print(believers) prints a list of <td>. What am I doing wrong here? I checked my work on regex101.

I would use BeautifulSoup's own methods, such as .find_all() and .get_text(), instead of regex:

import requests
from bs4 import BeautifulSoup

url = "https://www.city-data.com/city/Adak-Alaska.html"
response = requests.get(url)
#print('status:', response.status_code)
soup = BeautifulSoup(response.text, "html.parser")

religion_population = soup.find(id="religion").find_all('tr')

for row in religion_population:
    columns = row.find_all('td')
    if columns:
        religion = columns[0].get_text(strip=True)
        number   = columns[1].get_text(strip=True).replace(",", "")
        print(f'religion: {religion} | number: {number}')

Result:

religion: Orthodox | number: 754
religion: Evangelical Protestant | number: 232
religion: Catholic | number: 185
religion: Other | number: 112
religion: Mainline Protestant | number: 82
religion: None | number: 4196
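
As for why the original pattern printed <td>: in "(<?<td>)" the parentheses form an ordinary capturing group (an optional "<" followed by "<td>"), not a lookbehind, and re.findall returns the group's text whenever the pattern contains exactly one group. The lookbehind was probably meant to be "(?<=<td>)". Separately, str.replace returns a new string, so calling it inside the loop without storing the result changes nothing. A minimal sketch of both points, run against a made-up HTML fragment (the real page's markup may differ):

```python
import re

# Hypothetical fragment standing in for the religion table;
# it is an assumption, not the actual markup of the page.
html = (
    "<tr><td>Orthodox</td><td>754</td></tr>"
    "<tr><td>None</td><td>4,196</td></tr>"
)

# Original pattern: one capturing group, so findall returns
# what the group matched ("<td>") rather than the digits.
wrong = re.findall(r"(<?<td>)[0-9,]*", html)
print(wrong)

# Lookbehind version: "<td>" must precede the match but is not part of it.
# "+" instead of "*" avoids empty matches in front of non-numeric cells.
raw = re.findall(r"(?<=<td>)[0-9,]+", html)

# str.replace returns a new string, so the results are collected
# into a new list instead of being discarded.
believers = [int(x.replace(",", "")) for x in raw]
print(believers)
```

Even with the pattern fixed, the regex stays sensitive to small changes in the markup, which is why the BeautifulSoup approach above is usually the more robust choice.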
