
How to read a BeautifulSoup Div Tag object as a dictionary

I'm new to HTML and BeautifulSoup, so apologies in advance. I scraped a real-estate website with BS4 and managed to get the information I want from a specific div class:

list_1_divs = soup.find_all('div', class_="ListingCell-AllInfo ListingUnit")

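BS4 finds 29 parent divs, each containing many child divs, but the information I want all seems to be in the parent, so I removed all of their child divs. When I print the resulting parent div in the variable "s_row", it looks like a string, but the debugger describes "s_row" as a {Tag: 3} containing attrs = {dict: 13} and then lists the elements I want as a structured list in the debug window.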

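How do I print (or pass to Pandas) the underlying dictionary in the {Tag} object? My end goal is to list the 13 dictionary elements as a table with 29 rows, one for the values of each "s_row". Thanks in advance.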

Code:

import urllib.request
from bs4 import BeautifulSoup
wiki = "https://www.lamudi.com.ph/metro-manila/makati/rockwell-1/buy/"
page = urllib.request.urlopen(wiki)
soup = BeautifulSoup(page, features='html.parser')
list_divs = soup.find_all('div', class_="ListingCell-AllInfo ListingUnit")
for s_row in list_divs:
    for child in s_row.find_all("div"):
        child.decompose()
    print(s_row)
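The difference between what print() shows and what the debugger shows can be reproduced offline. The sketch below is only an illustration: the HTML snippet is hypothetical (modelled loosely on the attribute names in this thread, not copied from the site). It shows that a Tag prints as markup while its `.attrs` member is an ordinary dict, so the child divs do not need to be removed to read the parent's attributes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page has many more attributes per listing.
html = """
<div class="ListingCell-AllInfo ListingUnit" data-price="82000000" data-bedrooms="3">
  <div class="child">nested content</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
s_row = soup.find("div", class_="ListingCell-AllInfo ListingUnit")

# Printing a Tag renders its HTML, which is why it looks like a string.
print(s_row)

# The attributes live in a plain dict on the Tag.
print(type(s_row.attrs))

# Tag also supports dict-style lookup directly.
print(s_row["data-price"])

# 'class' is multi-valued, so BeautifulSoup stores it as a list.
print(s_row.attrs["class"])

# One table row: every attribute except 'class'.
row = {k: v for k, v in s_row.attrs.items() if k != "class"}
print(row)
```

Because `.attrs` only holds the attributes of the tag itself, nested children have no effect on it.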

If I understand correctly, you want to extract each attribute as a column in a DataFrame:

import pandas as pd
import urllib.request
from bs4 import BeautifulSoup


wiki = "https://www.lamudi.com.ph/metro-manila/makati/rockwell-1/buy/"
page = urllib.request.urlopen(wiki)
soup = BeautifulSoup(page, features='html.parser')
list_divs = soup.find_all('div', class_="ListingCell-AllInfo ListingUnit")
all_data = []
for s_row in list_divs:
    # s_row.attrs is a plain dict of the tag's HTML attributes;
    # keep everything except 'class' as one row of the table
    all_data.append({a: v for a, v in s_row.attrs.items() if a != 'class'})

df = pd.DataFrame(all_data)
print(df)

Prints:

   data-price data-category                data-subcategories data-car_spaces data-bedrooms  ... data-price_range data-sqm_range data-rooms_total data-land_size data-subdivisionname
0    82000000   condominium       ["condominium","3-bedroom"]               2             3  ...              NaN            NaN              NaN            NaN                  NaN
1     9800000   condominium          ["condominium","studio"]             NaN             1  ...              NaN            NaN              NaN            NaN                  NaN
2    48990000   condominium  ["condominium","double-bedroom"]             NaN             2  ...      37.8M-48.9M     93-121 sqm              NaN            NaN                  NaN
3    73730000   condominium       ["condominium","3-bedroom"]             NaN             3  ...      45.3M-73.7M    126-202 sqm              NaN            NaN                  NaN
4    26600000   condominium  ["condominium","single-bedroom"]             NaN             1  ...            26.6M         62 sqm              NaN            NaN                  NaN
5    27500000   condominium  ["condominium","double-bedroom"]               1             2  ...              NaN            NaN              NaN            NaN                  NaN
6   130000000   condominium     ["condominium","penthouse-1"]             NaN             4  ...              NaN            NaN              NaN            NaN                  NaN
7    78000000   condominium       ["condominium","3-bedroom"]               2             3  ...              NaN            NaN              NaN            NaN                  NaN
8    55000000   condominium       ["condominium","3-bedroom"]               2             3  ...              NaN            165                3            NaN                  NaN
9    19000000   condominium  ["condominium","single-bedroom"]               1             1  ...              NaN             64                1            NaN                  NaN
10   30000000   condominium  ["condominium","double-bedroom"]             NaN             2  ...              NaN            NaN              NaN            NaN                  NaN
11   14000000   condominium  ["condominium","single-bedroom"]             NaN             1  ...              NaN            NaN              NaN            NaN                  NaN
12   50000000   condominium       ["condominium","3-bedroom"]             NaN             3  ...              NaN            NaN              NaN            NaN                  NaN
13   48000000   condominium       ["condominium","3-bedroom"]             NaN             3  ...              NaN            NaN              NaN            NaN                  NaN
14   27000000   condominium  ["condominium","double-bedroom"]             NaN             2  ...              NaN            NaN              NaN            NaN                  NaN
15   36000000   condominium       ["condominium","3-bedroom"]             NaN             3  ...              NaN            NaN              NaN            NaN                  NaN
16   52000000         house   ["house","single-family-house"]               4             3  ...              NaN            NaN              NaN            110         Palm Village
17   48000000   condominium       ["condominium","3-bedroom"]               2             3  ...              NaN            NaN                4            NaN                  NaN
18   37500000   condominium  ["condominium","double-bedroom"]               2             2  ...              NaN            NaN              NaN            NaN                  NaN
19   19000000   condominium  ["condominium","double-bedroom"]               1             2  ...              NaN            NaN              NaN            NaN                  NaN
20   66700000   condominium       ["condominium","3-bedroom"]               2             3  ...              NaN            NaN              NaN            NaN                  NaN
21   16500000   condominium  ["condominium","double-bedroom"]               1             2  ...              NaN            NaN              NaN            NaN                  NaN
22   12900000   condominium  ["condominium","single-bedroom"]               1             1  ...              NaN            NaN              NaN            NaN                  NaN
23   20000000   condominium  ["condominium","double-bedroom"]               1             2  ...              NaN            NaN              NaN            NaN                  NaN
24   17300000   condominium  ["condominium","single-bedroom"]             NaN             1  ...              NaN            NaN              NaN            NaN                  NaN
25   25000000   condominium  ["condominium","double-bedroom"]             NaN             2  ...              NaN            NaN              NaN            NaN                  NaN
26   14000000   condominium  ["condominium","single-bedroom"]             NaN             1  ...              NaN            NaN              NaN            NaN                  NaN
27   32000000   condominium  ["condominium","double-bedroom"]             NaN             2  ...              NaN            NaN              NaN            NaN                  NaN
28   38000000   condominium  ["condominium","double-bedroom"]               1             2  ...              NaN            NaN              NaN            NaN                  NaN

[29 rows x 17 columns]
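Attribute values come out of BeautifulSoup as strings, and the data-subcategories column in the output above appears to hold JSON-encoded lists. The following is a minimal sketch of an optional clean-up step, not part of the original answer. The helper name `clean_row` and the sample dict are hypothetical, and the assumption that data-subcategories is valid JSON is based only on how it prints here.

```python
import json


def clean_row(attrs):
    """Return a copy of one attribute dict with shorter keys and typed values."""
    cleaned = {}
    for key, value in attrs.items():
        if key == "class":
            continue  # styling information, not data
        # Drop the 'data-' prefix so column names are shorter.
        name = key[len("data-"):] if key.startswith("data-") else key
        if name == "subcategories":
            # Assumed to be a JSON list; keep the raw string if parsing fails.
            try:
                value = json.loads(value)
            except (TypeError, ValueError):
                pass
        elif isinstance(value, str) and value.isdigit():
            # Purely numeric strings such as prices become integers.
            value = int(value)
        cleaned[name] = value
    return cleaned


# Hypothetical sample shaped like one Tag.attrs dict from the listing page.
sample = {
    "class": ["ListingCell-AllInfo", "ListingUnit"],
    "data-price": "82000000",
    "data-category": "condominium",
    "data-subcategories": '["condominium","3-bedroom"]',
    "data-sqm_range": "93-121 sqm",
}
result = clean_row(sample)
print(result)
```

With the scraped data, the rows could then be built as `pd.DataFrame([clean_row(s_row.attrs) for s_row in list_divs])`, which gives numeric columns that can be sorted or aggregated directly.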


Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please cite this site's URL or the original post's address.
