
How to read a BeautifulSoup Div Tag object as a dictionary

I'm new to HTML and BeautifulSoup, so apologies in advance. I scraped a real-estate website with BS4 and managed to get the information I want from a specific div class:

list_1_divs = soup.find_all('div', class_="ListingCell-AllInfo ListingUnit")

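BS4 finds 29 parent divs. Each parent contains many child divs, but the information I want all seems to be in the parent, so I removed the children. When I print the resulting parent div in the variable "s_row", it looks like a string, but debug mode describes "s_row" as a {Tag: 3} containing attrs = {dict: 13}, and the debug window then shows the elements I want as a structured list.

How can I print (or pass to Pandas) the underlying dictionary in the {Tag} object? My end goal is a table with the 13 dictionary elements as columns and 29 rows holding the values for each "s_row". Thanks in advance.

Code: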

import urllib.request
from bs4 import BeautifulSoup
wiki = "https://www.lamudi.com.ph/metro-manila/makati/rockwell-1/buy/"
page = urllib.request.urlopen(wiki)
soup = BeautifulSoup(page, features='html.parser')
list_divs = soup.find_all('div', class_="ListingCell-AllInfo ListingUnit")
for s_row in list_divs:
    for child in s_row.find_all("div"):
        child.decompose()
    print(s_row)
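What the debugger labels attrs = {dict: 13} is the tag's `.attrs` attribute, which is an ordinary Python dict of the element's HTML attributes. The sketch below illustrates this on a small inline snippet rather than the live page; the markup and its attribute values are made up for illustration and are only assumed to resemble the real listings.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one listing on the page.
html = '''
<div class="ListingCell-AllInfo ListingUnit"
     data-price="82000000" data-category="condominium" data-bedrooms="3">
  <div class="child">nested content</div>
</div>
'''

soup = BeautifulSoup(html, "html.parser")
s_row = soup.find("div", class_="ListingCell-AllInfo ListingUnit")

# .attrs is a plain dict; copy it so later edits do not touch the tag.
attrs = dict(s_row.attrs)
print(attrs)

# Individual values can be read from the dict or directly from the tag.
print(attrs["data-price"], s_row["data-category"], s_row.get("data-land_size"))
```

Note that printing the tag still renders it as HTML, which is why it looks like a string. The attribute values themselves are strings, except for multi-valued attributes such as `class`, which BeautifulSoup returns as a list. Removing the child divs is not needed to read the parent's attributes.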

If I understand correctly, you want to extract each attribute as a column in a DataFrame:

import pandas as pd
import urllib.request
from bs4 import BeautifulSoup


wiki = "https://www.lamudi.com.ph/metro-manila/makati/rockwell-1/buy/"
page = urllib.request.urlopen(wiki)
soup = BeautifulSoup(page, features='html.parser')
list_divs = soup.find_all('div', class_="ListingCell-AllInfo ListingUnit")
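all_data = []
for s_row in list_divs:
    # start a new row (dict) for this listing
    all_data.append({})
    # s_row.attrs is the dict of the tag's HTML attributes
    for a in s_row.attrs:
        if a == 'class':
            # skip the CSS class list used to select the divs
            continue
        all_data[-1][a] = s_row[a]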

df = pd.DataFrame(all_data)
print(df)

Prints:

   data-price data-category                data-subcategories data-car_spaces data-bedrooms  ... data-price_range data-sqm_range data-rooms_total data-land_size data-subdivisionname
0    82000000   condominium       ["condominium","3-bedroom"]               2             3  ...              NaN            NaN              NaN            NaN                  NaN
1     9800000   condominium          ["condominium","studio"]             NaN             1  ...              NaN            NaN              NaN            NaN                  NaN
2    48990000   condominium  ["condominium","double-bedroom"]             NaN             2  ...      37.8M-48.9M     93-121 sqm              NaN            NaN                  NaN
3    73730000   condominium       ["condominium","3-bedroom"]             NaN             3  ...      45.3M-73.7M    126-202 sqm              NaN            NaN                  NaN
4    26600000   condominium  ["condominium","single-bedroom"]             NaN             1  ...            26.6M         62 sqm              NaN            NaN                  NaN
5    27500000   condominium  ["condominium","double-bedroom"]               1             2  ...              NaN            NaN              NaN            NaN                  NaN
6   130000000   condominium     ["condominium","penthouse-1"]             NaN             4  ...              NaN            NaN              NaN            NaN                  NaN
7    78000000   condominium       ["condominium","3-bedroom"]               2             3  ...              NaN            NaN              NaN            NaN                  NaN
8    55000000   condominium       ["condominium","3-bedroom"]               2             3  ...              NaN            165                3            NaN                  NaN
9    19000000   condominium  ["condominium","single-bedroom"]               1             1  ...              NaN             64                1            NaN                  NaN
10   30000000   condominium  ["condominium","double-bedroom"]             NaN             2  ...              NaN            NaN              NaN            NaN                  NaN
11   14000000   condominium  ["condominium","single-bedroom"]             NaN             1  ...              NaN            NaN              NaN            NaN                  NaN
12   50000000   condominium       ["condominium","3-bedroom"]             NaN             3  ...              NaN            NaN              NaN            NaN                  NaN
13   48000000   condominium       ["condominium","3-bedroom"]             NaN             3  ...              NaN            NaN              NaN            NaN                  NaN
14   27000000   condominium  ["condominium","double-bedroom"]             NaN             2  ...              NaN            NaN              NaN            NaN                  NaN
15   36000000   condominium       ["condominium","3-bedroom"]             NaN             3  ...              NaN            NaN              NaN            NaN                  NaN
16   52000000         house   ["house","single-family-house"]               4             3  ...              NaN            NaN              NaN            110         Palm Village
17   48000000   condominium       ["condominium","3-bedroom"]               2             3  ...              NaN            NaN                4            NaN                  NaN
18   37500000   condominium  ["condominium","double-bedroom"]               2             2  ...              NaN            NaN              NaN            NaN                  NaN
19   19000000   condominium  ["condominium","double-bedroom"]               1             2  ...              NaN            NaN              NaN            NaN                  NaN
20   66700000   condominium       ["condominium","3-bedroom"]               2             3  ...              NaN            NaN              NaN            NaN                  NaN
21   16500000   condominium  ["condominium","double-bedroom"]               1             2  ...              NaN            NaN              NaN            NaN                  NaN
22   12900000   condominium  ["condominium","single-bedroom"]               1             1  ...              NaN            NaN              NaN            NaN                  NaN
23   20000000   condominium  ["condominium","double-bedroom"]               1             2  ...              NaN            NaN              NaN            NaN                  NaN
24   17300000   condominium  ["condominium","single-bedroom"]             NaN             1  ...              NaN            NaN              NaN            NaN                  NaN
25   25000000   condominium  ["condominium","double-bedroom"]             NaN             2  ...              NaN            NaN              NaN            NaN                  NaN
26   14000000   condominium  ["condominium","single-bedroom"]             NaN             1  ...              NaN            NaN              NaN            NaN                  NaN
27   32000000   condominium  ["condominium","double-bedroom"]             NaN             2  ...              NaN            NaN              NaN            NaN                  NaN
28   38000000   condominium  ["condominium","double-bedroom"]               1             2  ...              NaN            NaN              NaN            NaN                  NaN

[29 rows x 17 columns]
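The same idea can be tried offline with a more compact loop. This is only a sketch: the inline markup is hypothetical (two made-up listings modelled on the output above), and the type conversions at the end are an optional extra, on the assumption that numeric columns are wanted for later analysis.

```python
import json

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup imitating two listings; the real page has 29 of them.
html = '''
<div class="ListingCell-AllInfo ListingUnit" data-price="82000000"
     data-category="condominium" data-bedrooms="3" data-car_spaces="2"
     data-subcategories='["condominium","3-bedroom"]'></div>
<div class="ListingCell-AllInfo ListingUnit" data-price="9800000"
     data-category="condominium" data-bedrooms="1"
     data-subcategories='["condominium","studio"]'></div>
'''

soup = BeautifulSoup(html, "html.parser")
list_divs = soup.find_all("div", class_="ListingCell-AllInfo ListingUnit")

# One dict per listing, skipping the multi-valued "class" attribute.
all_data = [
    {name: value for name, value in s_row.attrs.items() if name != "class"}
    for s_row in list_divs
]
df = pd.DataFrame(all_data)

# Attribute values arrive as strings; convert the numeric and JSON columns.
for column in ["data-price", "data-bedrooms", "data-car_spaces"]:
    df[column] = pd.to_numeric(df[column])
df["data-subcategories"] = df["data-subcategories"].apply(json.loads)

print(df.dtypes)
print(df)
```

Attributes missing from a listing become NaN in that row, which is why columns such as data-car_spaces are only partly filled in the output above.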

