Parsing specific data using Beautiful Soup

Question

I have a webpage that contains tabular data. The following is the HTML code for the table:

    <table class="confluenceTable">
    <tbody>
       <tr>
          <th class="confluenceTh">
             <p>Prefix</p>
          </th>
          <th class="confluenceTh">
             <p>Group</p>
          </th>
          <th class="confluenceTh">
             <p>Contact</p>
          </th>
          <th class="confluenceTh">
             <p>Dev/Test Lab</p>
          </th>
          <th class="confluenceTh">
             <p>Performance</p>
          </th>
       </tr>
       <tr>
          <td class="confluenceTd">
             <p> </p>
          </td>
          <td class="confluenceTd">
             <p> </p>
          </td>
          <td class="confluenceTd">
             <p> </p>
          </td>
       </tr>
       <tr>
          <th class="confluenceTh">
             <p> </p>
          </th>
          <th class="confluenceTh">
             <p> </p>
          </th>
          <th class="confluenceTh">
             <p> </p>
          </th>
       </tr>
       <tr>
          <td class="confluenceTd">
             <p>SEF00</p>
          </td>
          <td class="confluenceTd">
             <p>APTRA Vision</p>
          </td>
          <td class="confluenceTd">
             <p> </p>
          </td>
          <td class="confluenceTd">
             <p><a href="/somepage">VCD Lab</a> , <a href="/somepage">Test Lab</a></p>
          </td>
          <td class="confluenceTd">
             <p><a href="/display">Perf Lab</a></p>
          </td>
       </tr>
       <tr>
          <td class="confluenceTd">
             <p>SEF01</p>
          </td>
          <td class="confluenceTd">
             <p>In-Person Bill Payment</p>
          </td>
          <td class="confluenceTd">
             <p>Swamy PKV</p>
          </td>

How can I write my Python code so that I get only the data under the Prefix and Group columns? So far I have tried this:

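import requests
from bs4 import BeautifulSoup

# url, username and password are defined elsewhere
data = requests.get(url, auth=(username, password))
soup = BeautifulSoup(data.content, 'html.parser')
datatocheck = []
for row in soup.find_all('tr')[1:154]:
    # Join the text of every cell in the row with tabs
    datatocheck.append(row.get_text(separator='\t'))
# Print the first 30 characters of each collected row
for line in datatocheck[1:153]:
    print(line[0:30])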

This gives me the following output:

SEF00   APTRA Vision        VCD Lab  
SEF01   In-Person Bill Payment  S

But I just want SEF00 (prefix) and APTRA Vision (group), and SEF01 and In-Person Bill Payment, not the other columns.

Also, I can't change the HTML code.

Answer 1

How about adding a check such as: if 'SEF00' in ii:

It may print just the SEF00.
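One way to read this suggestion is as a substring test on each row's text. The sketch below is an assumption about what was meant, not the answerer's code; the `rows_containing` helper and the shortened `HTML` sample are made up for illustration. It suggests that such a check picks out matching rows but still returns every column of those rows, so the columns would need to be split separately.

```python
from bs4 import BeautifulSoup

# Shortened sample modeled on the table in the question (illustrative data only).
HTML = (
    '<table><tbody>'
    '<tr><th><p>Prefix</p></th><th><p>Group</p></th><th><p>Dev/Test Lab</p></th></tr>'
    '<tr><td><p>SEF00</p></td><td><p>APTRA Vision</p></td><td><p>VCD Lab</p></td></tr>'
    '<tr><td><p>SEF01</p></td><td><p>In-Person Bill Payment</p></td><td><p> </p></td></tr>'
    '</tbody></table>'
)


def rows_containing(html, needle):
    """Return the tab-joined text of every <tr> whose text contains needle."""
    soup = BeautifulSoup(html, 'html.parser')
    matches = []
    for row in soup.find_all('tr'):
        # strip=True trims each string and drops whitespace-only cells.
        text = row.get_text(separator='\t', strip=True)
        if needle in text:
            matches.append(text)
    return matches


# The matching row still carries all of its columns, not just Prefix and Group.
print(rows_containing(HTML, 'SEF00'))
```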

Answer 2

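from bs4 import BeautifulSoup

# html holds the page source shown in the question
soup = BeautifulSoup(html, 'lxml')

for row in soup.find_all('tr')[3:]:   # skip the header row and the two empty rows
    tds = [i.get_text(strip=True) for i in row.find_all('td')]
    print(tds[0], tds[1])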

Output:

SEF00 APTRA Vision
SEF01 In-Person Bill Payment

Just get all the td elements in the row, put them in a list, then slice it.
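The fixed slice above depends on knowing how many rows to skip and that Prefix and Group are the first two columns. A possible variation, sketched here under the assumption that the first row holds the column headers, looks the positions up by header text and skips rows that are empty or have too few cells. The `extract_columns` helper and the shortened `HTML` sample are hypothetical names introduced for this example, and `html.parser` is used only so the sketch does not need lxml installed.

```python
from bs4 import BeautifulSoup

# Shortened copy of the table from the question (illustrative data only).
HTML = """
<table class="confluenceTable"><tbody>
<tr><th class="confluenceTh"><p>Prefix</p></th><th class="confluenceTh"><p>Group</p></th>
    <th class="confluenceTh"><p>Contact</p></th></tr>
<tr><td class="confluenceTd"><p> </p></td><td class="confluenceTd"><p> </p></td></tr>
<tr><th class="confluenceTh"><p> </p></th><th class="confluenceTh"><p> </p></th></tr>
<tr><td class="confluenceTd"><p>SEF00</p></td><td class="confluenceTd"><p>APTRA Vision</p></td>
    <td class="confluenceTd"><p> </p></td></tr>
<tr><td class="confluenceTd"><p>SEF01</p></td><td class="confluenceTd"><p>In-Person Bill Payment</p></td>
    <td class="confluenceTd"><p>Swamy PKV</p></td></tr>
</tbody></table>
"""


def extract_columns(html, wanted=("Prefix", "Group")):
    """Return one tuple per data row holding the cells under the wanted headers."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr")
    # Map header text to column position using the first row's <th> cells.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    positions = [headers.index(name) for name in wanted]
    result = []
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Skip rows without enough <td> cells (for example the second header row).
        if len(cells) <= max(positions):
            continue
        values = tuple(cells[p] for p in positions)
        # Skip rows whose wanted cells are all empty.
        if any(values):
            result.append(values)
    return result


for prefix, group in extract_columns(HTML):
    print(prefix, group)
```

If a header name is missing from the first row, `headers.index` raises `ValueError`, which may be preferable to silently reading the wrong column.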

Answer 1
0 2016-11-18 13:53:22

Answer 2
0 2016-11-18 15:22:04