HTML parsing - get the text between all tags

Question

I want to get the text between all the tags in a specific tr. I have looked at similar questions, but they are specific to one tag type.

If I do something like this:

for strong_tag in soup.find_all('strong'):
    print(strong_tag.text)

That works for one particular tag, but how do I do it for the complete tr?

<tr>
   <td style="border:0px solid black;padding: 0px 5.4pt;border-color: currentColor windowtext windowtext;border-style: none solid solid;border-width: medium 0pt 0pt;background: white;" width="39">
      <p align="center" style="min-height: 8pt; padding: 0px; text-align: center;"> </p>
   </td>
   <td colspan="7" style="border:0px solid black;vertical-align: top;text-align: left;padding: 0px 5.4pt;border-color: currentColor windowtext windowtext currentColor;border-style: none solid solid none;border-width: medium 0pt 0pt medium;background: white;" width="683">
      <ol style="list-style-type: decimal;">
         <li>Process the return per standard procedures. Refer to the <a class="jive-link-wiki-small" data-containerid="2456" data-containertype="14" data-objectid="12425" data-objecttype="102" href="https://iconnect.sprint.com/docs/DOC-12425">Sprint Satisfaction Guarantee Procedure</a> for steps.</li>
         <li>RMS will reset the eligibility when doing a <strong>Sprint Monthly Installments Return</strong>. If the original transaction was performed in RMS, the system will display a message and advise that a history transaction can be performed or you can proceed with a No History Return</li>
         <li>
            To reset Monthly Installments upgrade eligibility and process the return:
            <ol>
               <li>Return the device.</li>
               <li>Re-access the account to see if the line is still <strong>upgrade-eligible for Monthly Installments</strong>.</li>
            </ol>
            <ul>
               <ul>
                  <li><strong>If so,</strong> proceed with the sale as normal.</li>
                  <li>
                     If the customer's line is showing as <strong>not upgrade-eligible</strong> for Monthly Installments:
                     <ol>
                        <li>Add a note to the customer's account stating the return transaction number and the need for eligibility reset.</li>
                        <li>Reset the customer's eligibility by using the MSA tablet or through iCare <em><strong>or</strong></em></li>
                        <li>Contact <strong>NSS</strong> to request an eligibility reset <strong>only</strong> if the reset was <strong>not successful</strong>.<strong> </strong></li>
                     </ol>
                  </li>
               </ul>
               <ul>
                  <li><span style="font-family: Arial;">Once eligibility is reset, pull up the customer's account again in RMS and process the sale.</span></li>
               </ul>
            </ul>
         </li>
      </ol>
   </td>
</tr>

The expected output is the text between all the tags.
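A minimal sketch of collecting every piece of text in the row regardless of which tag it sits in, using BeautifulSoup's `stripped_strings` generator. The markup below is a trimmed-down, hypothetical stand-in for the row above (the link target is a placeholder), and it uses the built-in `html.parser` so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the <tr> in the question (hypothetical sample).
html = """
<table><tr>
  <td><p> </p></td>
  <td><ol>
    <li>Process the return per standard procedures. Refer to the <a href="#">Sprint Satisfaction Guarantee Procedure</a> for steps.</li>
    <li>RMS will reset the eligibility when doing a <strong>Sprint Monthly Installments Return</strong>.</li>
  </ol></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

# stripped_strings walks every descendant text node of the row,
# strips it, and skips strings that are only whitespace.
pieces = list(row.stripped_strings)
for piece in pieces:
    print(piece)
```

Because the walk covers all descendants, `<a>`, `<strong>` and plain text nodes are all picked up without listing tag names.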

Answer 1

get_text() gets all the child strings and returns them concatenated, using the given separator.

.text is a property that simply calls get_text() (undocumented).

print(soup.select('tr')[0].text)
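To illustrate the separator behaviour described above, here is a small sketch on a hypothetical two-cell row: `.text` glues the strings together as-is, while `get_text(separator, strip=True)` strips each string and joins them with the separator.

```python
from bs4 import BeautifulSoup

# Hypothetical two-cell row built from fragments of the question's markup.
html = ("<table><tr><td><strong>If so,</strong> proceed with the sale as normal.</td>"
        "<td>Return the device.</td></tr></table>")
soup = BeautifulSoup(html, "html.parser")
row = soup.select("tr")[0]

# .text is equivalent to get_text() with no arguments: no separator, no stripping.
plain = row.text
# With a separator and strip=True, each string is stripped and joined with " ".
joined = row.get_text(" ", strip=True)

print(repr(plain))
print(repr(joined))
```

Note how the two cells run together ("normal.Return") in the plain version; the separator avoids that.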

With alignment (nested list items indented by depth):

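import bs4

# Parse the saved page; the 'lxml' parser requires the lxml package.
with open('h.html') as f:
    soup = bs4.BeautifulSoup(f, 'lxml')

def get_text(tag):
    """Return the text that belongs to `tag` itself: its own strings plus
    inline children, skipping nested lists (their <li> items are printed
    on their own)."""
    parts = []
    for child in tag.contents:
        if type(child) is bs4.element.NavigableString:
            text = child.strip()
        elif child.name in ['strong', 'span', 'a', 'em']:
            text = child.get_text(' ', strip=True)
        else:
            continue
        if text:  # drop whitespace-only strings
            parts.append(text)
    return ' '.join(parts)

for li in soup.select('li'):
    # Nesting depth = number of enclosing <ol>/<ul> elements, minus one.
    level = len(li.find_parents(['ol', 'ul'])) - 1
    print(' ' * level * 5, get_text(li))
    print('-' * 50)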
   print('-'*50)

1 solution

Solution 1
1 · Accepted · 2018-05-14 06:17:07
