I am new to Python. I want to use BeautifulSoup to get the post date from a forum page. I have tried many approaches but cannot get the correct result.
Here is my problem:
<td class = by>
<cite>...</cite>
<em>
<span>2015-11-13</span>
</em>
</td>
<td class = ...>...</td>
<td class = by>...</td>
<cite>...</cite>
<em><a>...</a></em>
</td>
There are two <td> elements with the same class, "by", but I only want the date from the first one, the one that contains a <span> tag.
Here is what I have tried, but I have no idea what the problem is:
cat=1
for span in soup.findAll('span', {'class':"by"}):
    print(span.text)
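A generic solution is to iterate over the <td> elements of class "by" and find the <span> inside each one. The attempt in the question returns nothing because it searches for <span> elements that carry the class "by", whereas the class is on the enclosing <td>.
from bs4 import BeautifulSoup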
a="""<td class = by>
<cite>...</cite>
<em>
<span>2015-11-13</span>
</em>
</td>
<td class = ...>...</td>
<td class = by>...</td>
<cite>...</cite>
<em><a>...</a></em>
</td>"""
soup = BeautifulSoup(a, 'html.parser')
for item in soup.find_all("td",{"class": "by"}):
    for i in item.find_all("span"):
        print(i.text) # 2015-11-13
A more straightforward approach is to use a CSS selector:
soup.select('td.by > em > span')[0].text # 2015-11-13
If you are only concerned with the first occurrence then, as suggested by @Jon Clements, you can use select_one:
soup.select_one('td.by > em > span').text
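One caveat worth sketching: select_one returns None when nothing matches (and indexing select(...)[0] raises IndexError on an empty result), so calling .text directly can fail on pages where no td.by contains a span. Below is a minimal sketch that guards against that. The helper name first_post_date and the sample markup (with quoted attributes and a wrapping table) are assumptions made for illustration, not part of the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the snippet in the question.
html = """<table><tr>
<td class="by"><cite>...</cite><em><span>2015-11-13</span></em></td>
<td class="other">...</td>
<td class="by"><cite>...</cite><em><a>...</a></em></td>
</tr></table>"""

soup = BeautifulSoup(html, "html.parser")


def first_post_date(soup):
    """Return the text of the first td.by > em > span, or None if there is none."""
    tag = soup.select_one("td.by > em > span")
    # select_one gives None when no element matches the selector.
    return tag.get_text(strip=True) if tag is not None else None


print(first_post_date(soup))
```

Returning None instead of raising lets the caller decide how to handle a row without a date, for example by skipping it.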