Python regex ignore new line
I have a web page that looks like this:
<td valign="top">
<table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
<tr>
<td colspan="2">
<div align="center">
<a href="/title/name.php" target="_blank">
<img src="./movie/image.jpg" alt="TitleName" border="0" height="100" width="225" />
</a>
</div>
</td>
</tr>
<tr>
<td colspan="2"><h1 align="center"><a href="./title.php?titleid=12">Title - secondname</a></h1></td>
</tr>
<tr>
<td><span class="style10">Cat1 :</span></td>
<td>1st name</td>
</tr>
<tr>
<td width="32%"><span class="style10">Cat2 :</span></td>
<td width="68%"><b><i><a href="./secondname.php" target="_blank">secondname</a></i></b></td>
</tr>
<tr>
<td><span class="style10">cat4 :</span></td>
<td>Bla bla</td>
</tr>
<tr>
<td><span class="style10">Cat3 :</span></td>
<td>thirdName2</td>
</tr>
</table>
</td>
<td valign="top">
<table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
<tr>
<td colspan="2">
<div align="center">
<a href="/title/name.php" target="_blank">
<img src="./movie/image.jpg" alt="TitleName" border="0" height="100" width="225" />
</a>
</div>
</td>
</tr>
<tr>
<td colspan="2"><h1 align="center"><a href="./title.php?titleid=12">Title - secondname</a></h1></td>
</tr>
<tr>
<td><span class="style10">Cat1 :</span></td>
<td>1st name</td>
</tr>
<tr>
<td width="32%"><span class="style10">Cat2 :</span></td>
<td width="68%"><b><i><a href="./secondname.php" target="_blank">secondname</a></i></b></td>
</tr>
<tr>
<td><span class="style10">cat4 :</span></td>
<td>Bla bla</td>
</tr>
<tr>
<td><span class="style10">Cat3 :</span></td>
<td>thirdName2</td>
</tr>
</table>
</td>
I would like to get certain values from this page using Python regex. After <div align="center">, I would like to get the href value "/title/name.php", the img src "./movie/image.jpg", and the text "Title - secondname" from <h1 align="center"><a href="./title.php?titleid=12">Title - secondname</a></h1>.
I have tried this: regex = 'class="main_tb3"*\\n<a href="(.+?)" target="_blank">\\n<img src="(.+?)"'
Please help me.
You can use the regexes below.
For the href value:
<a href="(.*?)"
For the image src:
<img src="(.*?)"
For the title:
titleid=12">(.*?)<
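The attempt in the question fails because `"*` only repeats the quote character, so nothing in the pattern consumes the tags and line breaks between class="main_tb3" and the <a> tag. The three separate patterns above each match anywhere in the page, so on their own they do not tie an href to its image or title. The following is a minimal sketch of one combined pattern compiled with re.DOTALL, which lets "." match newlines. The shortened html string and the variable names are assumptions for illustration, and the pattern is tailored to this exact markup.

```python
import re

# Shortened copy of one table from the question (assumed sample input).
html = """
<table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
<tr>
<td colspan="2">
<div align="center">
<a href="/title/name.php" target="_blank">
<img src="./movie/image.jpg" alt="TitleName" border="0" height="100" width="225" />
</a>
</div>
</td>
</tr>
<tr>
<td colspan="2"><h1 align="center"><a href="./title.php?titleid=12">Title - secondname</a></h1></td>
</tr>
</table>
"""

# re.DOTALL makes "." match newlines too, so the lazy ".*?" can skip over
# the tags and line breaks that sit between the pieces we want.
pattern = re.compile(
    r'class="main_tb3".*?'            # start of a table of interest
    r'<a href="(.*?)"[^>]*>\s*'       # group 1: first link target
    r'<img src="(.*?)".*?'            # group 2: image source
    r'<h1[^>]*><a [^>]*>(.*?)</a>',   # group 3: title text inside the h1 link
    re.DOTALL,
)

# findall returns one (href, src, title) tuple per matching table.
results = pattern.findall(html)
for href, src, title in results:
    print(href, src, title)
```

A pattern like this is likely to break if the attribute order or nesting of the page changes, which is the usual argument for an HTML parser.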
You will find it a lot simpler to install something like BeautifulSoup to do this:
from bs4 import BeautifulSoup
html = """
<td valign="top">
<table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
<tr>
<td colspan="2">
<div align="center">
<a href="/title/name.php" target="_blank">
<img src="./movie/image.jpg" alt="TitleName" border="0" height="100" width="225" />
</a>
</div>
</td>
</tr>
<tr>
<td colspan="2"><h1 align="center"><a href="./title.php?titleid=12">Title - secondname</a></h1></td>
</tr>
<tr>
<td><span class="style10">Cat1 :</span></td>
<td>1st name</td>
</tr>
<tr>
<td width="32%"><span class="style10">Cat2 :</span></td>
<td width="68%"><b><i><a href="./secondname.php" target="_blank">secondname</a></i></b></td>
</tr>
<tr>
<td><span class="style10">cat4 :</span></td>
<td>Bla bla</td>
</tr>
<tr>
<td><span class="style10">Cat3 :</span></td>
<td>thirdName2</td>
</tr>
</table>
</td>
<td valign="top">
<table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
<tr>
<td colspan="2">
<div align="center">
<a href="/title/name.php" target="_blank">
<img src="./movie/image.jpg" alt="TitleName" border="0" height="100" width="225" />
</a>
</div>
</td>
</tr>
<tr>
<td colspan="2"><h1 align="center"><a href="./title.php?titleid=12">Title - secondname</a></h1></td>
</tr>
<tr>
<td><span class="style10">Cat1 :</span></td>
<td>1st name</td>
</tr>
<tr>
<td width="32%"><span class="style10">Cat2 :</span></td>
<td width="68%"><b><i><a href="./secondname.php" target="_blank">secondname</a></i></b></td>
</tr>
<tr>
<td><span class="style10">cat4 :</span></td>
<td>Bla bla</td>
</tr>
<tr>
<td><span class="style10">Cat3 :</span></td>
<td>thirdName2</td>
</tr>
</table>
</td>"""
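# Name the parser explicitly so results do not depend on which parsers are installed
soup = BeautifulSoup(html, "html.parser")
# Each table with class "main_tb3" holds one entry
for table in soup.find_all("table", class_="main_tb3"):
    print(table.find('a').get('href'))   # first link in the table
    print(table.find('h1').text)         # title text inside the <h1>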
For the HTML you have given, this will print the following:
/title/name.php
Title - secondname
/title/name.php
Title - secondname
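The question also asks for the img src. If installing a third-party package is not an option, the same idea (parse the markup instead of pattern-matching it) can be sketched with the standard library's html.parser. This is a swapped-in technique, not BeautifulSoup. The class name MovieTableParser, the dict keys, and the abbreviated html sample are hypothetical and used only for illustration.

```python
from html.parser import HTMLParser


class MovieTableParser(HTMLParser):
    """Collect href, img src and title for each table with class main_tb3."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one dict per main_tb3 table
        self.current = None   # dict being filled, or None outside a table
        self.in_h1 = False    # True while inside an <h1> element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table" and "main_tb3" in (attrs.get("class") or "").split():
            self.current = {"href": None, "src": None, "title": ""}
        elif self.current is None:
            return  # ignore everything outside the tables of interest
        elif tag == "a" and self.current["href"] is None:
            self.current["href"] = attrs.get("href")  # keep only the first link
        elif tag == "img" and self.current["src"] is None:
            self.current["src"] = attrs.get("src")
        elif tag == "h1":
            self.in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False
        elif tag == "table" and self.current is not None:
            self.rows.append(self.current)
            self.current = None

    def handle_data(self, data):
        # Text inside the <h1> (including its nested link) is the title.
        if self.in_h1 and self.current is not None:
            self.current["title"] += data


# Abbreviated version of one table from the question (assumed sample input).
html = """
<table class="main_tb3">
<tr><td><div align="center">
<a href="/title/name.php" target="_blank">
<img src="./movie/image.jpg" alt="TitleName" />
</a>
</div></td></tr>
<tr><td><h1 align="center"><a href="./title.php?titleid=12">Title - secondname</a></h1></td></tr>
</table>
"""

parser = MovieTableParser()
parser.feed(html)
for row in parser.rows:
    print(row["href"], row["src"], row["title"])
```

The sketch assumes the tables are not nested inside one another. For nested tables a depth counter would be needed.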