
How to extract required information from text in Python?

I want to extract: tamar tamar,0529589055

from this text, and I have to do that multiple times.

                    <h3 class="name">tamar tamar</h3>
                    <ul class="list-inline">
                        <li>gender:female</li>
                        <li>age:20</li>
                    <li class="phone" data="0529589055">phone:  0529589055</li>
                    <li class="email" data="tamar0529589055@gmail.com">email: tamar89055@gmail.com</li>         <!--                        <a 

Have you considered using a regex? For example, a simple (\w+ \w+)</h3> will extract the name, at least for the example above. For the number, something like (0\d+)</li> should work, off the top of my head.

An online regex tester that I find easy to use: https://pythex.org

And the Python regex docs: https://docs.python.org/2/library/re.html
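As a rough sketch of how those two patterns could be applied with Python's re module (the variable names here are just for illustration, and the patterns are only known to work on the sample from the question):

```python
import re

html = '''<h3 class="name">tamar tamar</h3>
<ul class="list-inline">
    <li>gender:female</li>
    <li>age:20</li>
<li class="phone" data="0529589055">phone:  0529589055</li>
<li class="email" data="tamar0529589055@gmail.com">email: tamar89055@gmail.com</li>
'''

# Name: two words immediately before the closing </h3> tag.
name_match = re.search(r'(\w+ \w+)</h3>', html)
# Phone: a run of digits starting with 0, immediately before a closing </li> tag.
phone_match = re.search(r'(0\d+)</li>', html)

# Fall back to None if a pattern does not match.
name = name_match.group(1) if name_match else None
phone = phone_match.group(1) if phone_match else None

result = f"{name},{phone}"
print(result)
```

Keep in mind that regexes like these are brittle: a name with one or three words, or a number that does not start with 0, would be missed.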

BeautifulSoup is what you are looking for

from bs4 import BeautifulSoup
a='''<h3 class="name">tamar tamar</h3>
<ul class="list-inline">
    <li>gender:female</li>
    <li>age:20</li>
<li class="phone" data="0529589055">phone:  0529589055</li>
<li class="email" data="tamar0529589055@gmail.com">email: tamar89055@gmail.com</li> 
'''
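soup = BeautifulSoup(a, "html.parser")  # name the parser explicitly to avoid a bs4 warning
print(soup.find('h3', {"class": "name"}).text)  # tamar tamar
print(soup.find('li', {"class": 'phone'})['data'])  # 0529589055 (.text would give "phone:  0529589055")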
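Since the question mentions doing this multiple times, here is a minimal sketch of the same idea for a page with several records. It uses the standard library's html.parser instead of BeautifulSoup, so nothing needs to be installed. The ContactParser class is a hypothetical helper, the second record ("dana levi") is made-up sample data, and the code assumes each name is followed by its own phone item:

```python
from html.parser import HTMLParser


class ContactParser(HTMLParser):
    """Collects (name, phone) pairs from repeated name/phone blocks."""

    def __init__(self):
        super().__init__()
        self.records = []      # finished (name, phone) tuples
        self._in_name = False  # True while inside <h3 class="name">
        self._name = None      # latest name seen, waiting for its phone

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3" and attrs.get("class") == "name":
            self._in_name = True
            self._name = ""
        elif tag == "li" and attrs.get("class") == "phone" and self._name is not None:
            # Read the number from the data attribute, not the visible text.
            self.records.append((self._name.strip(), attrs.get("data")))
            self._name = None

    def handle_data(self, data):
        if self._in_name:
            self._name += data

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_name = False


# Assumed input: the first record is from the question, the second is invented.
page = '''
<h3 class="name">tamar tamar</h3>
<ul class="list-inline">
<li class="phone" data="0529589055">phone:  0529589055</li>
</ul>
<h3 class="name">dana levi</h3>
<ul class="list-inline">
<li class="phone" data="0501234567">phone:  0501234567</li>
</ul>
'''

parser = ContactParser()
parser.feed(page)
parser.close()

lines = [f"{name},{phone}" for name, phone in parser.records]
print("\n".join(lines))
```

With BeautifulSoup installed, soup.find_all would probably be the shorter route; this version just avoids the extra dependency.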


 