How do I split this in Python?

Let's say there is a table with the following contents:

<td>Dog,Cat,Mouse</td>
<td>Snake,Dragon,Dinosaur,Lizard</td>
<td>Owl,Falcon,Phoenix</td>

and I want to turn it into this in Python:

>>> pets
[['Dog'],['Cat'],['Mouse'],['Snake'],['Dragon'],['Dinosaur'],['Lizard'],['Owl'],['Falcon'],['Phoenix']]

This is what I have managed so far:

import re

# `table` is presumably the BeautifulSoup Tag for the <table> element
animal = table.find_all('td')
pets = []
for i in animal:
    a = re.findall('[A-Z][a-z]*', str(i))
    pets.append(a)

However, I can't figure out a way to turn

['Dog','Cat','Mouse']

into

['Dog'],['Cat'],['Mouse'],

and so on. Please help; these are my first few days of programming and I'm already stuck. Thanks in advance.

import re
strs = """<td>Dog,Cat,Mouse</td>
<td>Snake,Dragon,Dinosaur,Lizard</td>
<td>Owl,Falcon,Phoenix</td>"""

# (.*?) captures the text of each cell; split(',') breaks it into names
r = re.compile(r'<td>(.*?)</td>')
print([[x] for m in r.finditer(strs) for x in m.group(1).split(',')])

This prints:

[['Dog'], ['Cat'], ['Mouse'], ['Snake'], ['Dragon'], ['Dinosaur'], ['Lizard'], ['Owl'], ['Falcon'], ['Phoenix']]

It also handles several <td>...</td> cells on the same line, because the non-greedy (.*?) stops at the first </td> it reaches.
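To see the several-cells-per-line case in action, here is a small sketch on a hypothetical input string; for contrast, it also shows what a greedy (.*) pattern would capture instead.

```python
import re

# Hypothetical input: two cells on the first line, one on the second
html = "<td>Dog,Cat</td><td>Mouse</td>\n<td>Owl</td>"

# Non-greedy: (.*?) stops at the first </td>, so each cell is its own match
cell_re = re.compile(r'<td>(.*?)</td>')
pets = [[x] for m in cell_re.finditer(html) for x in m.group(1).split(',')]
print(pets)  # [['Dog'], ['Cat'], ['Mouse'], ['Owl']]

# Greedy: (.*) runs to the last </td> on the line, swallowing the tags in between
greedy = re.findall(r'<td>(.*)</td>', html)
print(greedy)  # ['Dog,Cat</td><td>Mouse', 'Owl']
```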

First, you should know that regular expressions (regex) are not always the best tool for parsing data. Here, for instance, all your elements are separated by a comma, so the split method is the way to go.
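A quick illustration of the difference, using a hypothetical cell that contains a two-word name ("Sea Lion" is made up for this example and does not appear in the question): split(',') keeps the name intact, while the question's [A-Z][a-z]* pattern breaks it apart.

```python
import re

cell_text = "Dog,Cat,Mouse"
print(cell_text.split(','))  # ['Dog', 'Cat', 'Mouse']

# Hypothetical cell with a two-word animal name
tricky = "Sea Lion,Owl"
by_split = tricky.split(',')                  # splits only on the comma
by_regex = re.findall('[A-Z][a-z]*', tricky)  # splits on every capital letter
print(by_split)  # ['Sea Lion', 'Owl']
print(by_regex)  # ['Sea', 'Lion', 'Owl']
```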

As for wrapping each element in a list of its own, a list comprehension is the easiest way to do it. Again, make sure you really need this: a list of single-element lists rarely makes much sense.
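A minimal sketch of that list comprehension, plus a reminder of what the extra nesting costs when you read the data back:

```python
animals = ['Dog', 'Cat', 'Mouse']

# Wrap each element in a list of its own
wrapped = [[animal] for animal in animals]
print(wrapped)  # [['Dog'], ['Cat'], ['Mouse']]

# Every lookup now needs an extra [0] to reach the string
print(animals[1], wrapped[1][0])  # Cat Cat
```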

Here's a suggested implementation:

elements = table.find_all('td')
pets = []
for e in elements:
    # find_all returns Tag objects; get_text() gives the cell's text
    # without the surrounding <td> and </td>
    animals = e.get_text().split(',')
    pets += [[animal] for animal in animals]
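If BeautifulSoup is not available, the same loop can be sketched with the standard library's html.parser module instead. This is a swapped-in technique, not what the answer above uses, and CellCollector is a hypothetical helper written for this example; it assumes simple, non-nested <td> cells like the ones in the question.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Hypothetical helper: collects the text inside every <td> element."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self._in_td = True
            self.cells.append('')  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_td = False

    def handle_data(self, data):
        # Text may arrive in chunks, so append rather than overwrite
        if self._in_td:
            self.cells[-1] += data


html = """<table><tr>
<td>Dog,Cat,Mouse</td>
<td>Snake,Dragon,Dinosaur,Lizard</td>
<td>Owl,Falcon,Phoenix</td>
</tr></table>"""

parser = CellCollector()
parser.feed(html)
parser.close()

pets = []
for cell in parser.cells:
    pets += [[animal] for animal in cell.split(',')]
print(pets)
```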

Try this:

>>> my_list = ['Dog','Cat','Mouse']
>>> list(map(lambda x: [x], my_list))
[['Dog'], ['Cat'], ['Mouse']]
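The map approach extends to the whole table if you first flatten the per-cell lists that the question's loop currently builds. A sketch, assuming rows holds that intermediate result and using itertools.chain.from_iterable to do the flattening (in Python 3, map returns a lazy iterator, hence the list() call):

```python
from itertools import chain

# What the question's loop currently produces: one list per <td>
rows = [['Dog', 'Cat', 'Mouse'],
        ['Snake', 'Dragon', 'Dinosaur', 'Lizard'],
        ['Owl', 'Falcon', 'Phoenix']]

# chain.from_iterable yields the names one by one; map wraps each in a list
pets = list(map(lambda x: [x], chain.from_iterable(rows)))
print(pets)
```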

Change this:

animal = table.find_all('td')
pets = []
for i in animal:
    a = re.findall('[A-Z][a-z]*', str(i))
    pets.append(a)

To this:

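animal = table.find_all('td')
pets = []
for i in animal:
    a = re.findall('[A-Z][a-z]*', str(i))
    # wrap each name in its own list, then add them one at a time
    pets.extend([x] for x in a)

You were just missing one step: re.findall returns a whole list such as ['Dog', 'Cat', 'Mouse'], so during each loop iteration every item has to be wrapped in its own list ([x]) and added with extend, rather than appending the list itself.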
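To check the loop without a live page, here is a sketch in which plain strings stand in for str(i) of each cell (an assumption; in the real code they come from table.find_all('td')). It also shows what appending [a] would produce instead:

```python
import re

# Stand-ins for str(i) of each <td> Tag (assumed input, not fetched from a page)
cells = ['<td>Dog,Cat,Mouse</td>',
         '<td>Snake,Dragon,Dinosaur,Lizard</td>',
         '<td>Owl,Falcon,Phoenix</td>']

pets = []
for cell in cells:
    a = re.findall('[A-Z][a-z]*', cell)
    # one single-element list per name
    pets.extend([x] for x in a)
print(pets)

# For contrast: appending [a] nests the whole list one level deeper
appended = []
appended.append([re.findall('[A-Z][a-z]*', cells[0])])
print(appended)  # [[['Dog', 'Cat', 'Mouse']]]
```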
