
How do I split this in Python?

Let's say there is a table with the following contents:

<td>Dog,Cat,Mouse</td>
<td>Snake,Dragon,Dinosaur,Lizard</td>
<td>Owl,Falcon,Phoenix</td>

and I want to turn it into this in Python:

>>>pets
[['Dog'],['Cat'],['Mouse'],['Snake'],['Dragon'],['Dinosaur'],['Lizard'],['Owl'],['Falcon'],['Phoenix']]

This is what I have managed so far:

animal = table.find_all('td')
pets = []
for i in animal:
    a = re.findall('[A-Z][a-z]*',str(i))
    pets.append(a)

However, I can't figure out a way to turn

['Dog','Cat','Mouse']

into

['Dog'],['Cat'],['Mouse'],

and so on. Please help. These are my first few days of programming and I'm already stuck. Thanks in advance.

import re
strs = """<td>Dog,Cat,Mouse</td>
<td>Snake,Dragon,Dinosaur,Lizard</td>
<td>Owl,Falcon,Phoenix</td>"""

r = re.compile(r'<td>(.*?)</td>')
print([[x] for m in r.finditer(strs) for x in m.group(1).split(',')])

This prints:

[['Dog'], ['Cat'], ['Mouse'], ['Snake'], ['Dragon'], ['Dinosaur'], ['Lizard'], ['Owl'], ['Falcon'], ['Phoenix']]

It also supports multiple <td>..</td> elements on the same line.
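
To illustrate the same-line case, here is a minimal sketch that applies the same pattern to a one-line input. The `row` string and the variable names are made up for this example. The non-greedy `.*?` is what keeps each match inside its own cell instead of running to the last `</td>`.

```python
import re

# Hypothetical input: two cells on a single line
row = "<td>Dog,Cat</td><td>Owl,Falcon</td>"

cell_pattern = re.compile(r'<td>(.*?)</td>')

# One single-element list per animal, across every cell found
pets = [[name] for m in cell_pattern.finditer(row) for name in m.group(1).split(',')]
print(pets)  # [['Dog'], ['Cat'], ['Owl'], ['Falcon']]
```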

First, you should know that regular expressions (regex) are not always the best way to parse data. Here, for instance, all your elements are separated by a comma, so the split method is the way to go.

As for putting each element in its own single-element list, a list comprehension is the easiest way to do it. Again, make sure you really want or need this: a list of single-element lists rarely makes much sense.
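
As a rough sketch of both points, the snippet below splits one cell's text on commas and then wraps each name in its own list. The `cell_text` value is a made-up stand-in for the text of a single `<td>`.

```python
# Hypothetical text content of one table cell
cell_text = "Snake,Dragon,Dinosaur,Lizard"

# str.split breaks the string at every comma
animals = cell_text.split(',')
# animals is ['Snake', 'Dragon', 'Dinosaur', 'Lizard']

# The list comprehension wraps each name in a single-element list
nested = [[animal] for animal in animals]
# nested is [['Snake'], ['Dragon'], ['Dinosaur'], ['Lizard']]
```

If the flat `animals` list is enough for what comes next, the second step can simply be skipped.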

Here's a suggested implementation:

# Assumes 'table' is a BeautifulSoup element, so each e is a Tag
elements = table.find_all('td')
pets = []
for e in elements:
    # get_text() returns the cell's text without the <td> and </td> tags
    animals = e.get_text().split(',')
    # Wrap each animal in its own single-element list
    pets += [[animal] for animal in animals]

Try this:

>>> my_list = ['Dog','Cat','Mouse'] 
>>> list(map(lambda x: [x], my_list))
[['Dog'], ['Cat'], ['Mouse']]
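
Note that in Python 3 `map` returns a lazy iterator, so it has to be passed to `list()` to get an actual list. A list comprehension is an equivalent alternative. A minimal sketch comparing the two:

```python
my_list = ['Dog', 'Cat', 'Mouse']

# map is lazy in Python 3, so list() is needed to materialize the result
via_map = list(map(lambda x: [x], my_list))

# Equivalent list comprehension
via_comprehension = [[x] for x in my_list]

print(via_map == via_comprehension)  # True
```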

Change this:

animal = table.find_all('td')
pets = []
for i in animal:
    a = re.findall('[A-Z][a-z]*', str(i))
    pets.append(a)

To this:

animal = table.find_all('td')
pets = []
for i in animal:
    a = re.findall('[A-Z][a-z]*', str(i))
    # Wrap each match in its own list; pets.append([a]) would nest the whole row instead
    pets.extend([x] for x in a)

You only needed to wrap each item in its own list during the loop iteration. Appending the whole result a keeps each row's animals grouped together in one list.
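
For reference, the sketch below compares what `append([a])` and `extend(...)` produce for one row's matches. The value of `a` is hard-coded here as an assumption, standing in for one `re.findall` result.

```python
a = ['Dog', 'Cat', 'Mouse']  # stand-in for one re.findall result

appended = []
appended.append([a])
# appended is [[['Dog', 'Cat', 'Mouse']]]: the whole row, nested one level deeper

extended = []
extended.extend([x] for x in a)
# extended is [['Dog'], ['Cat'], ['Mouse']]: one single-element list per animal
```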

