How do I split this in Python?
Let's say there is a table with the following contents:
<td>Dog,Cat,Mouse</td>
<td>Snake,Dragon,Dinosaur,Lizard</td>
<td>Owl,Falcon,Phoenix</td>
and I want to turn it into this in Python:
>>>pets
[['Dog'],['Cat'],['Mouse'],['Snake'],['Dragon'],['Dinosaur'],['Lizard'],['Owl'],['Falcon'],['Phoenix']]
This is what I have managed so far:
animal = table.find_all('td')
pets = []
for i in animal:
    a = re.findall('[A-Z][a-z]*',str(i))
    pets.append(a)
However, I can't figure out a way to turn
['Dog','Cat','Mouse']
into
['Dog'],['Cat'],['Mouse'],
and so on. Please help. These are my first few days of programming and I'm already stuck. Thanks in advance.
import re
strs = """<td>Dog,Cat,Mouse</td>
<td>Snake,Dragon,Dinosaur,Lizard</td>
<td>Owl,Falcon,Phoenix</td>"""
r = re.compile(r'<td>(.*?)</td>')
print([[x] for m in r.finditer(strs) for x in m.group(1).split(',')])
This prints:
[['Dog'], ['Cat'], ['Mouse'], ['Snake'], ['Dragon'], ['Dinosaur'], ['Lizard'], ['Owl'], ['Falcon'], ['Phoenix']]
It also supports multiple <td>..</td> elements on the same line.
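As a quick check of the same-line case, here is a minimal sketch. The sample string `row` and the name `cell_pattern` are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical sample: two cells on a single line.
row = "<td>Dog,Cat</td><td>Owl,Falcon</td>"

# The non-greedy (.*?) stops at the first </td>, so each cell matches separately.
cell_pattern = re.compile(r'<td>(.*?)</td>')

pets = [[name] for m in cell_pattern.finditer(row) for name in m.group(1).split(',')]
print(pets)  # [['Dog'], ['Cat'], ['Owl'], ['Falcon']]
```

With a greedy `(.*)` instead, the two cells would be captured as one match, which is why the `?` matters here.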
First, you should know that regex (regular expressions) is not always the best tool for parsing data. Here, for instance, all your elements are separated by a comma (,), so the split method is the way to go.
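For illustration, a minimal sketch of `str.split` on one cell's text, assuming the surrounding tags have already been removed (the variable name `cell_text` is hypothetical):

```python
# Hypothetical cell contents with the <td> tags already stripped.
cell_text = "Snake,Dragon,Dinosaur,Lizard"

# split(',') cuts the string at every comma and returns a list of the pieces.
animals = cell_text.split(',')
print(animals)  # ['Snake', 'Dragon', 'Dinosaur', 'Lizard']
```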
As for wrapping each element in its own single-element list, a list comprehension is the easiest way to do it. Again: make sure you really want/need to do this. It doesn't make much sense to have a list of lists that each contain a single element.
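To make the trade-off concrete, the sketch below builds both shapes from the same data. The list `cells` is sample input standing in for the text of each table cell, not something from the original post.

```python
# Hypothetical cell texts, one string per <td>.
cells = ["Dog,Cat,Mouse", "Owl,Falcon,Phoenix"]

# Nested form the question asks for: one single-element list per animal.
nested = [[animal] for cell in cells for animal in cell.split(',')]

# Flat alternative: a plain list of names.
flat = [animal for cell in cells for animal in cell.split(',')]

print(nested)  # [['Dog'], ['Cat'], ['Mouse'], ['Owl'], ['Falcon'], ['Phoenix']]
print(flat)    # ['Dog', 'Cat', 'Mouse', 'Owl', 'Falcon', 'Phoenix']
```

The flat list is usually easier to search and iterate over; the nested form is mainly useful if some later step expects one row per animal.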
Here's a suggested implementation:
elements = table.find_all('td')
pets = []
for e in elements:
    # The following line is only needed if 'find_all' keeps the <td> and </td>.
    # '<td>' is 4 characters long and '</td>' is 5, so slice those off the string form.
    e_tagless = str(e)[4:-5]
    animals = e_tagless.split(',')
    pets += [[animal] for animal in animals]
Try this:
>>> my_list = ['Dog','Cat','Mouse']
>>> list(map(lambda x: [x], my_list))
[['Dog'], ['Cat'], ['Mouse']]
Change this:
animal = table.find_all('td')
pets = []
for i in animal:
    a = re.findall('[A-Z][a-z]*',str(i))
    pets.append(a)
To this:
animal = table.find_all('td')
pets = []
for i in animal:
    a = re.findall('[A-Z][a-z]*',str(i))
    pets.extend([x] for x in a)
You need to wrap each match, not the whole result, in its own list. pets.append(a) adds the full list of matches as a single item, and pets.append([a]) only nests that list one level deeper. Using extend with [x] puts every item into its own list during the loop iteration.
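To see how `append` and `extend` differ here, a small sketch with sample data standing in for the `re.findall` result (the names `appended` and `extended` are only for illustration):

```python
# Sample data: what re.findall would return for the first cell.
a = ['Dog', 'Cat', 'Mouse']

appended = []
appended.append([a])                   # wraps the whole list once more
print(appended)  # [[['Dog', 'Cat', 'Mouse']]]

extended = []
extended.extend([name] for name in a)  # wraps each name individually
print(extended)  # [['Dog'], ['Cat'], ['Mouse']]
```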