
Extracting text between anchor tags in BeautifulSoup (Python)?

I am trying to extract the names of the movies listed on this Fandango page.

names_tag = soup.findAll('a', {'class': 'dark showtimes-movie-title'})

This is the anchor class that holds the names. The problem is that when I run the code, the output is the whole tag:

<a class="dark showtimes-movie-title" href="http://www.fandango.com/godzilla3d_170083/movieoverview">Godzilla 3D</a>

All I want is the text, Godzilla 3D. How can I parse this data successfully?

#anchor element containing the names of each movie
names_tag = soup.findAll('a', {'class': 'dark showtimes-movie-title'})
names_tag = str(names_tag)

movie_name = names_tag.split(',')

for each_line in movie_name:
    movie_names.append(each_line)

i = 0
while (i < len(movie_names)):

    print 'The length of %s is %s' %(movie_names[i], movie_times[i])

    i+=1

Use the text property of each tag, which returns only the text inside the element:

names_tag = soup.findAll('a', {'class': 'dark showtimes-movie-title'})
names = [name_tag.text for name_tag in names_tag]
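
For anyone who wants to try this without fetching the live page, here is a minimal, self-contained sketch. It assumes the bs4 package is installed and uses a small made-up HTML snippet in place of the real Fandango markup: only the first anchor is copied from the question, and the other two anchors and the variable names are hypothetical. It uses find_all (the newer spelling of findAll) and get_text(strip=True), which behaves like .text but also trims surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page source; only the first anchor
# comes from the question, the other two are invented for illustration.
html = """
<div>
  <a class="dark showtimes-movie-title" href="http://www.fandango.com/godzilla3d_170083/movieoverview">Godzilla 3D</a>
  <a class="dark showtimes-movie-title" href="#"> Example Movie </a>
  <a class="other" href="#">Not a title</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match anchors whose class attribute is exactly this string.
title_tags = soup.find_all("a", {"class": "dark showtimes-movie-title"})

# get_text(strip=True) returns the tag's text with whitespace trimmed.
names = [tag.get_text(strip=True) for tag in title_tags]

print(names)  # ['Godzilla 3D', 'Example Movie']
```

Because names is already a list of plain strings, there is no need to convert the result set with str() and split it on commas as in the question.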
