Remove a specific character from each value of a list (Python)
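I have this list of movies, and I want to remove the dot "." from each title.
I can't just remove the first character of every value, because not all values start with a dot ".".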
['Sueños de fuga(1994)',
'El padrino(1972)',
'Citizen Kane(1941)',
'12 hombres en pugna(1957)',
'La lista de Schindler(1993)',
'Lo bueno, lo malo y lo feo(1966)',
'El imperio contraataca(1980)',
'El señor de los anillos: El retorno del rey(2003)',
'Batman - El caballero de la noche(2008)',
'.El padrino II(1974)',
'.Tiempos violentos(1994)',
'.El club de la pelea(1999)',
'.Psicosis(1960)',
'.2001: Odisea del espacio(1968)',
'.Metropolis(1927)',
'.La guerra de las galaxias(1977)',
]
Also, the list is being scraped, so removing the dots by hand won't work.
This is my code so far:
from bs4 import BeautifulSoup
import requests
import pandas as pd
url = "https://www.imdb.com/list/ls024149810/"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
# scrape movie names
scraped_movies = soup.find_all('h3', class_='lister-item-header')
# parse movie names
movies = []
for movie in scraped_movies:
    movie = movie.get_text().replace('\n', "")
    movie = movie.strip(" ")
    movies.append(movie)
# remove the first two characters of each value on the list
movies = [e[2:] for e in movies]
# remove the remaining dots "."
# (this only removes list items that are exactly "."; dots inside the titles are left untouched)
while movies.count("."):
    movies.remove(".")
# print list
print(movies)
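Judging from the printed output further down this page, the leftover dots start at the tenth title, which suggests each raw heading looks like "<rank>.<title>(<year>)". If that assumption holds, e[2:] only removes the whole prefix for ranks 1 to 9; for "10." and above it leaves the dot behind. A minimal sketch that strips the entire rank prefix with a regular expression instead (strip_rank and RANK_PREFIX are hypothetical names):

```python
import re

# Assumed raw shape: "<rank>.<title>(<year>)", e.g. "10.The Godfather Part II(1974)".
# Matches optional leading spaces, one or more digits, a dot, then optional spaces.
RANK_PREFIX = re.compile(r"^\s*\d+\.\s*")


def strip_rank(text: str) -> str:
    """Remove a leading rank such as '1.' or '10.' and leave the rest of the title intact."""
    return RANK_PREFIX.sub("", text, count=1)


raw = ["1.The Shawshank Redemption(1994)", "10.The Godfather Part II(1974)", "14.2001: A Space Odyssey(1968)"]
print([strip_rank(s) for s in raw])
# ['The Shawshank Redemption(1994)', 'The Godfather Part II(1974)', '2001: A Space Odyssey(1968)']
```

Because the pattern requires a dot directly after the digits, titles that merely begin with a number (such as "2001: A Space Odyssey" or "12 Angry Men") keep their digits once the rank is gone.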
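Try removing the dots with the replace method:
movie = movie.get_text().replace('\n', "").replace('.', "")
Note that this removes every dot, including the one right after the rank number. If the raw heading text looks like 2.The Godfather(1972) and you keep the e[2:] slice afterwards, titles ranked 1 to 9 would lose their first letter (2The Godfather(1972) becomes he Godfather(1972)).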
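Another side effect of replace('.', "") is that it also deletes dots that belong inside a title. A small sketch comparing it with str.lstrip, which only touches leading characters; the second title is a hypothetical entry chosen just to contain an internal dot:

```python
# ".Mr. Smith Goes to Washington(1939)" is a hypothetical list entry used only for illustration.
titles = [".Pulp Fiction(1994)", ".Mr. Smith Goes to Washington(1939)"]

# str.replace removes every occurrence of "." anywhere in the string.
every_dot_removed = [t.replace(".", "") for t in titles]

# str.lstrip removes dots only from the start of the string.
leading_dot_removed = [t.lstrip(".") for t in titles]

print(every_dot_removed)
# ['Pulp Fiction(1994)', 'Mr Smith Goes to Washington(1939)']
print(leading_dot_removed)
# ['Pulp Fiction(1994)', 'Mr. Smith Goes to Washington(1939)']
```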
You can try this:
# remove the remaining dots "."
for i, word in enumerate(movies):
    if word.startswith("."):
        movies[i] = word.replace(".", "")
Or use the following instead. It strips the dot from any element that starts with one, leaves the other elements untouched, and also works when no element in the list starts with a dot:
# remove the remaining dots "."
movies = [word.lstrip(".") for word in movies]
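str.lstrip(".") removes every leading dot, however many there are; if only a single leading dot should go, str.removeprefix (Python 3.9 or later) is the narrower tool. A sketch comparing the two; the function names and the doubled-dot sample entry are hypothetical:

```python
def strip_leading_dots(titles: list[str]) -> list[str]:
    """Remove any run of leading dots; titles without one come back unchanged."""
    return [t.lstrip(".") for t in titles]


def strip_one_leading_dot(titles: list[str]) -> list[str]:
    """Remove at most one leading dot (str.removeprefix requires Python 3.9+)."""
    return [t.removeprefix(".") for t in titles]


sample = [".Psycho(1960)", "Citizen Kane(1941)", "..Metropolis(1927)"]
print(strip_leading_dots(sample))     # ['Psycho(1960)', 'Citizen Kane(1941)', 'Metropolis(1927)']
print(strip_one_leading_dot(sample))  # ['Psycho(1960)', 'Citizen Kane(1941)', '.Metropolis(1927)']
```

Both return an empty list for an empty input, and neither changes titles that don't start with a dot.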
Edited code:
from bs4 import BeautifulSoup
import requests
import pandas as pd
url = "https://www.imdb.com/list/ls024149810/"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
# scrape movie names
scraped_movies = soup.find_all('h3', class_='lister-item-header')
# parse movie names
movies = []
for movie in scraped_movies:
    movie = movie.get_text().replace('\n', "")
    movie = movie.strip(" ")
    movies.append(movie)
# remove the first two characters of each value on the list
movies = [e[2:] for e in movies]
print(movies)
# remove the remaining leading dots "."
movies = [word.lstrip(".") for word in movies]
# print list
print(movies)
Output:
['The Shawshank Redemption(1994)', 'The Godfather(1972)', 'Citizen Kane(1941)', '12 Angry Men(1957)', "Schindler's List(1993)", 'Il buono, il brutto, il cattivo(1966)', 'The Empire Strikes Back(1980)', 'The Lord of the Rings: The Return of the King(2003)', 'The Dark Knight(2008)', '.The Godfather Part II(1974)', '.Pulp Fiction(1994)', '.Fight Club(1999)', '.Psycho(1960)', '.2001: A Space Odyssey(1968)', '.Metropolis(1927)', '.Star Wars(1977)', '.The Lord of the Rings: The Fellowship of the Ring(2001)', '.Terminator 2: Judgment Day(1991)', '.The Matrix(1999)', '.Raiders of the Lost Ark(1981)', '.Casablanca(1942)', '.The Wizard of Oz(1939)', '.Shichinin no samurai(1954)', '.Forrest Gump(1994)', '.Inception(2010)']
['The Shawshank Redemption(1994)', 'The Godfather(1972)', 'Citizen Kane(1941)', '12 Angry Men(1957)', "Schindler's List(1993)", 'Il buono, il brutto, il cattivo(1966)', 'The Empire Strikes Back(1980)', 'The Lord of the Rings: The Return of the King(2003)', 'The Dark Knight(2008)', 'The Godfather Part II(1974)', 'Pulp Fiction(1994)', 'Fight Club(1999)', 'Psycho(1960)', '2001: A Space Odyssey(1968)', 'Metropolis(1927)', 'Star Wars(1977)', 'The Lord of the Rings: The Fellowship of the Ring(2001)', 'Terminator 2: Judgment Day(1991)', 'The Matrix(1999)', 'Raiders of the Lost Ark(1981)', 'Casablanca(1942)', 'The Wizard of Oz(1939)', 'Shichinin no samurai(1954)', 'Forrest Gump(1994)', 'Inception(2010)']
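With a list comprehension this should be very simple. Once you have your list of movie titles as strings, you can replace the dot with an empty string. This code removes the dotted beginnings of your titles and builds the new movies list in a single step:
movies = [x.replace('.', '') for x in movies]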
Output:
['Sueños de fuga(1994)', 'El padrino(1972)', 'Citizen Kane(1941)', '12 hombres en pugna(1957)', 'La lista de Schindler(1993)', 'Lo bueno, lo malo y lo feo(1966)', 'El imperio contraataca(1980)', 'El señor de los anillos: El retorno del rey(2003)', 'Batman - El caballero de la noche(2008)', 'El padrino II(1974)', 'Tiempos violentos(1994)', 'El club de la pelea(1999)', 'Psicosis(1960)', '2001: Odisea del espacio(1968)', 'Metropolis(1927)', 'La guerra de las galaxias(1977)']
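If you are worried that in some cases a dot appears elsewhere in a title rather than at the start, you can add an if condition using string.startswith('.') so that only the leading dot is matched.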
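The startswith suggestion above can be written as a conditional expression inside the comprehension. A minimal sketch (clean_titles is a hypothetical helper name) that drops one leading dot only from titles that start with it and copies every other title unchanged:

```python
def clean_titles(titles: list[str]) -> list[str]:
    """Drop the first character only when it is '.', so dots elsewhere in a title survive."""
    return [t[1:] if t.startswith(".") else t for t in titles]


movies = ["Citizen Kane(1941)", ".Psicosis(1960)", ".2001: Odisea del espacio(1968)"]
print(clean_titles(movies))
# ['Citizen Kane(1941)', 'Psicosis(1960)', '2001: Odisea del espacio(1968)']
```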