
Remove a specific character from each value of a list in Python


I have this list of movies, and I want to remove the dot "." from each title.

I can't just remove the first character of each value, because not all values start with a dot ".".

['Sueños de fuga(1994)',
 'El padrino(1972)',
 'Citizen Kane(1941)',
 '12 hombres en pugna(1957)',
 'La lista de Schindler(1993)',
 'Lo bueno, lo malo y lo feo(1966)',
 'El imperio contraataca(1980)',
 'El señor de los anillos: El retorno del rey(2003)',
 'Batman - El caballero de la noche(2008)',
 '.El padrino II(1974)',
 '.Tiempos violentos(1994)',
 '.El club de la pelea(1999)',
 '.Psicosis(1960)',
 '.2001: Odisea del espacio(1968)',
 '.Metropolis(1927)',
 '.La guerra de las galaxias(1977)',
]

Also, the list is being scraped, so removing the dots by hand won't work.

Here is my code so far:

from bs4 import BeautifulSoup
import requests
import pandas as pd
url = "https://www.imdb.com/list/ls024149810/"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
# scrape movie names
scraped_movies = soup.find_all('h3', class_='lister-item-header')

# parse movie names
movies = []
for movie in scraped_movies:
    movie = movie.get_text().replace('\n', "")
    movie = movie.strip(" ")
    movies.append(movie)

# remove the first two characters of each value on the list
movies = [e[2:] for e in movies]  

# remove the remaining dots "."
while movies.count("."):
    movies.remove(".")

# print list
print(movies)
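Judging by the first printed list in the output further down, each header's text after removing newlines appears to have the shape "1.The Shawshank Redemption(1994)" (an assumption about IMDb's markup at the time). Slicing e[2:] removes "1." from single-digit ranks but only "10" from two-digit ranks, which is where the leading dot comes from. Also, movies.count(".") counts list elements that are exactly ".", so the while loop above never removes anything. A minimal sketch that strips the whole rank prefix with a regular expression instead, using illustrative strings in the assumed shape:

```python
import re

# Illustrative header strings in the assumed "<rank>.<title>(<year>)" shape
raw = [
    "1.The Shawshank Redemption(1994)",
    "9.The Dark Knight(2008)",
    "10.The Godfather Part II(1974)",
    "14.2001: A Space Odyssey(1968)",
]

# Fixed-width slicing leaves a dot behind once the rank has two digits
sliced = [e[2:] for e in raw]
print(sliced)
# ['The Shawshank Redemption(1994)', 'The Dark Knight(2008)', '.The Godfather Part II(1974)', '.2001: A Space Odyssey(1968)']

# list.count(".") matches whole elements equal to ".", not characters inside strings
print(sliced.count("."))  # 0

# Remove "<digits>." from the start of each string, whatever the rank's width
RANK_PREFIX = re.compile(r"^\d+\.")
movies = [RANK_PREFIX.sub("", e) for e in raw]
print(movies)
# ['The Shawshank Redemption(1994)', 'The Dark Knight(2008)', 'The Godfather Part II(1974)', '2001: A Space Odyssey(1968)']
```

The ^ anchor keeps the pattern from touching digits or dots later in the string, such as the 2001 in "2001: A Space Odyssey".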

Try removing the dot with the replace method:

movie = movie.get_text().replace('\n', "").replace('.', "")
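One caveat, sketched below under the assumption that the header text looks like "1.The Shawshank Redemption(1994)" once newlines are removed: if this chained replace runs in the parsing loop while the e[2:] slice from the question's code is still in place, it also deletes the dot after single-digit ranks, so the slice then cuts into the title itself.

```python
# Assumed raw header text after removing newlines (illustrative values)
raw = ["1.The Shawshank Redemption(1994)", "10.The Godfather Part II(1974)"]

# Chained .replace('.', "") applied during parsing, as suggested above
no_dots = [e.replace(".", "") for e in raw]
print(no_dots)  # ['1The Shawshank Redemption(1994)', '10The Godfather Part II(1974)']

# The question's e[2:] slice applied afterwards
sliced = [e[2:] for e in no_dots]
print(sliced)   # ['he Shawshank Redemption(1994)', 'The Godfather Part II(1974)']
```

So a global replace only works if the rank prefix is removed some other way, and it also deletes any dot that genuinely belongs to a title.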

You can try this:

# remove the remaining leading dots "."
for i, word in enumerate(movies):
    if word.startswith("."):
        # drop only the leading dot; dots inside the title are kept
        movies[i] = word[1:]

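Or use this one-liner: it strips the dot from any element that starts with one, leaves the other elements unchanged, and also works when no element in the list starts with a dot.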

# remove the remaining dots "."
movies = [word[1:] if word.startswith(".") else word for word in movies]
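On Python 3.9 and later, str.removeprefix removes a leading dot only from the elements that have one, without an explicit startswith check. A small sketch on a few titles taken from the question's list:

```python
# Requires Python 3.9+ for str.removeprefix
movies = [
    "Citizen Kane(1941)",
    ".El padrino II(1974)",
    ".2001: Odisea del espacio(1968)",
]

# removeprefix drops the prefix once if present; other strings come back unchanged
movies = [word.removeprefix(".") for word in movies]
print(movies)
# ['Citizen Kane(1941)', 'El padrino II(1974)', '2001: Odisea del espacio(1968)']
```

Unlike lstrip("."), which removes every leading dot, removeprefix(".") removes at most one.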

Edited code:

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://www.imdb.com/list/ls024149810/"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")

# scrape movie names
scraped_movies = soup.find_all('h3', class_='lister-item-header')

# parse movie names
movies = []
for movie in scraped_movies:
    movie = movie.get_text().replace('\n', "")
    movie = movie.strip(" ")
    movies.append(movie)

# remove the first two characters of each value on the list
movies = [e[2:] for e in movies]
print(movies)

# remove the remaining dots "."
movies = [word[1:] if word.startswith(".") else word for word in movies]

# print list
print(movies)

Output:

['The Shawshank Redemption(1994)', 'The Godfather(1972)', 'Citizen Kane(1941)', '12 Angry Men(1957)', "Schindler's List(1993)", 'Il buono, il brutto, il cattivo(1966)', 'The Empire Strikes Back(1980)', 'The Lord of the Rings: The Return of the King(2003)', 'The Dark Knight(2008)', '.The Godfather Part II(1974)', '.Pulp Fiction(1994)', '.Fight Club(1999)', '.Psycho(1960)', '.2001: A Space Odyssey(1968)', '.Metropolis(1927)', '.Star Wars(1977)', '.The Lord of the Rings: The Fellowship of the Ring(2001)', '.Terminator 2: Judgment Day(1991)', '.The Matrix(1999)', '.Raiders of the Lost Ark(1981)', '.Casablanca(1942)', '.The Wizard of Oz(1939)', '.Shichinin no samurai(1954)', '.Forrest Gump(1994)', '.Inception(2010)']
['The Shawshank Redemption(1994)', 'The Godfather(1972)', 'Citizen Kane(1941)', '12 Angry Men(1957)', "Schindler's List(1993)", 'Il buono, il brutto, il cattivo(1966)', 'The Empire Strikes Back(1980)', 'The Lord of the Rings: The Return of the King(2003)', 'The Dark Knight(2008)', 'The Godfather Part II(1974)', 'Pulp Fiction(1994)', 'Fight Club(1999)', 'Psycho(1960)', '2001: A Space Odyssey(1968)', 'Metropolis(1927)', 'Star Wars(1977)', 'The Lord of the Rings: The Fellowship of the Ring(2001)', 'Terminator 2: Judgment Day(1991)', 'The Matrix(1999)', 'Raiders of the Lost Ark(1981)', 'Casablanca(1942)', 'The Wizard of Oz(1939)', 'Shichinin no samurai(1954)', 'Forrest Gump(1994)', 'Inception(2010)']

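This should be straightforward with a list comprehension. Once you have the list of movie titles, you can simply replace the dot with an empty string. This code strips the dots and builds the new movie list in a single step.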

movies = [x.replace('.', '') for x in movies]

Output:

['Sueños de fuga(1994)', 'El padrino(1972)', 'Citizen Kane(1941)', '12 hombres en pugna(1957)', 'La lista de Schindler(1993)', 'Lo bueno, lo malo y lo feo(1966)', 'El imperio contraataca(1980)', 'El señor de los anillos: El retorno del rey(2003)', 'Batman - El caballero de la noche(2008)', 'El padrino II(1974)', 'Tiempos violentos(1994)', 'El club de la pelea(1999)', 'Psicosis(1960)', '2001: Odisea del espacio(1968)', 'Metropolis(1927)', 'La guerra de las galaxias(1977)']

If you are worried that a dot might appear somewhere in a title other than at the start, you can add an if condition on string.startswith('.') to match more precisely.
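To illustrate that point, the sketch below compares a global replace with a startswith check. The title ".Mr. Smith Goes to Washington(1939)" is a hypothetical entry chosen only because it contains a dot of its own; it is not in the scraped list.

```python
titles = [".Psicosis(1960)", ".Mr. Smith Goes to Washington(1939)", "Citizen Kane(1941)"]

# Global replace: also deletes the dot that belongs to the title
replaced = [t.replace(".", "") for t in titles]
print(replaced)
# ['Psicosis(1960)', 'Mr Smith Goes to Washington(1939)', 'Citizen Kane(1941)']

# startswith check: only a leading dot is removed
checked = [t[1:] if t.startswith(".") else t for t in titles]
print(checked)
# ['Psicosis(1960)', 'Mr. Smith Goes to Washington(1939)', 'Citizen Kane(1941)']
```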


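Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.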

 
Guangdong ICP registration 18138465  © 2020-2024 STACKOOM.COM