简体   繁体   中英

How do I remove certain elements from a list in python?

I'm scraping some data from a website but some of it is coming out with "\" in front of it. I tried to use this string of code but an error message occurred.

print([s.strip('\') for s in feet])    *EOL while scanning string literal
print([s.replace('\', ') for s in feet])

The code after the '\' in the first line became italicized, I have no clue what to do about this.

from lxml import html
import requests

list1 = []
height = []

user_website = "https://www.disabled-world.com/calculators-charts/height-weight.php"

page = requests.get(user_website)
tree = html.fromstring(page.content)
list2 = tree.xpath('//td/text()')

for x in list2:
    list_holder = x.split(" ")
    for i in list_holder:
        list1.append(i.lower())

subs = "'"
feet = [i for i in list2 if subs in i]

subs2 = '"'
inches = [i for i in list2 if subs in i]

print([s.strip('\') for s in feet])
print([s.replace('\', ') for s in feet])

y = 0

for x in feet:
    height.append(feet[y])
    height.append(inches[y])
    y+=1

print(height)

So I tried to extract your code, and this:

from lxml import html 
import requests 

list1 = [] 
height = [] 
user_website = "https://disabled-world.com/calculators-charts/height-weight.php" 
page = requests.get(user_website) 
tree = html.fromstring(page.content) 
list2 = tree.xpath('//td/text()')

for x in list2: 
    list_holder = x.split(" ") 
    for i in list_holder: 
        list1.append(i.lower()) 
subs = "'" 
feet = [i for i in list2 if subs in i] 
subs2 = '"' 
inches = [i for i in list2 if subs in i] 
print([s.strip('"') for s in feet]) 
#print([s.replace('\', ') for s in feet]) 
y = 0 

#for x in feet: 
#    height.append(feet[y]) 
#    height.append(inches[y]) 
#    y+=1 
#    print(height)

gives me the following output:

["4' 6", "4' 7", "4' 8", "4' 9", "4' 10", "4' 11", "5' 0", "5' 1", "5' 2", "5' 3", "5' 4", "5' 5", "5' 6", "5' 7", "5' 8", "5' 9", "5' 10", "5' 11", "6' 0", "6' 1", "6' 2", "6' 3", "6' 4", "6' 5", "6' 6", "6' 7", "6' 8", "6' 9", "6' 10", "6' 11", "7' 0"]

From your question, I assume this is what you want?

Anyway, the problem (as far as I could see) was simply wrong usage of the strip() function, which expects a string (containing the part of the source string you want to strip), not just a single character.

Until you post your code, it's hard to figure out the bug as to why your array is improperly formatted. However, you can use the following to fix this array:

#copy paste the data into a variable as a string using triple quotes (or convert this variable to a string)
a='''['4' 6"', '4' 6"', '4' 7"', '4' 7"', '4' 8"', '4' 8"', '4' 9"', '4' 9"', '4' 10"', '4' 10"', '4' 11"', '4' 11"', '5' 0"', '5' 0"', '5' 1"', '5' 1"', '5' 2"', '5' 2"', '5' 3"', '5' 3"', '5' 4"', '5' 4"', '5' 5"', '5' 5"', '5' 6"', '5' 6"', '5' 7"', '5' 7"', '5' 8"', '5' 8"', '5' 9"', '5' 9"', '5' 10"', '5' 10"', '5' 11"', '5' 11"', '6' 0"', '6' 0"', '6' 1"', '6' 1"', '6' 2"', '6' 2"', '6' 3"', '6' 3"', '6' 4"', '6' 4"', '6' 5"', '6' 5"', '6' 6"', '6' 6"', '6' 7"', '6' 7"', '6' 8"', '6' 8"', '6' 9"', '6' 9"', '6' 10"', '6' 10"', '6' 11"', '6' 11"', '7' 0"', '7' 0"']'''

m=a.strip('[').strip(']')  #remove braces
x=[]                       
n=m.split(',')             #creaet list of elements
for i in n:
    x.append(i.strip(" ").strip("'"))  #remove the excessive quotes and spaces
print(x)

This above code gives me x as:

['4\' 6"', '4\' 6"', '4\' 7"', '4\' 7"', '4\' 8"', '4\' 8"', '4\' 9"', '4\' 9"', '4\' 10"', '4\' 10"', '4\' 11"', '4\' 11"', '5\' 0"', '5\' 0"', '5\' 1"', '5\' 1"', '5\' 2"', '5\' 2"', '5\' 3"', '5\' 3"', '5\' 4"', '5\' 4"', '5\' 5"', '5\' 5"', '5\' 6"', '5\' 6"', '5\' 7"', '5\' 7"', '5\' 8"', '5\' 8"', '5\' 9"', '5\' 9"', '5\' 10"', '5\' 10"', '5\' 11"', '5\' 11"', '6\' 0"', '6\' 0"', '6\' 1"', '6\' 1"', '6\' 2"', '6\' 2"', '6\' 3"', '6\' 3"', '6\' 4"', '6\' 4"', '6\' 5"', '6\' 5"', '6\' 6"', '6\' 6"', '6\' 7"', '6\' 7"', '6\' 8"', '6\' 8"', '6\' 9"', '6\' 9"', '6\' 10"', '6\' 10"', '6\' 11"', '6\' 11"', '7\' 0"', '7\' 0"']

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM