
Python: Want to remove the lines that contain a specific word

from lxml import html
import requests
import csv
page = requests.get('http://www.google.com/finance?q=[%28exchange+%3D%3D+%22ABC%22%29]&restype=company&noIL=1&start=0&num=1500')
tree = html.fromstring(page.content)

#Scrape stocks companies and symbols

stocks = tree.xpath('//a [not(@class)][@id][@href]/text()')
#This will create a list of stock names
stocks.remove('IRM Group Berhad')
stocks.remove('A & M Realty Berhad')
stocks.remove('BERJAYA FOOD BERHAD- A SHARES')


print 'Stocks= ', stocks

# open a file for writing.
csv_out = open('KLSE.csv', 'wb')

mywriter = csv.writer(csv_out)

rows = zip(stocks)
mywriter.writerows(rows)

csv_out.close()

I would like to remove every entry that contains the word 'Berhad', without having to call remove() on each one individually. Any clue how to do it?

You can do it like this:

stocks = [s for s in stocks if 'berhad' not in s.lower()]

Assuming that stocks is just an ordinary list, you could try something like:

trimmed_stocks = [x for x in stocks if 'Berhad' not in x]

It's not clear from your post whether variants such as BERHAD or bErHaD should be excluded as well, but they can be handled the same way, for example by lowercasing each string before the comparison.
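To make the difference between the two answers concrete, here is a minimal, self-contained sketch that needs no network access. The helper name drop_matching and the 'Example Corp' entry are made up for illustration; the other three names are the ones from the question. It is written for Python 3, although the list comprehensions themselves also work in Python 2.

```python
def drop_matching(names, word, ignore_case=True):
    """Return a new list without the entries that contain `word`.

    Hypothetical helper, not part of the original post.
    """
    if ignore_case:
        needle = word.lower()
        # Compare lowercased copies so 'BERHAD' and 'Berhad' both match.
        return [n for n in names if needle not in n.lower()]
    # Exact-case substring test.
    return [n for n in names if word not in n]


# Sample data: three names from the question plus one invented entry.
stocks = [
    'IRM Group Berhad',
    'A & M Realty Berhad',
    'BERJAYA FOOD BERHAD- A SHARES',
    'Example Corp',  # assumption: placeholder for a name you want to keep
]

case_sensitive = drop_matching(stocks, 'Berhad', ignore_case=False)
case_insensitive = drop_matching(stocks, 'Berhad')

print(case_sensitive)    # still contains the upper-case 'BERHAD' entry
print(case_insensitive)  # only 'Example Corp' is left
```

Both variants build a new list and leave the original stocks list untouched, so you can assign the result back to stocks before writing the CSV.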
