How to delete a row with a specific word in a loop with Python?
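I want to delete every row that contains "Scheduled" and export the result to a new CSV file. What is wrong with my code? I get no error message and it runs without problems, but nothing happens.
Here is my code: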
def scrape(urls):
    browser = webdriver.Firefox()
    for url in urls:
        browser.get(url)
        html = browser.page_source
        soup=BeautifulSoup(html,"html.parser")
        table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })
        soup2=BeautifulSoup(html,"html.parser")
        name = soup2.h2.string
        datatable=[]
        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
            temp_data = []
            temp_data.append(name)
            for data in record.find_all("td"):
                temp_data.append(data.text.encode('latin-1'))
            newlist = filter(None, temp_data)
            datatable.append(newlist)
        print name
        output.writerows(datatable)
    def filter_unwanted_words():
        unwanted_words = {'Scheduled'}
        with open('output.csv', 'r') as f:
            for line in f:
                if set(line.split()).isdisjoint(unwanted_words):
                    yield line
    def write_output():
        with open('output2.csv', 'w') as f:
            f.writelines((line for line in filter_unwanted_words()))
    if __name__ == '__main__':
        write_output()
    resultcsv.close()
    time.sleep(10)
    browser.close()
I tried the filter_unwanted_words function above, but it doesn't work.
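Even when write_output() does run, the isdisjoint check above would likely keep every line, because str.split() with no argument splits on whitespace, not on commas. The sketch below (Python 3) illustrates this on a made-up sample line, assuming the scraped cells are written as plain comma-separated fields with no spaces; parsing the line with the stdlib csv module yields the individual fields instead.

```python
import csv

line = "123,1,Scheduled\n"  # made-up sample line, shaped like a row of output.csv

# str.split() with no argument splits on whitespace, so a comma-separated
# line without spaces comes back as one single token.
tokens = line.split()
print(tokens)                                 # ['123,1,Scheduled']
print(set(tokens).isdisjoint({"Scheduled"}))  # True -> the line is kept

# Parsing the same line as CSV gives the separate fields.
fields = next(csv.reader([line]))
print(fields)                                 # ['123', '1', 'Scheduled']
print(set(fields).isdisjoint({"Scheduled"}))  # False -> the line would be dropped
```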
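Alternative solution: consider loading the data into a pandas DataFrame and filtering it there.
import pandas as pd
data = [[123,1,"Scheduled"],[345,2,"-"]]
df = pd.DataFrame(data)
# keep only rows whose column 2 is not "Scheduled"; the result must be assigned,
# otherwise the filtered frame is discarded and the unfiltered df gets written
df2 = df[df[2] != "Scheduled"]
df2.to_csv("output.csv", header=False) # no headers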
The DataFrame looks like this:
0 1 2
0 123 1 Scheduled
1 345 2 -
With "Scheduled" filtered out, the data looks like this:
0 1 2
1 345 2 -
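If pandas is not available, the same single-column filter can be sketched with the stdlib csv module. This is a Python 3 sketch, not part of the original answer: the names rows_without_value and copy_without_value are hypothetical, and it assumes output.csv is a plain CSV whose status value sits in a fixed zero-based column (2 here, like df[2] above).

```python
import csv

def rows_without_value(rows, column, value="Scheduled"):
    """Yield rows whose field at `column` is not equal to `value`.

    `column` is a zero-based index, like the 2 in df[df[2] != "Scheduled"].
    Rows too short to have that column are kept unchanged.
    """
    for row in rows:
        if len(row) > column and row[column] == value:
            continue  # matching row: leave it out
        yield row

def copy_without_value(src_path, dst_path, column, value="Scheduled"):
    """Stream src_path into dst_path, dropping the matching rows."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(rows_without_value(csv.reader(src), column, value))
```

Usage would be something like copy_without_value("output.csv", "output2.csv", column=2). Because the rows are streamed, the whole file never has to fit in memory.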
A more general solution, which filters out "Scheduled" no matter which column it appears in:
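import numpy as np
import pandas as pd
data = [[123,1,"Scheduled"],[345,2,"-"]]
df = pd.DataFrame(data)
# one boolean column per DataFrame column: True where the cell text contains "Scheduled"
mask = np.column_stack([df[col].astype(str).str.contains(r"Scheduled", na=False) for col in df])
df2 = df.loc[~mask.any(axis=1)]  # keep only rows where no cell matched
df2.to_csv("output.csv", header=False) # no headers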
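Note that str.contains matches substrings (as a regex by default, which makes no difference for a plain word like "Scheduled"), so a cell such as "Not Scheduled" would also remove its row. The plain-Python sketch below is a hypothetical helper, not the answer's code; it mirrors the any-cell check on in-memory rows and shows the difference between substring and exact matching, using made-up sample rows.

```python
def drop_rows_containing(rows, word="Scheduled", exact=False):
    """Return the rows in which no cell matches `word`.

    exact=False: a cell matches if `word` occurs anywhere in its text
                 (like str.contains with a plain word).
    exact=True:  a cell matches only if its text equals `word`.
    """
    def matches(cell):
        text = str(cell)  # like .astype(str): numbers are compared as text
        return text == word if exact else word in text

    return [row for row in rows if not any(matches(cell) for cell in row)]

# made-up sample rows
rows = [[123, 1, "Scheduled"], [345, 2, "-"], [678, 3, "Not Scheduled"]]
print(drop_rows_containing(rows))              # [[345, 2, '-']]
print(drop_rows_containing(rows, exact=True))  # [[345, 2, '-'], [678, 3, 'Not Scheduled']]
```

Which behaviour is right depends on what the scraped status column can contain; exact matching is the safer default when only cells that are exactly "Scheduled" should go.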
Perhaps you need to move the if __name__ == "__main__" block (and the helper functions) out of the function scrape, like this:
import time

from bs4 import BeautifulSoup
from selenium import webdriver

# output (a csv writer) and resultcsv (the file behind it) are assumed to be
# created elsewhere in the script, as in the question.

def scrape(urls):
    browser = webdriver.Firefox()
    for url in urls:
        browser.get(url)
        html = browser.page_source
        soup = BeautifulSoup(html, "html.parser")
        table = soup.find('table', {"class": "table table-condensed table-hover data-table m-n-t-15"})
        name = soup.h2.string  # reuse soup; parsing the same html a second time is unnecessary
        datatable = []
        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
            temp_data = [name]
            for data in record.find_all("td"):
                temp_data.append(data.text.encode('latin-1'))
            newlist = filter(None, temp_data)  # drop empty values
            datatable.append(newlist)
        print name
        output.writerows(datatable)
    resultcsv.close()
    time.sleep(10)
    browser.close()

def filter_unwanted_words():
    unwanted_words = {'Scheduled'}
    with open('output.csv', 'r') as f:
        for line in f:
            # split on commas: line.split() splits on whitespace, so the field
            # Scheduled in "123,1,Scheduled" is never a separate token
            # (assumes no quoted fields that themselves contain commas)
            if set(line.rstrip('\r\n').split(',')).isdisjoint(unwanted_words):
                yield line

def write_output():
    with open('output2.csv', 'w') as f:
        f.writelines(filter_unwanted_words())

if __name__ == '__main__':
    # scrape(urls) has to have run first so that output.csv exists
    write_output()
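Another option is to drop unwanted rows while scrape() builds datatable, so they never reach output.csv and the second pass through filter_unwanted_words() is not needed. The Python 3 sketch below is hypothetical (UNWANTED_WORDS, keep_row and build_datatable are illustrative names). It uses plain lists of strings in place of the <td> texts so it runs without a browser; inside scrape() the cells would come from something like [td.text for td in record.find_all("td")], and the check would go just before datatable.append(...).

```python
UNWANTED_WORDS = {"Scheduled"}  # same word set as filter_unwanted_words()

def keep_row(cells, unwanted=UNWANTED_WORDS):
    """Return True if no cell (ignoring surrounding whitespace) is an unwanted word."""
    return all(cell.strip() not in unwanted for cell in cells)

def build_datatable(name, rows_of_cells):
    """Mimic the datatable loop in scrape(), skipping unwanted rows.

    `rows_of_cells` stands in for the cell texts of each table row.
    """
    datatable = []
    for cells in rows_of_cells:
        if not keep_row(cells):
            continue  # the row is dropped before it is ever written
        # like filter(None, temp_data): the name first, empty values removed
        datatable.append([v for v in [name] + cells if v])
    return datatable

result = build_datatable("Team A", [["123", "1", "Scheduled"], ["345", "2", "-"], ["", "4", "x"]])
print(result)  # [['Team A', '345', '2', '-'], ['Team A', '4', 'x']]
```

This checks for exact cell matches; if the status text can carry extra words, a substring test would be needed instead.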