How do I delete rows containing a specific word in a Python loop?

Question

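I want to delete every row that contains "Scheduled" and export the result to a new CSV file. What is wrong with my code? I get no error message and it runs without problems, but nothing happens.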

Here is my code:

def scrape(urls):
    browser = webdriver.Firefox()
    for url in urls:
        browser.get(url)
        html = browser.page_source
        soup=BeautifulSoup(html,"html.parser")
        table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })
        soup2=BeautifulSoup(html,"html.parser")
        name = soup2.h2.string
        datatable=[]
        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
            temp_data = []
            temp_data.append(name)    
            for data in record.find_all("td"):
                temp_data.append(data.text.encode('latin-1'))
            newlist = filter(None, temp_data)
            datatable.append(newlist)
        print name
        output.writerows(datatable)

        def filter_unwanted_words():
            unwanted_words = {'Scheduled'}
            with open('output.csv', 'r') as f:
                for line in f:
                    if set(line.split()).isdisjoint(unwanted_words):
                        yield line


        def write_output():
            with open('output2.csv', 'w') as f:
                f.writelines((line for line in filter_unwanted_words()))

        if __name__ == '__main__':
            write_output()

    resultcsv.close()
    time.sleep(10) 
    browser.close()

I tried this def filter_unwanted_words, but it does not work.

Data table:

Answer 1

An alternative solution: consider reading the data into a Pandas DataFrame.

import pandas as pd

data = [[123, 1, "Scheduled"], [345, 2, "-"]]

df = pd.DataFrame(data)
# Boolean indexing returns a new DataFrame, so the result has to be assigned.
# Column 2 is the column that holds the value to filter on.
df = df[df[2] != "Scheduled"]
df.to_csv("output.csv", header=False)  # no headers

The DataFrame looks like this:

    0       1   2
0   123     1   Scheduled
1   345     2   -

With "Scheduled" filtered out, the data looks like this:

    0       1   2
1   345     2   -
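The example above builds the DataFrame from an in-memory list. A minimal sketch of the same filter applied to a CSV file might look like the following. The helper name drop_rows_with_value and its parameters are hypothetical, and the sketch assumes the file has no header row, so the columns are labelled 0, 1, 2, and so on.

```python
import pandas as pd


def drop_rows_with_value(src, dst, column, value):
    """Read a header-less CSV, drop rows where `column` equals `value`, write the rest.

    `src` and `dst` may be file paths or file-like objects.
    """
    df = pd.read_csv(src, header=None)
    kept = df[df[column] != value]
    # index=False leaves the DataFrame index out of the output file
    kept.to_csv(dst, header=False, index=False)
    return kept
```

For the files in the question this could be called as drop_rows_with_value("output.csv", "output2.csv", 2, "Scheduled"), assuming the status is in the third column. Unlike the snippet above, this sketch passes index=False, so the row index is not written to the file.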

A more general solution, which filters out every "Scheduled" regardless of which column it appears in:

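import numpy as np
import pandas as pd

data = [[123, 1, "Scheduled"], [345, 2, "-"]]

df = pd.DataFrame(data)
# One boolean column per DataFrame column: True where the cell contains "Scheduled"
mask = np.column_stack([df[col].astype(str).str.contains(r"Scheduled", na=False) for col in df])
# Keep only the rows in which no cell matched
df2 = df.loc[~mask.any(axis=1)]
df2.to_csv("output.csv", header=False)  # no headers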
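Note that str.contains matches substrings, so a cell such as "Scheduled late" would also cause its row to be dropped. If only cells that are exactly "Scheduled" should count, one possible variant uses DataFrame.isin instead. This is a sketch, and the third sample row is made up for illustration.

```python
import pandas as pd

# The third row is a hypothetical example of a cell that merely contains the word
data = [[123, 1, "Scheduled"], [345, 2, "-"], [678, 3, "Scheduled late"]]
df = pd.DataFrame(data)

# True only for cells whose value is exactly "Scheduled"
exact = df.isin(["Scheduled"])

# Keep the rows in which no cell is an exact match
df_exact = df.loc[~exact.any(axis=1)]
```

Here df_exact keeps the rows with index 1 and 2, whereas the substring-based mask above would keep only the row with index 1.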

Answer 2

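You may need to move the if __name__ == "__main__" block, together with the two helper functions, out of the scrape function so that they sit at module level. Like this: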

def scrape(urls):
    browser = webdriver.Firefox()
    for url in urls:
        browser.get(url)
        html = browser.page_source
        soup=BeautifulSoup(html,"html.parser")
        table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })
        soup2=BeautifulSoup(html,"html.parser")
        name = soup2.h2.string
        datatable=[]
        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
            temp_data = []
            temp_data.append(name)    
            for data in record.find_all("td"):
                temp_data.append(data.text.encode('latin-1'))
            newlist = filter(None, temp_data)
            datatable.append(newlist)
        print name
        output.writerows(datatable)

    resultcsv.close()
    time.sleep(10) 
    browser.close()

def filter_unwanted_words():
    unwanted_words = {'Scheduled'}
    with open('output.csv', 'r') as f:
        for line in f:
            if set(line.split()).isdisjoint(unwanted_words):
                yield line


def write_output():
    with open('output2.csv', 'w') as f:
        f.writelines(filter_unwanted_words())


if __name__ == '__main__':
    write_output()
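One thing to watch in the filter above: line.split() splits on whitespace, so in a comma-separated file a line such as name,1,Scheduled stays a single token and is never recognized as containing "Scheduled". A minimal sketch that compares whole CSV fields with the standard csv module could look like this. The function name filter_rows and its parameters are hypothetical, the code is written for Python 3, and it assumes the file is comma-delimited.

```python
import csv


def filter_rows(src_path, dst_path, unwanted_words):
    """Copy src_path to dst_path, skipping rows that contain an unwanted field.

    Returns the number of rows written.
    """
    unwanted = set(unwanted_words)
    kept = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Compare whole fields, ignoring surrounding whitespace
            if unwanted.isdisjoint(field.strip() for field in row):
                writer.writerow(row)
                kept += 1
    return kept
```

With the file names from the question, this would be called as filter_rows("output.csv", "output2.csv", {"Scheduled"}) once scrape has finished writing and closing output.csv.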

Answer 1
2 Accepted 2017-08-07 07:27:42

Answer 2
0 2017-08-07 07:26:18
