
How to delete a row with a specific word in a loop with Python?

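I want to delete all rows that contain "Scheduled" and export the result to a new CSV file. What is wrong with my code? I get no error message and it runs without problems, but nothing happens.

Here is my code: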

def scrape(urls):
    browser = webdriver.Firefox()
    for url in urls:
        browser.get(url)
        html = browser.page_source
        soup=BeautifulSoup(html,"html.parser")
        table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })
        soup2=BeautifulSoup(html,"html.parser")
        name = soup2.h2.string
        datatable=[]
        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
            temp_data = []
            temp_data.append(name)    
            for data in record.find_all("td"):
                temp_data.append(data.text.encode('latin-1'))
            newlist = filter(None, temp_data)
            datatable.append(newlist)
        print name
        output.writerows(datatable)

        def filter_unwanted_words():
            unwanted_words = {'Scheduled'}
            with open('output.csv', 'r') as f:
                for line in f:
                    if set(line.split()).isdisjoint(unwanted_words):
                        yield line


        def write_output():
            with open('output2.csv', 'w') as f:
                f.writelines((line for line in filter_unwanted_words()))

        if __name__ == '__main__':
            write_output()

    resultcsv.close()
    time.sleep(10) 
    browser.close()

I tried this def filter_unwanted_words, but it does not work.

Data table: [image]

An alternative solution: consider reading the data into a DataFrame with Pandas.

import pandas as pd

data = [[123,1,"Scheduled"],[345,2,"-"]]

df = pd.DataFrame(data)
df = df[df[2] != "Scheduled"] # keep the rows whose column 2 is not "Scheduled"; assign the result so it is not discarded
df.to_csv("output.csv", header=False) # no headers

The DataFrame looks like this:

    0       1   2
0   123     1   Scheduled
1   345     2   -

With "Scheduled" filtered out, the data looks like this:

    0       1   2
1   345     2   -

A more general solution, which filters out every row containing "Scheduled" regardless of which column it appears in:

import numpy as np
import pandas as pd

data = [[123,1,"Scheduled"],[345,2,"-"]]

df = pd.DataFrame(data)
mask = np.column_stack([df[col].astype(str).str.contains(r"Scheduled", na=False) for col in df])
df2 = df.loc[~mask.any(axis=1)]
df2.to_csv("output.csv", header=False) # no headers
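The same idea can be sketched without NumPy and starting from CSV text rather than a hand-built list. This is only an illustration under assumptions: it is written for Python 3, the helper name `drop_rows_containing` and the sample rows are made up, and an in-memory `io.StringIO` stands in for the real output.csv, whose exact layout the question does not show.

```python
import io

import pandas as pd


def drop_rows_containing(df, word):
    """Return the rows of df in which no cell contains `word` (hypothetical helper)."""
    # Compare as text so numeric columns can be searched too.
    as_text = df.astype(str)
    # One boolean column per original column, then collapse to one flag per row.
    mask = as_text.apply(lambda col: col.str.contains(word, regex=False)).any(axis=1)
    return df.loc[~mask]


# Made-up sample data standing in for output.csv (assumption: no header row).
sample_csv = io.StringIO("Flight A,123,Scheduled\nFlight B,345,-\n")
df = pd.read_csv(sample_csv, header=None)

filtered = drop_rows_containing(df, "Scheduled")
# With no path argument, to_csv returns the CSV text instead of writing a file.
result_csv = filtered.to_csv(index=False, header=False)
print(result_csv)
```

For real files, `pd.read_csv("output.csv", header=None)` and `filtered.to_csv("output2.csv", index=False, header=False)` would take the place of the in-memory buffers.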

Maybe you have to put if __name__ == "__main__" outside the function scrape, together with the two helper functions. Like this:

def scrape(urls):
    browser = webdriver.Firefox()
    for url in urls:
        browser.get(url)
        html = browser.page_source
        soup=BeautifulSoup(html,"html.parser")
        table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })
        soup2=BeautifulSoup(html,"html.parser")
        name = soup2.h2.string
        datatable=[]
        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
            temp_data = []
            temp_data.append(name)    
            for data in record.find_all("td"):
                temp_data.append(data.text.encode('latin-1'))
            newlist = filter(None, temp_data)
            datatable.append(newlist)
        print name
        output.writerows(datatable)

    resultcsv.close()
    time.sleep(10) 
    browser.close()

def filter_unwanted_words():
    unwanted_words = {'Scheduled'}
    with open('output.csv', 'r') as f:
        for line in f:
            if set(line.split()).isdisjoint(unwanted_words):
                yield line


def write_output():
    with open('output2.csv', 'w') as f:
        f.writelines((line for line in filter_unwanted_words()))

if __name__ == '__main__':
    write_output()
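One thing to watch with the filter above: `line.split()` splits on whitespace, so if output.csv is comma-separated (an assumption, since the file itself is not shown), a line such as `Flight A,123,Scheduled` stays a single token and never equals 'Scheduled'. A possible sketch that parses the rows with the standard `csv` module instead, written for Python 3 with a made-up helper name and made-up sample rows:

```python
import csv
import io


def filter_rows(reader, unwanted_words):
    """Yield only the rows in which no cell equals an unwanted word (hypothetical helper)."""
    for row in reader:
        # Strip surrounding whitespace so ' Scheduled ' still matches.
        cells = {cell.strip() for cell in row}
        if cells.isdisjoint(unwanted_words):
            yield row


# In-memory stand-ins for output.csv and output2.csv (made-up sample data).
source = io.StringIO("Flight A,123,Scheduled\nFlight B,345,-\n")
target = io.StringIO()

writer = csv.writer(target, lineterminator="\n")
writer.writerows(filter_rows(csv.reader(source), {"Scheduled"}))
print(target.getvalue())
```

With real files, `open('output.csv', newline='')` and `open('output2.csv', 'w', newline='')` would replace the two `StringIO` objects.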

