Python: removing elements from a file

Question

Here is my code snippet:

from HTMLParser import HTMLParser
# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
        def handle_endtag(self, tag):
                if(tag == 'tr'):
                    textFile.write('\n')
        def handle_data(self, data):
                textFile.write(data+"\t")

textFile = open('instaQueryResult', 'w+')

# instantiate the parser and feed it some HTML
parser = MyHTMLParser()
fh = open('/data/aman/aggregate.html','r')
l = fh.readlines()
for line in l:
        parser.feed(line)

I parse an HTML file and get the following output:

plantype        count(distinct(SubscriberId))   sum(DownBytesNONE)      sum(UpBytesNONE)            sum(SessionCountNONE)
1006657 341175  36435436130     36472526498     694016
1013287 342280  36694005846     36533489363     697098
1006613 343867  36763692173     36755893252     699976
1014883 342436  36575951812     36572503611     695683
1003022 343238  36705838418     36637429353     698618
plantype        count(distinct(SubscriberId))   sum(DownBytesNONE)      sum(UpBytesNONE)            sum(SessionCountNONE)
1013287 342280  36694005846     36533489363     697098
1006657 341175  36435436130     36472526498     694016
1006613 343867  36763692173     36755893252     699976
1014883 342436  36575951812     36572503611     695683
1003022 343238  36705838418     36637429353     698618

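This output is correct, but I want to remove the headers. The first line of each table contains the header, which I want to delete from the file so that only the values remain.

Expected output: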

1006657 341175  36435436130     36472526498     694016
1013287 342280  36694005846     36533489363     697098
1006613 343867  36763692173     36755893252     699976
1014883 342436  36575951812     36572503611     695683
1003022 343238  36705838418     36637429353     698618
1013287 342280  36694005846     36533489363     697098
1006657 341175  36435436130     36472526498     694016
1006613 343867  36763692173     36755893252     699976
1014883 342436  36575951812     36572503611     695683
1003022 343238  36705838418     36637429353     698618

Answer 1

Since you are trying to get rid of anything that contains no digits, you could try changing the handle_data(self, data) method to:

def handle_data(self, data):
    if data.isdigit():
        textFile.write(data+"\t")
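
A self-contained sketch of this idea, assuming Python 3 (where the module is html.parser rather than HTMLParser). The class name DigitsOnlyParser, the out parameter, and the sample HTML are made up for illustration. The data is stripped before the check because isdigit() returns False for strings with surrounding whitespace. Keep in mind that this filter would also drop any legitimate non-numeric cell.

```python
import io
from html.parser import HTMLParser


class DigitsOnlyParser(HTMLParser):
    """Writes only all-digit cell text, tab-separated, one table row per line."""

    def __init__(self, out):
        super().__init__()
        self.out = out            # any file-like object (hypothetical parameter)
        self.wrote_cell = False   # True once the current row has produced output

    def handle_endtag(self, tag):
        # End the line only if the row produced output,
        # so a skipped header row does not leave a blank line behind.
        if tag == 'tr' and self.wrote_cell:
            self.out.write('\n')
            self.wrote_cell = False

    def handle_data(self, data):
        text = data.strip()
        if text.isdigit():
            self.out.write(text + '\t')
            self.wrote_cell = True


# Made-up sample input in the shape of the question's data
sample = (
    "<table>"
    "<tr><td>plantype</td><td>count(distinct(SubscriberId))</td></tr>"
    "<tr><td>1006657</td><td>341175</td></tr>"
    "<tr><td>1013287</td><td>342280</td></tr>"
    "</table>"
)

buf = io.StringIO()
parser = DigitsOnlyParser(buf)
parser.feed(sample)
parser.close()
result = buf.getvalue()
```

Passing an open file instead of the StringIO buffer would write the same text to disk.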

Answer 2

I assume your HTML data has the following form:

<table>
    <tr>
        <td>plantype</td>
        <td>count(distinct(SubscriberId))</td>
        ...
    </tr>
    <tr>
        <td>1006657</td>
        <td>341175</td>
        ...
    </tr>
</table>

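You can use a row_count variable to check whether you are still inside the first tr tag. Set row_count to 0 in handle_starttag whenever a table starts, then check it (and increment it) in handle_endtag: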

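class MyHTMLParser(HTMLParser):
    row_count = 0

    def handle_starttag(self, tag, attrs):
        # Reset the counter at the start of every table
        if tag == 'table':
            self.row_count = 0

    def handle_endtag(self, tag):
        if tag == 'tr':
            # Only end the line for data rows, not for the header row
            if self.row_count > 0:
                textFile.write('\n')
            # Count rows only; counting every end tag would end the header early
            self.row_count += 1

    def handle_data(self, data):
        # row_count is still 0 while we are inside the header row
        if self.row_count > 0:
            textFile.write(data+"\t")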
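
The same row-counting approach can be sketched without the global textFile, assuming Python 3 (html.parser). The class name SkipHeaderParser, the rows attribute, and the sample HTML are hypothetical. Cells are collected into lists, so whitespace between tags never reaches the output.

```python
from html.parser import HTMLParser


class SkipHeaderParser(HTMLParser):
    """Collects table cells into rows, dropping the first row of every table."""

    def __init__(self):
        super().__init__()
        self.row_count = 0     # rows completed so far in the current table
        self.in_cell = False   # True while inside <td> or <th>
        self.current = []      # cells of the row being read
        self.rows = []         # all data rows collected so far

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.row_count = 0
        elif tag == 'tr':
            self.current = []
        elif tag in ('td', 'th'):
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ('td', 'th'):
            self.in_cell = False
        elif tag == 'tr':
            # Row 0 of each table is the header, so it is not stored
            if self.row_count > 0:
                self.rows.append(self.current)
            self.row_count += 1

    def handle_data(self, data):
        # Text outside a cell is just indentation and newlines
        if self.in_cell and self.row_count > 0:
            self.current.append(data.strip())


# Made-up sample: two tables, each starting with a header row
sample = """
<table>
    <tr>
        <td>plantype</td>
        <td>count(distinct(SubscriberId))</td>
    </tr>
    <tr>
        <td>1006657</td>
        <td>341175</td>
    </tr>
</table>
<table>
    <tr>
        <td>plantype</td>
        <td>count(distinct(SubscriberId))</td>
    </tr>
    <tr>
        <td>1013287</td>
        <td>342280</td>
    </tr>
</table>
"""

parser = SkipHeaderParser()
parser.feed(sample)
parser.close()
lines = ['\t'.join(row) for row in parser.rows]
```

Each entry of lines can then be written to the output file followed by a newline.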

Answer 3

Try this:

fh = open('/data/aman/aggregate.html','r')
l = fh.readlines()
for line in l:
    if 'plantype' not in line:
        parser.feed(line)

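You are reading the file line by line. With the condition if 'plantype' not in line, the next block runs only for the other lines (the ones you want). Note that this skips only lines containing the text plantype, so the remaining header cells are dropped only if the whole header row sits on that same line of the HTML file.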
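
If the header row spans several lines of HTML, the same substring test could instead be applied to the text that the parser has already produced, where each header is a single line. This is a minimal sketch under that assumption. The function name drop_header_lines, its marker parameter, and the shortened sample lines are made up for illustration.

```python
def drop_header_lines(lines, marker='plantype'):
    """Return only the lines that do not start with the header marker."""
    return [line for line in lines if not line.startswith(marker)]


# Shortened, made-up stand-in for the lines of the parsed output file
parsed = [
    "plantype\tcount(distinct(SubscriberId))\n",
    "1006657\t341175\n",
    "1013287\t342280\n",
    "plantype\tcount(distinct(SubscriberId))\n",
    "1013287\t342280\n",
]

kept = drop_header_lines(parsed)
```

To apply this to a real file, read its lines with readlines(), filter them, and write the kept lines back.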


Answer 1
1 (accepted) 2014-03-18 08:23:20

Answer 2
0 2014-03-18 08:23:30

Answer 3
0 2014-03-18 08:31:57
