
Write a CSV row (from within a for loop) out to a CSV file without using the Python csv module

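My goal is to avoid importing the csv module.

I am working on a script that runs through a very large CSV file and selectively writes rows out to a new CSV file.

I have the following two lines: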

with open(sys.argv[1]) as ifile, open(sys.argv[2], mode = 'w') as ofile:
    for row in ifile: 

Then, a few nested if statements later, this:

line = list(ifile)[row]
ofile.write(line)

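I know this isn't right. I took a stab at it, and I hope someone here can shed some light on how to do this properly. The essence of the question is how to refer to the row I am currently on, so that I can write it out to the new CSV file using "ofile". Please let me know if further clarification is needed. Thanks!

EDIT: the full code is in this pastebin link: http://pastebin.com/a0jx85xR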

You're very close. This is all you need to do:

import sys

with open(sys.argv[1]) as ifile, open(sys.argv[2], mode = 'w') as ofile:
    for row in ifile:
        #...
        #`row` is already the current line, so it can be written out directly.
        #Replace the test below with your own condition.
        #E.g.: the number of non-empty entries in each row is greater than 5
        #(split on ',' because the file is comma-separated):
        if len([term for term in row.split(',') if term.strip() != '']) > 5:
            ofile.write(row)
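
The same idea can be wrapped in a small function, which makes it easier to try on a sample file. This is only a sketch: the name `copy_matching_rows` and the `min_fields` parameter are hypothetical, and it assumes a plain comma-separated file with no quoted fields.

```python
def copy_matching_rows(in_path, out_path, min_fields=5):
    """Copy lines that have more than `min_fields` non-empty fields.

    Hypothetical helper for illustration; returns the number of lines written.
    """
    written = 0
    with open(in_path) as ifile, open(out_path, mode='w') as ofile:
        for row in ifile:
            # `row` is the current line as a string, trailing newline included.
            terms = [term for term in row.split(',') if term.strip() != '']
            if len(terms) > min_fields:
                ofile.write(row)  # write the line out unchanged
                written += 1
    return written
```

Because the file object is iterated line by line, only one row is held in memory at a time, which suits a very large input.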

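UPDATE:

To answer the OP's question about splitting lines:

You can split a line in Python by supplying a delimiter character. Since this is a CSV file, you would split each line on ",". For example:

If this is a line (a string):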

0,1,2,3,4,5

and you apply:

line.split(',')

you will get a list:

['0', '1', '2', '3', '4', '5']
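
A line read from a file also carries a trailing newline, and fields are sometimes padded with spaces. A minimal sketch of handling both with string methods only (the helper name `split_fields` is hypothetical):

```python
def split_fields(line, delimiter=','):
    """Split one line into fields, dropping the newline and surrounding spaces."""
    return [field.strip() for field in line.rstrip('\n').split(delimiter)]

print(split_fields('0, 1, 2, 3, 4, 5\n'))
# ['0', '1', '2', '3', '4', '5']

# For comparison, a bare split keeps the padding and the newline:
print('0, 1, 2\n'.split(','))
# ['0', ' 1', ' 2\n']
```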

UPDATE 2:

import sys

if __name__ == '__main__':
    ticker = sys.argv[3]
    allTypes = bool(int(sys.argv[4])) #argv[4] is a string, you have to convert it to an int, then to a bool

    with open(sys.argv[1]) as ifile, open(sys.argv[2], mode = 'w') as ofile:
        all_timestamps = [] #this is an empty list
        n_rows = 0
        for row in ifile:
            #This splits the line into constituent terms as described earlier
            #SAMPLE LINE:
            #A,1,12884902522,B,B,4900,AAIR,0.1046,28800,390,B,AARCA,
            #After applying this bit of code, the line should be split into this:
            #['A', '1', '12884902522', 'B', 'B', '4900', 'AAIR', '0.1046', '28800', '390', 'B', 'AARCA']
            #NOW, you can make comparisons against those terms. :)

            terms = [term for term in row.split(',') if term.strip() != '']
            current_timestamp = int(terms[2])

            #compare the current against the previous
            #starting from row 2: (index 1)
            if n_rows >= 1:
                #Negative indices count from the end of a list, hence: -1 means the value at the last index
                #That is, the previous time_stamp. Now perform the comparison and do something if that criterion is met:
                if current_timestamp - all_timestamps[-1] >= 0:
                    pass #the pass keyword means to do nothing. You'll have to replace it with whatever code you want

            #increment n_rows every time:
            n_rows += 1

            #always append the current timestamp to all the time_stamps
            all_timestamps.append(current_timestamp)


            if (terms[6] == ticker):
                # add something to make sure chronological order hasn't been broken
                if (allTypes == 1):
                    ofile.write(row)
            #I don't know if this was a bad indent or not, but you should know
            #where this goes
            elif (terms[0] == "A" or terms[0] == "M" or terms[0] == "D"):
                print(row)
                ofile.write(row)

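My initial guess was right. You weren't splitting the row into its CSV components. So when you made comparisons against the row, you weren't getting the right results, and therefore you weren't getting any output. It should work now (with a few modifications depending on your goals). :)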
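
Since each timestamp is only compared with the one before it, one possible variation keeps just the previous value instead of a growing list, which may matter for a very large file. The sketch below assumes the timestamp is the third field, as in the sample line in the code above; the function name `is_chronological` and its `timestamp_index` parameter are hypothetical.

```python
def is_chronological(lines, timestamp_index=2):
    """Return True if the timestamps never decrease from one line to the next.

    Assumes comma-separated lines with an integer timestamp at
    `timestamp_index`. Empty terms are dropped before indexing, as in the
    answer's code.
    """
    previous = None
    for line in lines:
        terms = [term for term in line.split(',') if term.strip() != '']
        current = int(terms[timestamp_index])
        if previous is not None and current < previous:
            return False  # order is broken at this line
        previous = current
    return True
```

Any iterable of lines works here, including an open file object, so the check could be run as `is_chronological(open(sys.argv[1]))` before filtering.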

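Just to add to jrd1's answer: I rarely use the csv module; I just use the split and join methods on strings. I usually end up with something like this (if there is only one input and one output, I usually just use stdin and stdout).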

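import sys

for row in sys.stdin:
  #Strip the trailing newline so it does not end up in the last field
  fields = row.rstrip("\n").split(",") #Could be "\t" or whatever, default is whitespace

  #process fields in some way (0 based indexing)
  #new_date_format and some_condition_is_met are placeholders you define yourself
  fields[0] = str(int(fields[0]) + 55)
  fields[7] = new_date_format(fields[7])
  if(some_condition_is_met):
    print(",".join(fields))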

Of course, if your CSV file starts having funky entries with quotes and embedded commas, this approach gets a lot less fun.
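
To illustrate that caveat with a made-up line: a quoted field that contains a comma is cut in two by a plain split, so a row with three logical fields comes back as four pieces.

```python
# Made-up sample line: the middle field is quoted and contains a comma.
line = '1,"Smith, John",42\n'

fields = line.rstrip('\n').split(',')
print(fields)
# ['1', '"Smith', ' John"', '42']
print(len(fields))
# 4
```

If the data can contain such fields, plain split and join are no longer enough on their own.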
