Writing CSV rows (from inside a for loop) out to a CSV file without using the Python csv module

Question

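**My goal is to avoid importing the csv module**

I'm working on a script that runs through a very large csv file and selectively writes rows to a new csv file.

I have the following two lines: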

with open(sys.argv[1]) as ifile, open(sys.argv[2], mode = 'w') as ofile:
    for row in ifile:

Then, after some nested if statements:

line = list(ifile)[row]
ofile.write(line)

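I know this isn't right. I took a stab at it and was hoping someone here could shed some light on how to do this properly. The crux of the question is how to refer to the row I'm currently on, so that I can write it to the new csv file using "ofile". Please let me know if this needs further clarification. Thanks!

EDIT: the full code is at this pastebin link: http://pastebin.com/a0jx85xR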

Answer 1

You're very close. This is all you need to do:

with open(sys.argv[1]) as ifile, open(sys.argv[2], mode = 'w') as ofile:
    for row in ifile:
        #...
        #You've defined some_condition to be met (you will have to replace this for yourself)
        #E.g.: the number of non-empty entries in each row is greater than 5:
        if len([term for term in row.split('#') if term.strip() != '']) > 5:
            ofile.write(row)
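
To try the same pattern without command-line arguments, here is a minimal sketch that wraps the loop in a function. The names copy_matching_rows and keep are hypothetical, introduced only for illustration. The point it shows is that the loop variable row already holds the current line (trailing newline included), so it can be passed straight to write() with no need to index back into the file.

```python
import io

def copy_matching_rows(ifile, ofile, keep):
    """Copy every line of ifile for which keep(line) is true to ofile.

    Hypothetical helper for illustration: ifile and ofile are any open
    text file objects, keep is a function you supply yourself.
    """
    written = 0
    for row in ifile:          # row is the current line, newline included
        if keep(row):
            ofile.write(row)   # write the line exactly as it was read
            written += 1
    return written

# Small in-memory demonstration (io.StringIO stands in for real files)
source = io.StringIO("a,b,c\nd,e\nf,g,h,i\n")
target = io.StringIO()
count = copy_matching_rows(source, target, lambda row: len(row.split(",")) >= 3)
```

With real files, the two StringIO objects would simply be replaced by the ifile and ofile opened in the with statement above.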

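Update:

To answer the OP's question about splitting lines:

You can split a line in Python by supplying a delimiter character. Since this is a CSV file, you split the line on ",". Example:

If this is a line (a string):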

0, 1, 2, 3, 4, 5

and you apply:

line.split(',')

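you will get a list (note that the space after each comma is kept; call strip() on each term to remove it):

['0', ' 1', ' 2', ' 3', ' 4', ' 5']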
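
One detail worth checking for yourself: split(',') keeps any whitespace around each term, as well as the newline at the end of a line read from a file. The short sketch below uses the sample line from above with a trailing newline added (an assumption, to mimic a line read from a file) and shows the raw result next to a stripped one.

```python
line = "0, 1, 2, 3, 4, 5\n"

# Plain split: leading spaces and the trailing newline stay attached
raw_terms = line.split(",")

# Strip each term to get clean values
clean_terms = [term.strip() for term in line.split(",")]

print(raw_terms)
print(clean_terms)
```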

Update 2:

import sys

if __name__ == '__main__':
    ticker = sys.argv[3]
    allTypes = bool(int(sys.argv[4])) #argv[4] is a string, you have to convert it to an int, then to a bool

    with open(sys.argv[1]) as ifile, open(sys.argv[2], mode = 'w') as ofile:
        all_timestamps = [] #this is an empty list
        n_rows = 0
        for row in ifile:
            #This splits the line into constituent terms as described earlier
            #SAMPLE LINE:
            #A,1,12884902522,B,B,4900,AAIR,0.1046,28800,390,B,AARCA,
            #After applying this bit of code, the line should be split into this:
            #['A', '1', '12884902522', 'B', 'B', '4900', 'AAIR', '0.1046', '28800', '390', 'B', 'AARCA']
            #NOW, you can make comparisons against those terms. :)

            terms = [term for term in row.split(',') if term.strip() != '']
            current_timestamp = int(terms[2])

            #compare the current against the previous
            #starting from row 2: (index 1)
            if n_rows >= 1:
                #Negative indices count from the end, hence: -1 means the value at the last index
                #That is, the previous time_stamp. Now perform the comparison and do something if that criterion is met:
                if current_timestamp - all_timestamps[-1] >= 0:
                    pass #the pass keyword means to do nothing. You'll have to replace it with whatever code you want

            #increment n_rows every time:
            n_rows += 1

            #always append the current timestamp to all the time_stamps
            all_timestamps.append(current_timestamp)


            if (terms[6] == ticker):
                # add something to make sure chronological order hasn't been broken
                if (allTypes == 1):
                    ofile.write(row)
            #I don't know if this was a bad indent or not, but you should know
            #where this goes
            elif (terms[0] == "A" or terms[0] == "M" or terms[0] == "D"):
                print(row)
                ofile.write(row)

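My initial guess was correct. You weren't splitting the row into its CSV components. So when you made comparisons against the row, you weren't getting the right results, and therefore you weren't getting any output. It should work now (with a few modifications, depending on your goal). :)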
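
The code above leaves the chronological-order check as a placeholder. One way it could be filled in is sketched below, under stated assumptions: the helper names split_terms and out_of_order_rows are hypothetical, the timestamp is assumed to sit at index 2 as in the sample line, and the second and third sample rows are made up for the demonstration.

```python
def split_terms(row):
    """Split a CSV line on commas and drop empty terms, as in the code above."""
    return [term for term in row.split(",") if term.strip() != ""]

def out_of_order_rows(rows, ts_index=2):
    """Return the 0-based indices of rows whose timestamp is lower
    than the timestamp of the row before them."""
    bad = []
    previous = None
    for i, row in enumerate(rows):
        current = int(split_terms(row)[ts_index])
        if previous is not None and current < previous:
            bad.append(i)
        previous = current  # only the last timestamp needs to be remembered
    return bad

sample = [
    "A,1,12884902522,B,B,4900,AAIR,0.1046,28800,390,B,AARCA,\n",
    "A,2,12884902530,B,B,100,AAIR,0.1050,28800,390,B,AARCA,\n",   # made-up row
    "A,3,12884902525,B,B,200,AAIR,0.1049,28800,390,B,AARCA,\n",   # made-up row, out of order
]
print(out_of_order_rows(sample))
```

Keeping only the previous timestamp, rather than appending every timestamp to a list, avoids holding the whole column in memory, which may matter for a very large file.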

Answer 2

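Just to add to jrd1's answer. I rarely use the csv module; I just use the split and join methods on strings. I usually end up with something like this (if there is only one input and one output, I usually just use stdin and stdout).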

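import sys

for row in sys.stdin:
  #Strip the trailing newline first, otherwise it stays attached to the last field
  fields = row.rstrip("\n").split(",") #Could be "\t" or whatever, default is whitespace

  #process fields in some way (0 based indexing)
  #new_date_format and some_condition_is_met are placeholders you define yourself
  fields[0] = str(int(fields[0]) + 55)
  fields[7] = new_date_format(fields[7])
  if(some_condition_is_met):
    print(",".join(fields))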

Of course, if your csv file starts to contain funky entries with quotes, embedded commas and so on, this approach becomes a lot less fun.
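
To make that caveat concrete, here is a small sketch with a made-up row (not taken from the OP's data) in which one quoted field contains a comma. A plain split(",") cuts that field in two, so every later index shifts by one.

```python
# Made-up example row: the second field is quoted and contains a comma
row = 'AAIR,"Smith, John",4900'

fields = row.split(",")

# Three logical fields, but split() produces four pieces
print(len(fields))
print(fields)
```

If the data can contain such entries, simple splitting is no longer reliable and a real CSV parser is the safer choice.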

Answer 1
0 accepted 2013-10-28 02:24:29

Answer 2
0 2013-10-28 03:41:15
