write csv row (from within for loop) out to csv file without using python csv module
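My goal is to avoid importing the csv module.
I'm working on a script that runs through a very large csv file and selectively writes rows to a new csv file.
I have the following two lines: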

    with open(sys.argv[1]) as ifile, open(sys.argv[2], mode = 'w') as ofile:
        for row in ifile:

Then, inside some nested if statements, this:

    line = list(ifile)[row]
    ofile.write(line)

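I know this isn't right. I'm just guessing here, and I'm hoping someone can shed some light on how to go about this properly. The essence of the question is how to reference the row I'm currently on so that I can write it out to the new csv file using "ofile". Please let me know if further clarification is needed. Thanks!
Edit: the full code is at this pastebin link: http://pastebin.com/a0jx85xR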
You're close. This is all you need to do:

    with open(sys.argv[1]) as ifile, open(sys.argv[2], mode = 'w') as ofile:
        for row in ifile:
            #...
            # You've defined some_condition to be met (you will have to replace this yourself)
            # E.g.: the number of non-empty entries in each row is greater than 5:
            if len([term for term in row.split(',') if term.strip() != '']) > 5:
                ofile.write(row)

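The reason `list(ifile)[row]` fails is that `row` is already the text of the current line, not an integer index, and `list(ifile)` would also consume the rest of the file. As a minimal sketch of the pattern above, the loop can be wrapped in a function that takes any two file-like objects. The names `filter_rows`, `keep` and `more_than_five_fields` are hypothetical, and `io.StringIO` stands in for the real files only so the example is self-contained.

```python
import io


def filter_rows(ifile, ofile, keep):
    """Copy every line of ifile for which keep(line) is true into ofile."""
    written = 0
    for row in ifile:          # row is already the text of the current line
        if keep(row):
            ofile.write(row)   # row still ends with '\n', so none is added
            written += 1
    return written


def more_than_five_fields(row):
    """Example condition: the row has more than five non-empty fields."""
    fields = [f for f in row.split(',') if f.strip() != '']
    return len(fields) > 5


if __name__ == '__main__':
    # In-memory stand-ins for open(sys.argv[1]) and open(sys.argv[2], 'w')
    source = io.StringIO("a,b,c\n1,2,3,4,5,6\nx,y\n")
    target = io.StringIO()
    count = filter_rows(source, target, more_than_five_fields)
    print(count, repr(target.getvalue()))
```

Because the file is read one line at a time, memory use stays flat even for a very large input.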
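Update:
To answer the OP's question about splitting a line:
You can split a line in Python by supplying the delimiter character. Since this is a CSV file, you would split the line on commas (,). Example:
If this is a line (a string):

    0,1,2,3,4,5

and you apply:

    line.split(',')

you get a list:

    ['0', '1', '2', '3', '4', '5']
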
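One detail worth illustrating: a line read from a file still carries its trailing newline, and any spaces after the commas stay attached to the fields. The sketch below uses a hypothetical helper, `split_fields`, to trim both; it is an illustration and not part of the original code.

```python
def split_fields(line, delimiter=','):
    """Split one line of text into fields, trimming whitespace around each."""
    return [field.strip() for field in line.rstrip('\n').split(delimiter)]


if __name__ == '__main__':
    raw = '0, 1, 2\n'
    print(raw.split(','))       # naive split keeps the spaces and the newline
    print(split_fields(raw))    # trimmed fields
```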
Update 2:

    import sys

    if __name__ == '__main__':
        ticker = sys.argv[3]
        # argv[4] is a string: convert it to an int first, then to a bool
        allTypes = bool(int(sys.argv[4]))

        with open(sys.argv[1]) as ifile, open(sys.argv[2], mode='w') as ofile:
            all_timestamps = []  # timestamps seen so far
            n_rows = 0
            for row in ifile:
                # Split the line into its constituent terms, as described earlier.
                # SAMPLE LINE:
                # A,1,12884902522,B,B,4900,AAIR,0.1046,28800,390,B,AARCA,
                # After the split, the terms are:
                # ['A', '1', '12884902522', 'B', 'B', '4900', 'AAIR', '0.1046', '28800', '390', 'B', 'AARCA']
                # NOW you can make comparisons against those terms. :)
                terms = [term for term in row.split(',') if term.strip() != '']
                current_timestamp = int(terms[2])

                # Compare the current timestamp against the previous one,
                # starting from row 2 (index 1):
                if n_rows > 0:
                    # A negative index counts from the end of the list, so -1 is
                    # the last element, i.e. the previous timestamp.
                    if current_timestamp - all_timestamps[-1] >= 0:
                        pass  # does nothing; replace it with whatever code you want

                # Increment n_rows every time:
                n_rows += 1
                # Always append the current timestamp to the list
                all_timestamps.append(current_timestamp)

                if terms[6] == ticker:
                    # add something to make sure chronological order hasn't been broken
                    if allTypes:
                        ofile.write(row)
                    # I don't know whether this was a bad indent or not, but you
                    # should know where this goes
                    elif terms[0] in ("A", "M", "D"):
                        print(row)
                        ofile.write(row)

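My initial hunch was right: you weren't splitting the rows into their CSV components. So when you made comparisons against the row, you weren't getting the right results, and therefore you weren't getting any output. It should work now (with some modifications to suit your goals). :)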
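The chronological-order check left open in the code could be sketched as follows. This is only one possible approach, assuming the timestamp sits in column index 2 as in the sample line; the name `find_out_of_order` is hypothetical. It remembers just the previous timestamp instead of a list of all of them, which matters for a very large file.

```python
def find_out_of_order(lines, column=2, delimiter=','):
    """Return 1-based line numbers whose timestamp is lower than the previous line's."""
    out_of_order = []
    previous = None                      # no previous timestamp before line 1
    for line_number, line in enumerate(lines, start=1):
        terms = line.rstrip('\n').split(delimiter)
        current = int(terms[column])
        if previous is not None and current < previous:
            out_of_order.append(line_number)
        previous = current               # only the last value is kept
    return out_of_order


if __name__ == '__main__':
    # Made-up sample rows, for illustration only
    sample = [
        "A,1,100,B\n",
        "A,2,105,B\n",
        "M,3,101,B\n",   # goes back in time relative to the line above
        "D,4,110,B\n",
    ]
    print(find_out_of_order(sample))
```

The function accepts any iterable of lines, so an open file object can be passed in directly.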
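Just to add to jrd1's answer: I rarely use the csv module; I just use the string split and join methods. I usually end up with something like this (if there is only one input and one output, I usually just use stdin and stdout).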

    import sys

    for row in sys.stdin:
        # Strip the trailing newline before splitting.
        # The separator could be "\t" or whatever; the default is whitespace.
        fields = row.rstrip("\n").split(",")
        # Process the fields in some way (0-based indexing)
        fields[0] = str(int(fields[0]) + 55)
        fields[7] = new_date_format(fields[7])  # placeholder: your own function
        if some_condition_is_met:  # placeholder: your own test
            print(",".join(fields))

Of course, if your csv file starts to contain funky entries with quotes, embedded commas and the like, this approach becomes a lot less fun.
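To make that caveat concrete, here is a rough sketch of what goes wrong with a plain `split(',')` on a quoted field, together with a small quote-aware splitter. The function `split_quoted` is hypothetical and deliberately limited: it assumes double quotes, does not handle doubled quotes (`""`) used as an escape, and does not handle quoted fields that span several lines. For data like that, the csv module is the dependable option.

```python
def split_quoted(line, delimiter=',', quote='"'):
    """Split a line on delimiter, ignoring delimiters inside quoted sections."""
    fields, current, in_quotes = [], [], False
    for ch in line.rstrip('\n'):
        if ch == quote:
            in_quotes = not in_quotes          # toggle on every quote character
        elif ch == delimiter and not in_quotes:
            fields.append(''.join(current))    # field ends at an unquoted delimiter
            current = []
        else:
            current.append(ch)
    fields.append(''.join(current))            # last field has no trailing delimiter
    return fields


if __name__ == '__main__':
    line = 'AAIR,"Smith, John",0.1046\n'
    print(line.split(','))      # the quoted field is cut in two
    print(split_quoted(line))   # the quoted field stays in one piece
```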