Parsing CSV files with Python if specific data exists
The data file looks like this:
"2015","21","2","RICK","D","w","1","1","f","8","","00","","","","","S"
"2015","56","5","RICK","E","g","1","1","k","8","","15","","","","","F"
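If the last field is "S", I only need to add the third field to the total. Otherwise, the line should be skipped.
I tried importing csv and using the following: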
for line in csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True):
    if line[16] == "S":
        total = total + line[2]
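This gives me "IndexError: list index out of range". Maybe there is a better way; I thought importing csv would do most of the work for me. What is the best way to do this? At this point I'll take anything that works.
Printing a line shows the following: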
['"2015"', '"43"', '"2"', '"ZETA"', '"W"', '"x"', '"1"', '"1"', '"d"', '"2"', '""', '"31"', '""', '""', '""', '""', '"N"']
pandas can do this easily:
In [52]:
# read the csv into a dataframe
import pandas as pd
df = pd.read_csv(r'c:\data\sample.txt', quotechar="\"", header=None)
df
Out[52]:
      0   1  2     3  4  5  6  7  8  9   10  11   12   13   14   15 16
0  2015  21  2  RICK  D  w  1  1  f  8  NaN   0  NaN  NaN  NaN  NaN  S
1  2015  56  5  RICK  E  g  1  1  k  8  NaN  15  NaN  NaN  NaN  NaN  F
In [55]:
# we can filter the values and then call count()
df.loc[df[16] == 'S',16].count()
Out[55]:
1
In [56]:
# we can also show the count for all unique values
df[16].value_counts()
Out[56]:
S 1
F 1
dtype: int64
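The count above tells you how many rows end in "S", but the question asks for the sum of the third field, which is column label 2 because `header=None` numbers the columns from 0. A minimal sketch of that sum, assuming the same two sample rows (read here from an in-memory string rather than the `c:\data\sample.txt` path):

```python
import io

import pandas as pd

# The two sample rows from the question, held in memory for this sketch.
data = io.StringIO(
    '"2015","21","2","RICK","D","w","1","1","f","8","","00","","","","","S"\n'
    '"2015","56","5","RICK","E","g","1","1","k","8","","15","","","","","F"\n'
)

df = pd.read_csv(data, quotechar='"', header=None)

# Keep rows whose last column (label 16) is "S", then sum column 2 (the third field).
total = df.loc[df[16] == "S", 2].sum()
print(total)  # prints 2
```

Because pandas parses the numeric columns itself, no explicit `int()` conversion is needed here.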
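`=` assigns the value of the right-hand operand to the left-hand operand; it is not a comparison. To test for equality, change
if line[16] = "S":
to
if line[16] == "S":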
hzhang@dell-work ~ $ cat sample.csv
"2015","21","2","RICK","D","w","1","1","f","8","","00","","","","","S"
"2015","56","5","RICK","E","g","1","1","k","8","","15","","","","","F"
hzhang@dell-work ~ $ cat test.py
import csv
with open("sample.csv", newline="") as csvfile:
    csvreader = csv.reader(csvfile, delimiter=",")
    total = 0
    for line in csvreader:
        if line[16] == "S":
            # the fields are strings, so convert before adding
            total = total + int(line[2])
print("total is:{}".format(total))
hzhang@dell-work ~ $ python test.py
total is:2
Based on your code:
import csv
file = open("sample.csv", newline="")
total = 0
for line in csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True):
    if line[16] == "S":
        # line[2] is a string; int() is needed to add it to the total
        total = total + int(line[2])
file.close()
print("total:{}".format(total))
hzhang@dell-work ~ $ python test.py
total:2
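Make sure every input line has 17 fields, and convert the third column of each row to a number before adding it to the total.
Check which lines don't have 17 fields:
if len(line) != 17:
    print(line)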
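Expanding that check into a small helper gives a list of every offending row rather than stopping at the first one. `find_bad_rows` is a hypothetical name for this sketch, not part of the csv module, and the sample input is made up to include a blank line and a truncated row:

```python
import csv
import io


def find_bad_rows(f, expected=17):
    """Return (row_number, row) for each CSV row that does not have `expected` fields."""
    bad = []
    for row_num, row in enumerate(csv.reader(f), start=1):
        if len(row) != expected:
            bad.append((row_num, row))
    return bad


# Made-up input: one good row, a blank line, and a truncated row.
sample = io.StringIO(
    '"2015","21","2","RICK","D","w","1","1","f","8","","00","","","","","S"\n'
    "\n"
    '"2015","56","5","RICK"\n'
)
for row_num, row in find_bad_rows(sample):
    print(row_num, row)
```

Row numbers here count parsed rows, which match physical lines as long as no quoted field spans several lines.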
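The file may not consistently contain 17 columns. One way this can happen is if there is an extra newline at the end of the file.
Here is how to detect which line is causing the problem: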
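A quick way to see the extra-newline case, using an in-memory string in place of the real file: `csv.reader` returns an empty list for a blank line, and indexing that empty list is what raises the `IndexError`. Skipping empty rows avoids it:

```python
import csv
import io

# Two data rows followed by an extra blank line at the end of the "file".
text = (
    '"2015","21","2","RICK","D","w","1","1","f","8","","00","","","","","S"\n'
    '"2015","56","5","RICK","E","g","1","1","k","8","","15","","","","","F"\n'
    "\n"
)

rows = list(csv.reader(io.StringIO(text)))
print(rows[-1])  # prints [] because the blank line became an empty row

total = 0
for line in rows:
    if not line:  # skip blank lines instead of indexing an empty list
        continue
    if line[16] == "S":
        total += int(line[2])
print(total)  # prints 2
```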
total = 0
reader = csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True)
for line_num, line in enumerate(reader, start=1):
    try:
        if line[16] == "S":
            total = total + int(line[2])
    except IndexError:
        # show offending line
        print(line_num, line)
        # reraise to halt execution
        raise
You might consider using a negative index to access items from the end of the list:
total = 0
for line in csv.reader(...):
    if line[-1] == "S":
        total += int(line[2])
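One caveat, sketched below with made-up rows: `line[-1]` copes with rows shorter than 17 fields, but it still raises `IndexError` on an empty row such as the one a blank line produces, so a guard is still worthwhile. The `rows` list here is a hypothetical stand-in for the `csv.reader(...)` output:

```python
# Hypothetical parsed rows: a full row, a short row, and an empty row from a blank line.
rows = [
    ["2015", "21", "2", "RICK", "D", "w", "1", "1", "f", "8", "", "00", "", "", "", "", "S"],
    ["2015", "56", "5", "S"],
    [],
]

total = 0
for line in rows:
    # `line and ...` short-circuits on the empty row, so line[-1] is never evaluated on [].
    if line and line[-1] == "S":
        total += int(line[2])
print(total)  # prints 7
```

Note that the short row is counted too; whether that is desirable depends on whether such rows are valid data or corruption.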
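Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.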