
Parsing CSV files with Python if specific data exists

The data file looks like this:

"2015","21","2","RICK","D","w","1","1","f","8","","00","","","","","S"
"2015","56","5","RICK","E","g","1","1","k","8","","15","","","","","F"

The third field should be added to the total only if the last field is "S". Otherwise, the line is skipped.

I tried importing csv and using the following:

for line in csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True):
    if line[16] == "S":
        total = total + line[2]

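This gives me "IndexError: list index out of range". Maybe there is a better way. I thought the csv module would do most of the work for me. What is the best approach? At this point I will take anything that works.

Printing a line shows the following: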

['"2015"', '"43"', '"2"', '"ZETA"', '"W"', '"x"', '"1"', '"1"', '"d"', '"2"', '""', '"31"', '""', '""', '""', '""', '"N"']

pandas can do this easily:

In [52]:
import pandas as pd
# read the csv into a dataframe
df = pd.read_csv(r'c:\data\sample.txt', quotechar="\"", header=None)
df
Out[52]:
     0   1   2     3  4  5   6   7  8   9   10  11  12  13  14  15 16
0  2015  21   2  RICK  D  w   1   1  f   8 NaN   0 NaN NaN NaN NaN  S
1  2015  56   5  RICK  E  g   1   1  k   8 NaN  15 NaN NaN NaN NaN  F
In [55]:
# we can filter the values and then call count()
df.loc[df[16] == 'S',16].count()
Out[55]:
1
In [56]:
# we can also show the count for all unique values
df[16].value_counts()
Out[56]:
S    1
F    1
dtype: int64
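
The session above counts the matching rows, while the question asks for a sum of the third field. A minimal sketch of the same filter followed by a sum, assuming the two sample rows from the question (read here from an in-memory string rather than a file, so the snippet is self-contained):

```python
import io

import pandas as pd

# The two sample rows from the question, held in memory for illustration.
sample = io.StringIO(
    '"2015","21","2","RICK","D","w","1","1","f","8","","00","","","","","S"\n'
    '"2015","56","5","RICK","E","g","1","1","k","8","","15","","","","","F"\n'
)

# header=None makes pandas label the columns 0..16.
df = pd.read_csv(sample, quotechar='"', header=None)

# Keep rows whose last column is "S", then sum the third column (label 2).
total = int(df.loc[df[16] == "S", 2].sum())
print(total)
```

Only the first row ends in "S", so only its third field contributes to the total.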

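= assigns the value of the right operand to the left operand. To compare two values, use == instead:

if line[16] = "S":     # wrong: assignment
if line[16] == "S":    # right: comparison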

hzhang@dell-work ~ $ cat sample.csv 
"2015","21","2","RICK","D","w","1","1","f","8","","00","","","","","S"
"2015","56","5","RICK","E","g","1","1","k","8","","15","","","","","F"
hzhang@dell-work ~ $ cat test.py 
import csv
with open("sample.csv", newline="") as csvfile:
    csvreader = csv.reader(csvfile, delimiter=",")
    total = 0
    for line in csvreader:
        if line[16] == "S":
            # fields are strings, so convert before adding
            total = total + int(line[2])

    print("total is:{}".format(total))
hzhang@dell-work ~ $ python test.py 
total is:2

Based on your code:

import csv
file = open("sample.csv", newline="")
total = 0
for line in csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True):
    if line[16] == "S":
        total = total + int(line[2])

file.close()
print("total:{}".format(total))
hzhang@dell-work ~ $ python test.py 
total:2

Make sure every input line has 17 fields, and convert the third column of each line to an integer before summing.
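
The conversion matters because csv.reader returns fields as strings by default. A small sketch of what happens without it, using a made-up field value rather than the real file:

```python
total = 0
field = "2"  # stands in for line[2]; csv.reader yields strings by default

try:
    total = total + field  # adding a str to an int is not defined
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)

total = total + int(field)  # convert first, then add
print(total)
```

So once the IndexError is out of the way, the original loop would still fail on total + line[2] until int() is applied.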

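Check which lines do not have 17 fields: if len(line) != 17: print(line)

The file may not consistently contain 17 columns. One way this can happen is an extra blank line at the end of the file.

Here is a way to detect which line causes the problem.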
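
To illustrate the blank-line case: csv.reader returns an empty list for an empty line, so line[16] fails on it. The sketch below assumes the two sample rows from the question plus a trailing blank line (the blank line is a guess about the real file), and skips any row that does not have 17 fields:

```python
import csv
import io

# Sample rows from the question plus a trailing blank line (assumed).
data = (
    '"2015","21","2","RICK","D","w","1","1","f","8","","00","","","","","S"\n'
    '"2015","56","5","RICK","E","g","1","1","k","8","","15","","","","","F"\n'
    '\n'
)

rows = list(csv.reader(io.StringIO(data)))
lengths = [len(row) for row in rows]
print(lengths)  # the blank line shows up as a row with zero fields

total = 0
skipped = []  # line numbers of rows that were not 17 fields long
for line_num, row in enumerate(rows, start=1):
    if len(row) != 17:
        skipped.append(line_num)
        continue
    if row[16] == "S":
        total += int(row[2])

print(total, skipped)
```

Whether to skip short rows silently or stop with an error depends on how much you trust the file; recording the skipped line numbers keeps both options open.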

reader = csv.reader(file, quotechar='"', delimiter=',', quoting=csv.QUOTE_ALL, skipinitialspace=True)
total = 0
for line_num, line in enumerate(reader, start=1):
    try:
        if line[16] == "S":
            # fields are strings, so convert before adding
            total = total + int(line[2])
    except IndexError:
        # show offending line
        print(line_num, line)
        # reraise to halt execution
        raise

You might consider using a negative index to access items from the end of the list:

total = 0
for line in csv.reader(...):
    if line[-1] == "S":
        total += int(line[2])
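
One caveat: line[-1] still raises IndexError on an empty row, and line[2] fails on rows with fewer than three fields. A sketch with a length guard, using hypothetical data in which the second row is shorter than the first but still ends with the flag:

```python
import csv
import io

# Hypothetical data: rows of different lengths, flag always in last position,
# followed by a blank line.
data = (
    '"2015","21","2","RICK","D","w","1","1","f","8","","00","","","","","S"\n'
    '"2015","43","4","ZETA","W","x","S"\n'
    '\n'
)

total = 0
for line in csv.reader(io.StringIO(data)):
    # line[-1] is the last field whatever the row length; the len() check
    # protects both line[2] and line[-1] on short or empty rows.
    if len(line) > 2 and line[-1] == "S":
        total += int(line[2])

print(total)
```

This only helps if the flag really is the last field of every row; if short rows are truncated rather than compact, the length check shown earlier is the safer choice.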

