How to read a CSV file subset by subset with Pandas?

I have a data frame with 13000 rows and 3 columns:

('time', 'rawScore', 'label')

I want to read it subset by subset:

[[1..360], [360..712], ..., [12640..13000]]

I tried using a list too, but it is not working:

import pandas as pd
import math
import datetime

result = "data.csv"
dataSet = pd.read_csv(result)
TP = 0
count = 0
x = 0
df = pd.DataFrame(dataSet, columns=['rawScore', 'label'])
for i, row in df.iterrows():
    data = row.to_dict()

    ScoreX = data['rawScore']
    labelX = data['label']

    for i in range(1, 13000, 360):
        x = x + 1
        for j in range(i, 360 * x, 1):
            if (ScoreX > 0.3) and (labelX == 0):
                count = count + 1
print("count=", count)

You can use the nrows and skiprows parameters of pd.read_csv to read the file in chunks. I would recommend against iterrows, since it is typically very slow. If you split the data into chunks while reading it and keep each chunk separately, you can skip the iterrows step altogether. This covers the file-reading part, in case you want to split the data into chunks (which seems to be an intermediate step in what you're trying to do).
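
A minimal sketch of the nrows/skiprows idea, under stated assumptions: the helper name read_block is hypothetical, and the in-memory CSV is synthetic data standing in for data.csv with the column names from the question. With a real file you would pass its path instead of the StringIO object.

```python
import io

import pandas as pd

CHUNK = 360  # rows per subset, as in the question


def read_block(source, block_index, chunk=CHUNK):
    """Read only the block_index-th block of `chunk` data rows (hypothetical helper)."""
    start = block_index * chunk
    # File line 0 is the header, so skip lines 1..start to drop the earlier data rows.
    return pd.read_csv(source, skiprows=range(1, start + 1), nrows=chunk)


# Synthetic stand-in for data.csv (assumption: same three columns as the question).
rows = ["time,rawScore,label"] + [f"{i},{(i % 10) / 10},{i % 2}" for i in range(1000)]
csv_text = "\n".join(rows)

second = read_block(io.StringIO(csv_text), 1)

# Vectorized count with a boolean mask instead of iterrows.
count = int(((second["rawScore"] > 0.3) & (second["label"] == 0)).sum())
print(len(second), second["time"].iloc[0], count)
```

If you need every block rather than one, pd.read_csv also accepts a chunksize argument, which returns an iterator of data frames and avoids re-reading the file for each block.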

Another way is to build the subsets with a generator, checking which set each row index belongs to: [[1..360], [360..712], ..., [12640..13000]]

So write a function that walks the indices in steps of 360 and, for each step, selects the rows whose indices fall in that range as one subset.
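
The generator idea could look roughly like the sketch below. The function name iter_subsets and the synthetic data frame are assumptions for illustration. The blocks here are zero-based and non-overlapping, which may differ slightly from the exact boundaries listed in the question.

```python
import pandas as pd


def iter_subsets(df, size=360):
    """Yield consecutive row blocks of at most `size` rows (hypothetical helper)."""
    for start in range(0, len(df), size):
        yield df.iloc[start:start + size]


# Synthetic stand-in for the 13000-row frame in the question (assumption).
df = pd.DataFrame({
    "rawScore": [(i % 10) / 10 for i in range(13000)],
    "label": [i % 2 for i in range(13000)],
})

# One count per subset, computed with a boolean mask rather than iterrows.
counts = [
    int(((part["rawScore"] > 0.3) & (part["label"] == 0)).sum())
    for part in iter_subsets(df)
]
print(len(counts), counts[0], sum(counts))
```

Because the generator is lazy, you can stop after the first few subsets (for example with itertools.islice) when you do not need all of them.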

I wrote these approaches down as alternative ideas to play around with, since in some cases you may only want one subset and not all of the chunks for your calculation.
