I am trying to import a .csv file with pandas in Python, using pandas.read_csv. I then need to check each row of the dataframe and collect the values of two specific columns into an array. Since the dataframe has almost 3 million rows (~1 GB), doing this iteratively after the import takes a long time. Can I do it while importing the file itself? Is it a good idea to modify the read_csv library function to accommodate this?
df = pd.read_csv("file.csv")

def get():
    for a in list_A:      # this list has ~2,300 entries
        for b in list_B:  # this list has ~12,000 entries
            if a row exists in df with the pair (a, b):
                # do something
Because the lists are so large, this function runs slowly, and querying such a big dataframe slows the execution further. Any suggestions/solutions to improve the performance?
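One way to avoid the nested loop entirely is to test every row at once with vectorized membership checks, and to stream the file in chunks so it is filtered during the import rather than afterwards. The sketch below assumes hypothetical column names `col_a` and `col_b` (substitute the real ones from your file) and uses an in-memory CSV as a stand-in for file.csv:

```python
import io

import pandas as pd

# Hypothetical stand-in for file.csv; the column names are assumptions
csv_text = "col_a,col_b,other\nx,1,foo\ny,2,foo\nz,3,foo\n"

list_A = ["x", "z"]
list_B = [1, 3]
# Sets make the membership checks O(1) per element
set_A, set_B = set(list_A), set(list_B)

pairs = []
# chunksize streams the file instead of loading all rows at once;
# usecols skips the columns that are never needed
for chunk in pd.read_csv(io.StringIO(csv_text),
                         usecols=["col_a", "col_b"],
                         chunksize=2):
    # One boolean mask tests every row in the chunk at once
    mask = chunk["col_a"].isin(set_A) & chunk["col_b"].isin(set_B)
    pairs.extend(chunk.loc[mask].itertuples(index=False, name=None))

print(pairs)  # [('x', 1), ('z', 3)]
```

For a real 1 GB file you would pass the path instead of the `StringIO` object and pick a chunksize in the tens or hundreds of thousands of rows; the filtering then happens as each chunk is read, so the full dataframe never has to be held and re-scanned.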
Python's built-in csv module reads the file line by line instead of loading it fully into memory.
Code would look something like this:
import csv

with open('file.csv') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        if row[1] in list_A and row[3] in list_B:
            pass  # do something with the row
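With lists of ~2,300 and ~12,000 entries, the `in` checks themselves become the bottleneck, since membership in a list is O(n). Converting the lists to sets once makes each check O(1). A runnable sketch of the same loop, using hypothetical data in place of file.csv (note that the csv module yields every field as a string, so the values to match must be strings too):

```python
import csv
import io

# Hypothetical data standing in for file.csv; as in the snippet above,
# row[1] and row[3] are the two columns of interest
data = io.StringIO("id,a,x,b\n1,x,foo,10\n2,y,foo,99\n3,q,foo,30\n")

set_A = {"x", "y"}    # built once from list_A; set lookups are O(1)
set_B = {"10", "30"}  # csv fields are strings, so compare as strings

reader = csv.reader(data)
next(reader)  # skip the header row
matches = [(row[1], row[3])
           for row in reader
           if row[1] in set_A and row[3] in set_B]

print(matches)  # [('x', '10')]
```

The same one-line change (`set(list_A)` / `set(list_B)`) also speeds up the pandas `isin` approach, which accepts sets directly.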