Reading last few rows using read_csv in pandas

I have a file which is continuously growing like this:

https|webmail.mahindracomviva.com|application/vnd.ms-sync.wbxml|158|POST|203.101.110.171
https|webmail.mahindracomviva.com||0|POST|203.101.110.171
https|webmail.mahindracomviva.com||0|POST|203.101.110.171
https|www.googleapis.com|application/x-protobuf|246|POST|74.125.200.95
https|webmail.mahindracomviva.com|application/vnd.ms-sync.wbxml|140|POST|203.101.110.171
https|webmail.mahindracomviva.com|application/x-protobuf|52|POST|203.101.110.171
https|www.googleapis.com|application/x-protobuf|502|POST|74.125.200.95
https|www.googleapis.com|application/x-protobuf|40|POST|74.125.200.95

But I would like to read only the last 50 lines using Pandas.

Follow these steps:

  1. First, find the number of rows in the CSV file without loading the whole file into RAM, by passing chunksize to read_csv().

     import pandas as pd

     count = 0
     for data in pd.read_csv('YourFile.csv', encoding='ISO-8859-1', chunksize=1000):
         count += 1  # counting the number of chunks
     lastlen = len(data)  # finding the length of last chunk
     datalength = (count*1000 + lastlen - 1000)  # length of total file
  2. Second, subtract the number of rows you want to read from the total, and skip that many rows from the start.

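     rowsdiff = datalength - 300  # number of data rows to skip
     # line 0 is assumed to be the header row, so skip data lines 1..rowsdiff
     df = pd.read_csv('YourFile.csv', encoding='ISO-8859-1', skiprows=range(1, rowsdiff + 1), nrows=300)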

This way you load only the last few rows, without reading the whole CSV file into RAM at once.
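For the pipe-delimited, headerless file in the question, the two steps above could be wrapped up roughly as follows. This is only a sketch: the function name `read_last_rows` and its parameters are made up for illustration, and it assumes the file is non-empty, has no blank lines or multi-line quoted fields, and that `n` is at least 1.

```python
import pandas as pd


def read_last_rows(path, n, sep="|", chunksize=1000):
    """Return the last n rows of a headerless delimited file (hypothetical helper)."""
    # Pass 1: count rows chunk by chunk, so only one chunk is in memory at a time.
    total = 0
    for chunk in pd.read_csv(path, sep=sep, header=None, chunksize=chunksize):
        total += len(chunk)

    # Pass 2: skip everything before the last n rows.
    # An integer skiprows skips that many lines from the start of the file.
    to_skip = max(total - n, 0)
    return pd.read_csv(path, sep=sep, header=None, skiprows=to_skip)
```

Calling `read_last_rows("your_file", 50)` would then give the last 50 rows. Because the file keeps growing, rows appended between the two passes will also be included, so the result can be slightly longer than `n`.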

Try using pandas' tail(), like so:

import pandas as pd

filename = "your_file"
last_rows = 3
# the sample data has no header row and uses "|" as the separator
data = pd.read_csv(filename, header=None, sep="|")
print(data.tail(last_rows))
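Note that read_csv() still parses the whole file before tail() picks out the last rows. If that is too much for a large log, a different technique (not from the answers above) is to keep only the last few text lines with `collections.deque` and pass just those to read_csv(). The sketch below is one way to do it; the name `tail_csv` and its parameters are hypothetical, and it assumes a non-empty, headerless file with one record per line and `n` of at least 1.

```python
import io
from collections import deque

import pandas as pd


def tail_csv(path, n, sep="|", encoding="utf-8"):
    """Parse only the last n lines of a headerless delimited file (hypothetical helper)."""
    with open(path, "r", encoding=encoding) as f:
        # A deque with maxlen=n discards older lines as new ones are read,
        # so at most n lines are held in memory.
        last_lines = deque(f, maxlen=n)

    # Feed the retained lines to read_csv through an in-memory text buffer.
    return pd.read_csv(io.StringIO("".join(last_lines)), sep=sep, header=None)
```

This reads the file once, line by line, and pandas only ever sees the last `n` lines, for example `tail_csv("your_file", 50)`.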
