
How to read a big txt file and make a DataFrame

I have a big txt file (52,375 KB; 86,213 lines; 420 columns).

name    | code  | school 
--------|-------|--------
steven  | 1234  | harvard
Michael | 98765 | MIT

I want to read it and turn it into a pandas DataFrame, with something like:

df = some_read_function('myfile.txt')  # pseudocode

I don't want to convert the txt file to csv manually. I want to read myfile.txt with Python so that I can process it with pandas.

If you mean how to process big files with pandas, you need to use pandas' chunked reading. For example, for a 10 GB file you could read roughly 100 MB at a time; note, though, that the chunksize parameter is specified as the number of rows to read in each chunk, not as a byte size.

import pandas as pd

# chunksize=3 reads the file 3 rows at a time
for chunk in pd.read_csv('file.csv', chunksize=3):
    print(chunk[['name', 'code']])
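
As for reading your pipe-delimited myfile.txt directly: there is no need to convert it to csv first, because pd.read_csv accepts any delimiter. Here is a minimal sketch, assuming the columns are separated by | with whitespace padding as in the sample table above:

import pandas as pd

# A regex separator strips the whitespace padding around each '|';
# engine='python' is required when sep is a regular expression.
df = pd.read_csv('myfile.txt', sep=r'\s*\|\s*', engine='python')
print(df.head())

If the file is too large to load at once, the same call also accepts chunksize, exactly as in the csv example above.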

Update:

Let's say you have a sample file with billions of records:

name,code,school
student1,c1,sch22
student2,c2,sch22
student3,c3,sch22
student4,c4,sch22
student5,c5,sch22
student6,c6,sch23
...

The above code will fetch 3 rows in each chunk, producing output like this:

       name code school
0  student1   c1  sch22
1  student2   c2  sch22
2  student3   c3  sch22
       name code school
3  student4   c4  sch22
4  student5   c5  sch22
5  student6   c6  sch23
       name code school
6  student7   c7  sch24
7  student8   c8  sch25
8  student9   c9  sch26
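
If you need one combined DataFrame at the end, you can process each chunk and concatenate the results. A minimal sketch (the school filter is just an illustrative per-chunk operation, not part of the original example):

import pandas as pd

pieces = []
for chunk in pd.read_csv('file.csv', chunksize=3):
    # do the heavy per-chunk work here; this filter is only an example
    pieces.append(chunk[chunk['school'] == 'sch22'])

# the filtered pieces are small enough to combine into one DataFrame
df = pd.concat(pieces, ignore_index=True)
print(df)

This keeps memory usage bounded by the chunk size plus whatever survives your per-chunk processing, rather than the full file.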
