I have a big .txt file (52,375 KB; 86,213 lines, 420 columns):
name | code | school
--------|-------|--------
steven | 1234 | harvard
Michael | 98765 | MIT
I want to read it into a pandas DataFrame, something like:

df = statement_read('myfile.txt')

I don't want to convert the .txt to .csv manually. I want to read myfile.txt directly in Python so I can then process it with pandas.
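For a pipe-delimited layout like the one above, `pd.read_csv` can read the .txt file directly; there is no need to convert it first. A minimal sketch (the `skiprows=[1]` assumes the dashed line is literally the second row of the file, and the stripping handles the spaces around each `|`):

```python
import io
import pandas as pd

# Inline sample standing in for myfile.txt; in practice pass the
# filename instead of the StringIO object.
text = """name | code | school
--------|-------|--------
steven | 1234 | harvard
Michael | 98765 | MIT
"""

# sep='|' splits on the pipes; skiprows=[1] drops the dashed separator row.
df = pd.read_csv(io.StringIO(text), sep='|', skiprows=[1])

# Headers and string values keep the spaces around the pipes, so strip them.
df.columns = df.columns.str.strip()
df = df.apply(lambda col: col.str.strip() if col.dtype == object else col)

print(df)
```

With a real file this would be `pd.read_csv('myfile.txt', sep='|', skiprows=[1])`; whether the dashed row exists, and exactly how the columns are padded, depends on the actual file.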
If you meant how to process big files with pandas, use chunked reading via the `chunksize` parameter. Note that `chunksize` is the number of rows to read per chunk, not a size in bytes, so for a file of, say, 10 GB you would pick a row count small enough that each chunk fits comfortably in memory.
import pandas as pd

# Read the file 3 rows at a time instead of loading it all at once.
for chunk in pd.read_csv('file.csv', chunksize=3):
    print(chunk)
Say you have a sample file with billions of records:
name,code,school
student1,c1,sch22
student2,c2,sch22
student3,c3,sch22
student4,c4,sch22
student5,c5,sch22
student6,c6,sch23
...
The above code fetches 3 rows in each batch, printing:
name code school
0 student1 c1 sch22
1 student2 c2 sch22
2 student3 c3 sch22
name code school
3 student4 c4 sch22
4 student5 c5 sch22
5 student6 c6 sch23
name code school
6 student7 c7 sch24
7 student8 c8 sch25
8 student9 c9 sch26