How to read a big txt file and then make a data frame

I have a big txt file (52.375 kb, ln 86213, col 420).

name    | code  | school 
--------|-------|--------
steven  | 1234  | harvard
Michael | 98765 | MIT

I want to read it and turn it into a pandas data frame.

Df = statement_read(myfile.Txt)

I don't want to convert the txt to csv manually. I want to read myfile.txt with Python so that I can then process it with pandas.

If you mean how to process big files with pandas, read the file in chunks using the chunksize parameter of pd.read_csv. For a 10 GB file, for example, you might aim for chunks of roughly 100 MB each. Note, however, that chunksize is the number of rows read into each chunk, not a size in bytes, so you have to choose a row count that gives chunks of about that size.

import pandas as pd
for chunk in pd.read_csv('file.csv',chunksize=3):
    print(chunk[['name','code']])
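
The snippet above reads a comma-separated file, while the file in the question is laid out as a pipe-delimited table with a dashed ruler under the header. A minimal sketch of reading that layout directly, assuming every line follows the sample exactly, might look like the following. The helper name read_pipe_table, the in-memory sample and the default chunk size are illustrative assumptions, not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for myfile.txt, copied from the question's sample.
sample = io.StringIO(
    "name    | code  | school \n"
    "--------|-------|--------\n"
    "steven  | 1234  | harvard\n"
    "Michael | 98765 | MIT\n"
)


def read_pipe_table(source, chunksize=10_000):
    """Read a pipe-delimited text table chunk by chunk into one DataFrame."""
    reader = pd.read_csv(
        source,
        sep="|",              # columns are separated by a pipe, not a comma
        skiprows=[1],         # drop the dashed ruler line under the header
        dtype=str,            # keep every value as text (assumption)
        chunksize=chunksize,  # number of rows per chunk
    )
    parts = []
    for chunk in reader:
        # Remove the padding spaces around column names and cell values.
        chunk.columns = chunk.columns.str.strip()
        chunk = chunk.apply(lambda col: col.str.strip())
        parts.append(chunk)
    return pd.concat(parts, ignore_index=True)


df = read_pipe_table(sample)
print(df)
```

Passing a path such as 'myfile.txt' instead of the in-memory sample should work the same way. Concatenating all chunks only makes sense when the whole table fits in memory, which is plausible for a file of the size mentioned in the question.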

Update

Let's say you have a sample file with billions of records:

name,code,school

student1,c1,sch22
student2,c2,sch22
student3,c3,sch22
student4,c4,sch22
student5,c5,sch22
student6,c6,sch23
  .       .   .
  .       .   .

With chunksize=3, the code above fetches 3 rows in each chunk. Printing the whole chunk (print(chunk) rather than only the name and code columns) gives:

       name code school
0  student1   c1  sch22
1  student2   c2  sch22
2  student3   c3  sch22
       name code school
3  student4   c4  sch22
4  student5   c5  sch22
5  student6   c6  sch23
       name code school
6  student7   c7  sch24
7  student8   c8  sch25
8  student9   c9  sch26
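
When the file really is too large to hold in memory, each chunk can be reduced to a small summary instead of being kept. The sketch below counts rows per school across chunks. The in-memory CSV holds only the six sample rows listed above, and the choice of aggregation is an assumption made for illustration.

```python
import io
from collections import Counter

import pandas as pd

# Hypothetical in-memory stand-in for file.csv, using the six sample rows listed above.
csv_text = io.StringIO(
    "name,code,school\n"
    "student1,c1,sch22\n"
    "student2,c2,sch22\n"
    "student3,c3,sch22\n"
    "student4,c4,sch22\n"
    "student5,c5,sch22\n"
    "student6,c6,sch23\n"
)

school_counts = Counter()
for chunk in pd.read_csv(csv_text, chunksize=3):
    # Only this chunk's rows are in memory here; fold them into the running totals.
    for school, n in chunk["school"].value_counts().items():
        school_counts[school] += int(n)

print(dict(school_counts))
```

Only the running totals survive between iterations, so memory use depends on the chunk size and the number of distinct schools rather than on the length of the file.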
