How to read a big txt file and then make a data frame
I have a big txt file (52.375 KB, Ln 86213, Col 420):
name | code | school
--------|-------|--------
steven | 1234 | harvard
Michael | 98765 | MIT
I want to read it and turn it into a pandas DataFrame, something like:
```python
df = statement_read('myfile.txt')  # hypothetical: some function that reads the file into a DataFrame
```
I don't want to convert the txt file to csv manually. I want to read myfile.txt with Python so that I can then process it with pandas.
If you mean how to process big files with pandas, you can read them in chunks with the `chunksize` parameter of `read_csv`. Note that `chunksize` is the number of rows read in each chunk, not a size in bytes: for a 10 GB file, for example, you might pick a row count that keeps each chunk at roughly 100 MB.
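Because `chunksize` counts rows, one way to turn a memory budget such as 100 MB into a row count is to read a small sample and measure its in-memory size. This is a minimal sketch, assuming the first rows are representative of the whole file; `rows_for_budget` is a hypothetical helper (not part of pandas) and the 100 MB budget is only illustrative.

```python
import io

import pandas as pd


def rows_for_budget(source, budget_bytes, sample_rows=1000, **read_kwargs):
    """Hypothetical helper: estimate how many rows fit in budget_bytes.

    Reads the first sample_rows rows, measures their deep memory usage,
    and scales up. Assumes later rows are similar in size to the sample.
    """
    sample = pd.read_csv(source, nrows=sample_rows, **read_kwargs)
    if sample.empty:
        return sample_rows
    bytes_per_row = sample.memory_usage(deep=True, index=True).sum() / len(sample)
    return max(1, int(budget_bytes // bytes_per_row))


# Small in-memory stand-in for a large name,code,school CSV file
data = "name,code,school\n" + "".join(
    f"student{i},c{i},sch{20 + i % 5}\n" for i in range(1, 5001)
)

# Rows per chunk for a ~100 MB in-memory budget (assumed target)
chunksize = rows_for_budget(io.StringIO(data), budget_bytes=100 * 1024 * 1024)
print(chunksize)
```

The estimate can then be passed straight to `pd.read_csv(path, chunksize=chunksize)`. On-disk size and in-memory size differ (strings in particular take more room in a DataFrame), which is why the sketch measures memory rather than bytes in the file.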
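```python
import pandas as pd

# Read file.csv 3 rows at a time; each chunk is a regular DataFrame
for chunk in pd.read_csv('file.csv', chunksize=3):
    print(chunk)  # use chunk[['name', 'code']] to print only some columns
```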
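`read_csv` does not care about the file extension, so it can read `myfile.txt` directly; the `sep` argument sets the delimiter. A minimal sketch, assuming the file is laid out like the sample table in the question (`|`-separated columns padded with spaces, with a `----|----` rule under the header); adjust `sep` and `skiprows` if the real file differs.

```python
import io

import pandas as pd

# Stand-in for myfile.txt, assumed to look like the table in the question
raw = """name | code | school
--------|-------|--------
steven | 1234 | harvard
Michael | 98765 | MIT
"""

df = pd.read_csv(
    io.StringIO(raw),   # replace with 'myfile.txt' for the real file
    sep=r"\s*\|\s*",    # a '|' plus any spaces around it (regex separator)
    engine="python",    # regex separators need the python parser engine
    skiprows=[1],       # skip the '----|----' rule under the header (assumed)
)
print(df)
```

Because the separator swallows the padding spaces, the column names come out clean and `code` is parsed as integers. The same call accepts `chunksize=...` if the file is too large to load at once.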
Let's say you have a sample file with billions of records:
```text
name,code,school
student1,c1,sch22
student2,c2,sch22
student3,c3,sch22
student4,c4,sch22
student5,c5,sch22
student6,c6,sch23
...
```
The code above fetches 3 rows in each chunk, as shown below:
```text
       name code school
0  student1   c1  sch22
1  student2   c2  sch22
2  student3   c3  sch22
       name code school
3  student4   c4  sch22
4  student5   c5  sch22
5  student6   c6  sch23
       name code school
6  student7   c7  sch24
7  student8   c8  sch25
8  student9   c9  sch26
```
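Since each chunk is an ordinary DataFrame, results are usually computed per chunk and then combined, so the whole file never has to be in memory at once. A minimal sketch, assuming the goal is to count students per school; the aggregation is only an illustration, and the in-memory data reuses the rows shown above.

```python
import io

import pandas as pd

# Small stand-in for the large name,code,school file
data = """name,code,school
student1,c1,sch22
student2,c2,sch22
student3,c3,sch22
student4,c4,sch22
student5,c5,sch22
student6,c6,sch23
student7,c7,sch24
student8,c8,sch25
student9,c9,sch26
"""

counts = pd.Series(dtype="int64")
for chunk in pd.read_csv(io.StringIO(data), chunksize=3):
    # Count rows per school in this chunk and add them to the running total
    counts = counts.add(chunk["school"].value_counts(), fill_value=0)

# add() with fill_value produces floats; convert back to integer counts
counts = counts.astype("int64").sort_index()
print(counts)
```

Only the running totals are kept between iterations, so memory use depends on the chunk size and the number of distinct schools, not on the length of the file.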