What is the least memory-intensive way to read a Parquet file in Python? Is line-by-line possible?

Asked 2022-08-04 21:46:04 · Tags: python, parquet, pyarrow, fastparquet

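I'm writing a lambda to read records stored in Parquet files, restructure them into a partition_key: {json_record} format, and submit the records to a Kafka queue. I'm wondering whether there is any way to do this without reading the entire table into memory at once.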
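To make the target format concrete, here is a minimal sketch of the restructuring step. It assumes each record is already available as a Python dict and that the partition key is one of its fields. The function name `to_keyed_message` and the field name `user_id` are hypothetical, chosen only for illustration.

```python
import json
from typing import Tuple


def to_keyed_message(record: dict, key_field: str) -> Tuple[str, str]:
    """Turn one record into a (partition_key, json_record) pair.

    key_field is a hypothetical parameter naming the column that holds
    the partition key; it is an assumption, not part of the question.
    """
    key = str(record[key_field])
    # default=str is a fallback for values json cannot encode natively
    # (dates, decimals); sort_keys keeps the output deterministic.
    value = json.dumps(record, sort_keys=True, default=str)
    return key, value


if __name__ == "__main__":
    key, value = to_keyed_message({"user_id": 7, "name": "a"}, "user_id")
    print(f"{key}: {value}")
```

The pair can then be handed to whatever Kafka producer is in use, with the first element as the message key and the second as the value.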

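I tried the iter_row_groups method from the fastparquet library, but my records have only one row group, so I'm still loading the entire table into memory. I noticed that pyarrow's BufferReader has a readlines method, but it isn't implemented. Is true line-by-line reading of Parquet simply not possible?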

It may be worth pointing out that I'm working with Parquet files stored in S3, so ideally a solution would be able to read from a StreamingBody.

1 Answer

I suggest you take a look at DuckDB and polars:

DuckDB https://duckdb.org/2021/06/25/querying-parquet.html

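You can of course limit the query to the first 1000 results. If you have some kind of row index column, walking through the whole Parquet file with DuckDB using SELECT ... WHERE should be easy.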
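As a rough sketch of that idea, the code below walks a file in chunks of 1000 rows by filtering on an index column. It assumes the file has a contiguous, zero-based integer column (called `row_id` here, a hypothetical name) and that the total row count is known. `build_chunk_query` and `iter_chunks` are illustrative helpers, not part of DuckDB. Whether this actually lowers peak memory for a file with a single row group has not been measured here.

```python
from typing import Iterator, List


def build_chunk_query(path: str, index_col: str, start: int, stop: int) -> str:
    """Build a SELECT returning only rows with start <= index_col < stop."""
    # int() guards the numeric bounds; path and index_col are assumed to
    # come from trusted configuration, not from user input.
    return (
        f"SELECT * FROM read_parquet('{path}') "
        f"WHERE {index_col} >= {int(start)} AND {index_col} < {int(stop)} "
        f"ORDER BY {index_col}"
    )


def iter_chunks(
    path: str, index_col: str, total_rows: int, chunk_size: int = 1000
) -> Iterator[List[dict]]:
    """Yield the file as lists of row dicts, one list per chunk."""
    import duckdb  # third-party; imported lazily so the helper above has no dependency

    con = duckdb.connect()  # in-memory database, used only as a query engine
    try:
        for start in range(0, total_rows, chunk_size):
            cursor = con.execute(
                build_chunk_query(path, index_col, start, start + chunk_size)
            )
            columns = [col[0] for col in cursor.description]
            yield [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        con.close()
```

A call such as `for rows in iter_chunks("data.parquet", "row_id", total_rows=50_000): ...` (file name and row count are placeholders) would then process one chunk at a time.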

polars https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.scan_parquet.html

You can try using row_count_name and row_count_offset. Again, reading the rows in chunks is feasible if you filter on a row index column.
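A minimal sketch of the polars variant follows. It assumes a recent polars release, where the parameter is spelled `row_index_name` (older releases used `row_count_name`, as mentioned above), and that the total row count is known in advance. `chunk_bounds`, `iter_parquet_chunks`, and the column name `row_nr` are hypothetical names for illustration. As with DuckDB, the effect on peak memory for a single-row-group file is not something this sketch verifies.

```python
from typing import Iterator, List, Tuple


def chunk_bounds(total_rows: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, stop) row ranges that cover total_rows."""
    for start in range(0, total_rows, chunk_size):
        yield start, min(start + chunk_size, total_rows)


def iter_parquet_chunks(
    path: str, total_rows: int, chunk_size: int = 1000
) -> Iterator[List[dict]]:
    """Yield the file as lists of row dicts, one list per chunk."""
    import polars as pl  # third-party; imported lazily

    # scan_parquet builds a lazy query; nothing is read until collect().
    # Older polars releases spell this parameter row_count_name.
    lazy_frame = pl.scan_parquet(path, row_index_name="row_nr")
    for start, stop in chunk_bounds(total_rows, chunk_size):
        chunk = (
            lazy_frame
            .filter((pl.col("row_nr") >= start) & (pl.col("row_nr") < stop))
            .drop("row_nr")  # the synthetic index is not part of the record
            .collect()
        )
        yield chunk.to_dicts()
```

Each yielded list holds at most `chunk_size` records, so only one chunk of Python dicts is alive at a time on the consumer side.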


Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the address of the original post.
