Python - read parquet file without pandas
Currently I'm using the code below on Python 3.5, Windows, to read in a parquet file:
```python
import pandas as pd
parquetfilename = 'File1.parquet'
parquetFile = pd.read_parquet(parquetfilename, columns=['column1', 'column2'])
```
However, I'd like to do so without using pandas. What's the best way to do this? I'm using both Python 2.7 and 3.6 on Windows.
You can use duckdb for this. It's an embedded RDBMS similar to SQLite, but designed with OLAP in mind. It has a nice Python API and a SQL function for importing Parquet files:
```python
import duckdb
conn = duckdb.connect(":memory:") # or a file name to persist the DB
# Keep in mind this doesn't support partitioned datasets,
# so you can only read one partition at a time
conn.execute("CREATE TABLE mydata AS SELECT * FROM parquet_scan('/path/to/mydata.parquet')")
# Export a query as CSV
conn.execute("COPY (SELECT * FROM mydata WHERE col = 'val') TO 'col_val.csv' WITH (HEADER 1, DELIMITER ',')")
```