
Python - read parquet file without pandas

Currently I'm using the code below on Python 3.5 on Windows to read in a Parquet file.

import pandas as pd

parquetfilename = 'File1.parquet'
parquetFile = pd.read_parquet(parquetfilename, columns=['column1', 'column2'])  

However, I'd like to do so without using pandas. What is the best way to do this? I'm using both Python 2.7 and 3.6 on Windows.

You can use DuckDB for this. It's an embedded RDBMS similar to SQLite but designed with OLAP in mind. It has a nice Python API and a SQL function to import Parquet files:

import duckdb

conn = duckdb.connect(":memory:") # or a file name to persist the DB

# Keep in mind this doesn't support partitioned datasets,
# so you can only read one partition at a time
conn.execute("CREATE TABLE mydata AS SELECT * FROM parquet_scan('/path/to/mydata.parquet')")

# Export a query as CSV
conn.execute("COPY (SELECT * FROM mydata WHERE col = 'val') TO 'col_val.csv' WITH (HEADER 1, DELIMITER ',')")


 