
Zeppelin: What is the best way to query data with SQL and work with it?

I want to use Zeppelin to query databases. I currently see two possibilities, but neither is sufficient for me:

  1. Configure a database connection as an "interpreter", name it e.g. "sql1", use it in a paragraph, run a SQL query and use the nice built-in plotting tools. All the tutorials and tips seem to cover this, but then the documentation suddenly stops! I want to do more with the data: I want to filter and process it. If I want to plot it again (with other restrictions), I have to run the query again, which may take seconds or minutes (see my other question, "Zeppelin SQL: reuse data of query without another interpreter or a new query").
  2. Use Spark with Python, Scala or similar. But the documentation only seems to load CSV data, put it into a dataframe and then access this dataframe with SQL. It never fetches the data with SQL in the first place. What is the best way to access the SQL data? Can I use an already configured "interpreter" (database connection)?

You can use the Zeppelin API to retrieve paragraph data:

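// Fetch the paragraph's JSON description from the Zeppelin REST API
val buffer = scala.io.Source.fromURL("http://XXXXX:9995/api/notebook/2CN2QP93H/paragraph/20170713-092810_1633770798").mkString

// Parse the JSON and select the paragraph text (the SQL query)
val df = sqlContext.read.json(sc.parallelize(buffer :: Nil)).select("body.text")

// Return the text as a plain string
df.first.getAs[String](0)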

These Spark Scala lines retrieve the SQL query used by a paragraph. I think you could do the same thing to get the results.
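To illustrate the "same thing for the results" idea, here is a minimal Python sketch that pulls a table out of a paragraph response that has already been downloaded and parsed as JSON. It assumes the response carries the output under `body.results.msg` as entries with a `type` of `TABLE` and tab-separated `data`; that layout may differ between Zeppelin versions, so check it against a real response first. The function name `table_from_paragraph` and the sample payload are hypothetical.

```python
def table_from_paragraph(payload):
    """Return (header, rows) from the first TABLE result of a paragraph response.

    Assumes the layout body -> results -> msg -> [{"type": ..., "data": ...}],
    which is an assumption about the Zeppelin REST response, not a guarantee.
    """
    messages = payload.get("body", {}).get("results", {}).get("msg", [])
    for message in messages:
        if message.get("type") == "TABLE":
            # Rows are separated by newlines, columns by tabs.
            lines = [line for line in message["data"].splitlines() if line]
            cells = [line.split("\t") for line in lines]
            return cells[0], cells[1:]
    raise ValueError("paragraph response contains no TABLE result")


# Hypothetical payload, shaped like the assumed API response.
sample = {
    "status": "OK",
    "body": {
        "text": "%sql1 select col1, col2 from table limit 2",
        "results": {
            "code": "SUCCESS",
            "msg": [{"type": "TABLE", "data": "col1\tcol2\n1\ta\n2\tb\n"}],
        },
    },
}

header, rows = table_from_paragraph(sample)
```

All cell values come back as strings, so numeric columns would still need converting before further processing.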

I cannot find a solution for 1., but I have put together a short solution for 2. It works within Zeppelin with Python (2.7), SQLAlchemy (SQL wrapper), MySQLdb (MySQL driver) and pandas (make sure these packages are installed; all of them are available in Debian 9). I wonder why I had not found such a solution before...

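%python
from sqlalchemy import create_engine
import pandas as pd

# Connection URL format: dialect+driver://user:password@host:port/database
engine = create_engine('mysql+mysqldb://user:password@host:3306/database')

sql = "select col1, col2 from table limit 10"
# Run the query once and load the result into a pandas DataFrame
df = pd.read_sql(sql, engine)

# Render the DataFrame with Zeppelin's built-in table and plotting tools
z.show(df)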
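Once the result sits in a Python variable, later paragraphs of the same interpreter session can filter it without querying the database again, which is the reuse asked for in the question. The sketch below shows that pattern with only the standard library. The in-memory SQLite table `measurements` and its rows are hypothetical stand-ins for the real database, and the "paragraph" comments mark where the code would be split in a notebook.

```python
import sqlite3

# Stand-in database: an in-memory SQLite table with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a")],
)

# Paragraph 1: run the (possibly slow) query once and keep the result.
cursor = conn.execute("SELECT col1, col2 FROM measurements ORDER BY col1")
columns = [description[0] for description in cursor.description]
cached_rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
conn.close()

# Paragraph 2 (later): filter the cached rows; no new query is issued.
only_a = [row for row in cached_rows if row["col2"] == "a"]
```

With pandas the same idea applies to `df` from the code above: a later paragraph could filter it with a boolean mask and pass the result to `z.show`.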

If you want to connect to another database such as DB2 or Oracle, you have to install other Python packages and adjust the first part of the create_engine string.
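As a rough illustration of what "the first part" means, the sketch below builds connection strings of the form `dialect+driver://user:password@host:port/database` for a few backends. The helper `make_url` is hypothetical, the prefixes shown require the matching driver package to be installed, and the credentials, hosts and database names are placeholders. It uses Python 3's `urllib.parse` to escape special characters in the user name and password.

```python
from urllib.parse import quote_plus


def make_url(dialect_driver, user, password, host, port, database):
    """Build a SQLAlchemy-style database URL (hypothetical helper).

    User name and password are percent-encoded so that characters such as
    '@' or '/' do not break the URL.
    """
    return "{0}://{1}:{2}@{3}:{4}/{5}".format(
        dialect_driver, quote_plus(user), quote_plus(password), host, port, database
    )


# Only the prefix (and usually the port) changes between databases.
mysql_url = make_url("mysql+mysqldb", "user", "p@ss/word", "host", 3306, "database")
postgres_url = make_url("postgresql+psycopg2", "user", "secret", "host", 5432, "database")
oracle_url = make_url("oracle+cx_oracle", "user", "secret", "host", 1521, "database")
```

Each resulting string can be passed to `create_engine` in place of the MySQL URL used above.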
