Zeppelin: What is the best way to query data with SQL and work with it?

Question

I want to use Zeppelin to query databases. I currently see two possibilities, but neither is sufficient for me:

1. Configure a database connection as an "interpreter", name it e.g. "sql1", use it in a paragraph, run a SQL query and use the nice built-in plotting tools. All the tutorials and tips seem to cover this, but then the documentation suddenly stops! I want to do more with the data: I want to filter and process it. If I want to plot it again (with other restrictions), I have to run the query again, which may take seconds or minutes (see my other question, "Zeppelin SQL: reuse data of query without another interpreter or a new query").
2. Use Spark with Python, Scala or similar. But the documentation only seems to load CSV data, put it into a dataframe and then access this dataframe with SQL. It does not show how to access the data with SQL in the first place. What is the best way to access the SQL data? Can I use an already configured "interpreter" (database connection)?

Answer 1

You can use the Zeppelin REST API to retrieve paragraph data:

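// Fetch the paragraph's JSON from the Zeppelin REST API (host, note ID and paragraph ID are specific to my setup)
val buffer = scala.io.Source.fromURL("http://XXXXX:9995/api/notebook/2CN2QP93H/paragraph/20170713-092810_1633770798").mkString

// Parse the JSON response and keep only the paragraph text
val df = sqlContext.read.json(sc.parallelize(buffer :: Nil)).select("body.text")

// The paragraph text as a String, i.e. the SQL query
df.first.getAs[String](0)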

These Spark Scala lines retrieve the SQL query used by a paragraph. I think you could do the same thing to get the results.
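To make the "get the results the same way" idea a bit more concrete, here is a minimal Python sketch that pulls the result table out of a paragraph response. It assumes the response nests the output under body, results, msg as a list of entries with a type and tab-separated data; that layout is my assumption and may differ between Zeppelin versions, so check what your server actually returns. The function name rows_from_paragraph and the sample payload are made up for illustration.

```python
import csv
import io


def rows_from_paragraph(payload):
    """Return the first TABLE result of a paragraph as a list of dicts.

    Assumed layout (may differ between Zeppelin versions):
    payload["body"]["results"]["msg"] is a list of {"type", "data"} entries,
    and TABLE data is tab-separated text with a header row.
    """
    messages = payload.get("body", {}).get("results", {}).get("msg", [])
    for message in messages:
        if message.get("type") == "TABLE":
            reader = csv.DictReader(io.StringIO(message["data"]), delimiter="\t")
            return [dict(row) for row in reader]
    return []


# Hypothetical response, shaped like the assumed layout above.
# With a real server you would parse the fetched text with json.loads first.
sample = {
    "body": {
        "text": "%sql1 select col1, col2 from table limit 2",
        "results": {
            "code": "SUCCESS",
            "msg": [{"type": "TABLE", "data": "col1\tcol2\n1\ta\n2\tb\n"}],
        },
    }
}

rows = rows_from_paragraph(sample)
print(rows)
```

All values come back as strings, so numeric columns would still need converting before plotting or filtering.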

Answer 2

I cannot find a solution for 1., but I have put together a short solution for 2. that works within Zeppelin with Python (2.7), SQLAlchemy (SQL wrapper), MySQLdb (MySQL driver) and pandas (make sure you have these packages installed; all of them are available in Debian 9). I wonder why I have not found such a solution before...

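%python
from sqlalchemy import create_engine
import pandas as pd

# Adjust user, password, host and database to your setup
engine = create_engine('mysql+mysqldb://user:password@host:3306/database')

sql = "select col1, col2 from table limit 10"
df = pd.read_sql(sql, engine)

# Render the DataFrame with Zeppelin's built-in table and plotting tools
z.show(df)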
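Once the data is in a DataFrame, it can be filtered and processed without running the query again, which is what the question asks for. The sketch below uses an in-memory SQLite database with made-up data as a stand-in for the real connection so that it runs anywhere; the table t and its contents are hypothetical. As far as I can tell, later %python paragraphs in the same note see the same df, but that depends on how the interpreter is set up.

```python
import sqlite3

import pandas as pd

# Stand-in for the real database: in-memory SQLite with hypothetical data
conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 integer, col2 text)")
conn.executemany("insert into t values (?, ?)", [(1, "a"), (5, "b"), (9, "a")])

# Run the (possibly slow) query once
df = pd.read_sql("select col1, col2 from t", conn)

# Filter and aggregate the cached DataFrame; no further database access
filtered = df[df["col1"] > 2]
counts = df.groupby("col2")["col1"].sum()

print(filtered)
print(counts)
# Inside Zeppelin you would display the result with z.show(filtered)
```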

If you want to connect to another database such as DB2 or Oracle, you have to use other Python packages and adjust the first part of the create_engine string.
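As a rough sketch of what "adjusting the first part" means: the part before :// names the dialect and driver, and the rest stays the same shape. The helper engine_url below is made up for illustration, the credentials are placeholders, and the Oracle line assumes the cx_Oracle driver is installed; check the SQLAlchemy documentation for the exact URL form your database expects. The helper also percent-encodes the user and password, since characters like @ or / would otherwise break the URL.

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2.7


def engine_url(dialect_driver, user, password, host, port, database):
    """Build a connection URL string; user and password are percent-encoded."""
    return "{0}://{1}:{2}@{3}:{4}/{5}".format(
        dialect_driver, quote_plus(user), quote_plus(password), host, port, database
    )


# Same shape as the MySQL string above, with a password that needs escaping
mysql_url = engine_url("mysql+mysqldb", "user", "p@ss/word", "host", 3306, "database")

# Only the first part (and the port) changes for another database
oracle_url = engine_url("oracle+cx_oracle", "user", "password", "host", 1521, "database")

print(mysql_url)
print(oracle_url)
```

The resulting string is passed to create_engine in place of the hard-coded one.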

Answer 1
1 2017-07-17 10:44:20

Answer 2
0 accepted 2017-07-12 11:52:06
