Zeppelin: What is the best way to query data with SQL and work with it?
I want to use Zeppelin to query databases. I currently see two possibilities, but neither of them is sufficient for me:
You can use the Zeppelin API to retrieve paragraph data:
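// Fetch the paragraph JSON from the Zeppelin REST API (host, notebook ID and paragraph ID are placeholders).
val buffer = scala.io.Source.fromURL("http://XXXXX:9995/api/notebook/2CN2QP93H/paragraph/20170713-092810_1633770798").mkString
// Parse the JSON string into a DataFrame and keep only the paragraph text.
val df = sqlContext.read.json(sc.parallelize(buffer :: Nil)).select("body.text")
// The first column of the first row holds the paragraph text as a String.
df.first.getAs[String](0)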
These Spark Scala lines retrieve the SQL query used by a paragraph. I think you could do the same thing to get the results.
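As a rough sketch of how the results could be read from the same response, the Python 3 snippet below parses a paragraph payload and extracts the first table. It assumes the response carries a `body.results.msg` list whose entries have a `type` and tab-separated `data`; that layout, the `paragraph_table` helper and the sample payload are assumptions for illustration, and the layout may differ between Zeppelin versions, so check a real response first.

```python
import json


def paragraph_table(payload):
    """Return (header, rows) for the first TABLE result, or None if there is none."""
    body = json.loads(payload)["body"]
    # Assumed layout: body.results.msg is a list of {"type": ..., "data": ...}.
    for msg in body.get("results", {}).get("msg", []):
        if msg.get("type") == "TABLE":
            # Table data is assumed to be tab-separated with one row per line.
            lines = msg["data"].rstrip("\n").split("\n")
            header = lines[0].split("\t")
            rows = [line.split("\t") for line in lines[1:]]
            return header, rows
    return None


# Hypothetical payload, trimmed to the fields used above.
sample = json.dumps({
    "status": "OK",
    "body": {
        "text": "%sql select col1, col2 from table limit 2",
        "results": {
            "code": "SUCCESS",
            "msg": [{"type": "TABLE", "data": "col1\tcol2\n1\ta\n2\tb\n"}],
        },
    },
})

header, rows = paragraph_table(sample)
print(header)
print(rows)
```

In a real notebook the `sample` string would be replaced by the text fetched from the paragraph URL. All values come back as strings, so any type conversion has to be done afterwards.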
I cannot find a solution for 1, but I have written a short solution for 2 that works within Zeppelin with Python 2.7, SQLAlchemy (SQL wrapper), MySQLdb (MySQL driver) and pandas. Make sure these packages are installed; all of them are available in Debian 9. I wonder why I have not found such a solution before...
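%python
from sqlalchemy import create_engine
import pandas as pd

# Placeholder query and credentials; replace them with your own.
sql = "select col1, col2 from table limit 10"
engine = create_engine('mysql+mysqldb://user:password@host:3306/database')

# Run the query and load the result into a pandas DataFrame.
df = pd.read_sql(sql, engine)

# Render the DataFrame as a Zeppelin table.
z.show(df)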
If you want to connect to another database such as DB2 or Oracle, you have to use other Python packages and adjust the first part of the create_engine string (the dialect and driver prefix, here mysql+mysqldb).
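To illustrate which part of the string changes, here is a small Python 3 sketch that assembles a URL of the form `dialect+driver://user:password@host:port/database`. The `engine_url` helper is hypothetical, and the `oracle+cx_oracle` prefix, port and credentials are example values that should be checked against the SQLAlchemy dialect documentation for your database.

```python
from urllib.parse import quote_plus


def engine_url(dialect, driver, user, password, host, port, database):
    """Build a connection string; only dialect and driver differ between databases."""
    # quote_plus escapes characters such as "@" or "/" that would otherwise break the URL.
    return "{}+{}://{}:{}@{}:{}/{}".format(
        dialect, driver, quote_plus(user), quote_plus(password), host, port, database
    )


# Same string as in the MySQL example above.
mysql_url = engine_url("mysql", "mysqldb", "user", "password", "host", 3306, "database")

# Assumed prefix for Oracle with the cx_Oracle driver; the password is a made-up example.
oracle_url = engine_url("oracle", "cx_oracle", "user", "p@ss/word", "host", 1521, "database")

print(mysql_url)
print(oracle_url)
```

The resulting string is what would be passed to create_engine. Escaping the credentials matters mainly when the password contains characters that are reserved in URLs.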