
Using the columns of one dataframe in another dataframe

I am trying to read the data from one dataframe and use it in another. How can I do it gracefully?

val query = s"select distinct p_id, lower(regexp_replace(p_id,'[^a-zA-Z0-9]+','_')) as p_id_formatted, lower(regexp_extract(f_id,'^([^\\.]+)\\.?',1)) as f_id_formatted, column_name from default.rc_pcoders"
val run_query = sql(query)
val table_name = run_query.select(concat(lit("nepp"), lit("_"), $"p_id_formatted", lit("_"), $"f_id_formatted").alias("tablename"), $"column_name")

This gives me the output below, where each row pairs a table name with a column name:

+------------------+-----------+
|tablename         |column_name|
+------------------+-----------+
|nepp_148hl16011_cm|cmtrt      |
|nepp_148hl16011_mh|mhaspe     |
|nepp_148hl16011_ae|aeputt     |
+------------------+-----------+

How can I get the column names of each of these tables? Something like the following, which does not work:

val whole_query = sql("show columns in "table_name.tablename"")

First, collect the names of all the tables to load (df here is the table_name DataFrame from the question):

val tableNames = df.collect().map(row => row.getAs[String]("tablename")).toSeq

Second, get references to the respective DataFrames and associate each one with its column names:

import org.apache.spark.sql.SQLContext

val sqlCtx: SQLContext = ??? // replace ??? with your SQL context reference
val dfToColumns = tableNames.map(table => {
  val columnNames = sqlCtx.table(table).schema.fieldNames.toSeq
  (table, columnNames)
}).toMap

dfToColumns is a Map[String, Seq[String]] with the table names as keys and the Seq of each table's column names as values.
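
On Spark 2.x and later the same lookup can probably be written against a SparkSession instead of an SQLContext. The following is only a sketch under that assumption: the object name TableColumns and both helper names are made up for illustration, and spark is assumed to be an active session that can see the tables. The second helper shows the SHOW COLUMNS route the question was attempting, with the table name spliced in through the s-interpolator.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; the names are illustrative, not part of the Spark API.
object TableColumns {
  // Resolve each table through the session catalog and read the column
  // names from its schema. Duplicate table names are looked up only once.
  def columnsByTable(spark: SparkSession, tableNames: Seq[String]): Map[String, Seq[String]] =
    tableNames.distinct.map { name =>
      name -> spark.table(name).columns.toSeq
    }.toMap

  // Same information through SQL: build the statement per table name.
  // SHOW COLUMNS returns one row per column, with the name in the first field.
  def columnsViaSql(spark: SparkSession, tableName: String): Seq[String] =
    spark.sql(s"SHOW COLUMNS IN $tableName")
      .collect()
      .map(_.getString(0))
      .toSeq
}
```

With the tableNames sequence collected in the first step, TableColumns.columnsByTable(spark, tableNames) should produce the same kind of Map[String, Seq[String]] as dfToColumns. Note that the table name is interpolated into the SQL text as-is, so columnsViaSql is only suitable for names that come from a trusted source.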

