How can I add a column from one Spark DataFrame to my Spark DataFrame (using PySpark)?
I have two Spark DataFrames, and I want to add a column from one of them to the other.
My code is:
new = df.withColumn("prob", tr_df.prob)
Here, I want to add the column result2 from tr_df to the DataFrame df under the name prob. I searched for a solution, but nothing worked for me. I get this error:
AnalysisException: u'resolved attribute(s) prob#579 missing from q1_n_words#388L,prediction#510,res1#390,q2_n_words#389L,tfidf_word_match#384,Average#379,prob#385,probability#485,Cosine#381,word_m#383,rawPrediction#461,features#438,res2#391,question1#373,Jaccard#382,test_id#372L,raw_pred#377,question2#374,q2len#376,Common#378L,result2#387,q1len#375,result1#386,Percentage#380 in operator !Project [test_id#372L, question1#373, question2#374, q1len#375, q2len#376, raw_pred#377, Common#378L, Average#379, Percentage#380, Cosine#381, Jaccard#382, word_m#383, tfidf_word_match#384, prob#579 AS prob#634, result1#386, result2#387, q1_n_words#388L, q2_n_words#389L, res1#390, res2#391, features#438, rawPrediction#461, probability#485, prediction#510];;\n!Project [test_id#372L, question1#373, question2#374, q1len#375, q2len#376, raw_pred#377, Common#378L, Average#379, Percentage#380, Cosine#381, Jaccard#382, word_m#383, tfidf_word_match#384, prob#579 AS prob#634, result1#386, result2#387, q1_n_words#388L, q2_n_words#389L, res1#390, res2#391, features#438, rawPrediction#461, probability#485, prediction#510]\n+- Project [test_id#372L, question1#373, question2#374, q1len#375, q2len#376, raw_pred#377, Common#378L, Average#379, Percentage#380, Cosine#381, Jaccard#382, word_m#383, tfidf_word_match#384, prob#385, result1#386, result2#387, q1_n_words#388L, q2_n_words#389L, res1#390, res2#391, features#438, rawPrediction#461, probability#485, UDF(rawPrediction#461) AS prediction#510]\n +- Project [test_id#372L, question1#373, question2#374, q1len#375, q2len#376, raw_pred#377, Common#378L, Average#379, Percentage#380, Cosine#381, Jaccard#382, word_m#383, tfidf_word_match#384, prob#385, result1#386, result2#387, q1_n_words#388L, q2_n_words#389L, res1#390, res2#391, features#438, rawPrediction#461, UDF(rawPrediction#461) AS probability#485]\n +- Project [test_id#372L, question1#373, question2#374, q1len#375, q2len#376, raw_pred#377, Common#378L, Average#379, Percentage#380, Cosine#381, 
Jaccard#382, word_m#383, tfidf_word_match#384, prob#385, result1#386, result2#387, q1_n_words#388L, q2_n_words#389L, res1#390, res2#391, features#438, UDF(features#438) AS rawPrediction#461]\n +- Project [test_id#372L, question1#373, question2#374, q1len#375, q2len#376, raw_pred#377, Common#378L, Average#379, Percentage#380, Cosine#381, Jaccard#382, word_m#383, tfidf_word_match#384, prob#385, result1#386, result2#387, q1_n_words#388L, q2_n_words#389L, res1#390, res2#391, UDF(struct(q1len#375, q2len#376, cast(q1_n_words#388L as double) AS q1_n_words_double_VectorAssembler_4158baa8e5b4f3aced2b#435, cast(q2_n_words#389L as double) AS q2_n_words_double_VectorAssembler_4158baa8e5b4f3aced2b#436, cast(Common#378L as double) AS Common_double_VectorAssembler_4158baa8e5b4f3aced2b#437, Average#379, Percentage#380, Cosine#381, Jaccard#382, word_m#383, prob#385, raw_pred#377, res1#390, res2#391)) AS features#438]\n +- LogicalRDD [test_id#372L, question1#373, question2#374, q1len#375, q2len#376, raw_pred#377, Common#378L, Average#379, Percentage#380, Cosine#381, Jaccard#382, word_m#383, tfidf_word_match#384, prob#385, result1#386, result2#387, q1_n_words#388L, q2_n_words#389L, res1#390, res2#391]\n'
tr_df schema:
tr_df.printSchema()
root
|-- prob: float (nullable = true)
df schema:
df.printSchema()
root
|-- test_id: long (nullable = true)
Please help! Thanks in advance.
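withColumn can only reference columns of the DataFrame it is called on, so tr_df.prob cannot be resolved against df; the two DataFrames have to be joined instead. To cross join them, set spark.sql.crossJoin.enabled = true in your Spark configuration (Spark 2.x rejects implicit Cartesian products unless this is set, and says so in its error message).
You can set it as follows: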
import org.apache.spark.SparkConf
// Enable Cartesian (cross) joins for this application
val sparkConf = new SparkConf().setAppName("Test")
sparkConf.set("spark.sql.crossJoin.enabled", "true")
Then get or create the SparkSession by passing in this SparkConf:
import org.apache.spark.sql.SparkSession
val sparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
Then do the join...
In PySpark, you can do this as follows. Hope it helps.
>>> spark.conf.set("spark.sql.crossJoin.enabled", True)
>>> df1.show()
+----+
|col1|
+----+
|  23|
|  56|
|  78|
|  31|
+----+
>>> df2.show()
+----+
|col2|
+----+
|  87|
|  45|
|  23|
|  11|
+----+
>>> final = df1.crossJoin(df2)
>>> final.withColumnRenamed('col2', 'result').show()
+----+------+
|col1|result|
+----+------+
|  23|    87|
|  23|    45|
|  23|    23|
|  23|    11|
|  56|    87|
|  56|    45|
|  56|    23|
|  56|    11|
|  78|    87|
|  78|    45|
|  78|    23|
|  78|    11|
|  31|    87|
|  31|    45|
|  31|    23|
|  31|    11|
+----+------+