
How to combine two columns' values into another column using pyspark?

This is the code I'm using in AWS Glue to map values from a CSV to a SQL table.

mappings=[
        ("houseA", "string", "villa", "string"),
        ("houseB", "string", "small_house", "string"),
        ("houseA"+"houseB", "string", "combined_key", "string"),
    ],

I find no issue with mapping houseA and houseB to the "villa" and "small_house" columns respectively. But when I try to put the combined houseA + houseB value in the "combined_key" column, it gives me this error:

An error occurred while calling o128.pyWriteDynamicFrame. Cannot insert the value NULL into column 'combined_key', table 'dbo.Buildings'; column does not allow nulls. INSERT fails.

I couldn't quite figure out why it is giving back a null error.

Any ideas on how the code can be modified?

Thanks in advance.

I actually found that there is a custom transformation available in Glue Studio where we can achieve this using pyspark code.
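
A minimal sketch of what that Glue Studio custom transform node could look like, assuming the source columns are named houseA and houseB as in the mapping above; the function and collection key names (MyTransform, CustomTransform0) are just the placeholders Glue Studio generates and should be adjusted to match your own job:

from awsglue.dynamicframe import DynamicFrame, DynamicFrameCollection
from pyspark.sql.functions import concat, col

def MyTransform(glueContext, dfc) -> DynamicFrameCollection:
    # Take the single incoming DynamicFrame and switch to a Spark DataFrame
    source = dfc.select(list(dfc.keys())[0])
    df = source.toDF()

    # Build the combined key by concatenating the two source columns.
    # Note: concat() returns NULL if either input is NULL, so fill or
    # coalesce the columns first if they can contain empty values.
    df = df.withColumn("combined_key", concat(col("houseA"), col("houseB")))

    # Wrap the result back into a DynamicFrame for the downstream nodes
    combined = DynamicFrame.fromDF(df, glueContext, "combined")
    return DynamicFrameCollection({"CustomTransform0": combined}, glueContext)

With combined_key created before the mapping step, the ApplyMapping entry can reference it as a real source column, e.g. ("combined_key", "string", "combined_key", "string"). The original "houseA"+"houseB" just evaluates to the literal string "houseAhouseB", which matches no column in the source, which is why the target column ends up NULL.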
