How to combine the values of two columns into another column using PySpark?

This is the code I'm using to map values from a CSV file to a SQL table in AWS Glue.

mappings=[
        ("houseA", "string", "villa", "string"),
        ("houseB", "string", "small_house", "string"),
        ("houseA"+"houseB", "string", "combined_key", "string"),
    ],

Mapping houseA and houseB to the "villa" and "small_house" columns respectively works without any issue. But when I try to put the combined houseA and houseB values into the "combined_key" column, I get this error:

An error occurred while calling o128.pyWriteDynamicFrame. Cannot insert the value NULL into column 'combined_key', table 'dbo.Buildings'; column does not allow nulls. INSERT fails.

I can't figure out why it reports a NULL value for that column.

Any ideas on how the code can be modified?

Thanks in advance.

I actually found that Glue Studio has a custom transformation where this can be achieved with PySpark code.
