
Why is my Code Repo warning me not to use union and instead use unionByName?

Original post: 2022-01-18 14:08:24 · Tags: palantir-foundry, foundry-code-repositories, foundry-python-transform

I see that my repository is warning me about using union and says I should use unionByName instead. Aren't these the same thing? Why would I care which one to use?

1 answer

The PySpark docs note the following about union:

Also as standard in SQL, this function resolves columns by position (not by name).

This is dangerous in most cases: if your schemas have the same types but not the same names or purposes, you may silently merge different and incompatible schemas. For example, if schema1 is [('col1', T.IntegerType()), ('col2', T.StringType())] and schema2 is [('col3', T.IntegerType()), ('col4', T.StringType())], they can be merged successfully via union even though col1 and col3 have fundamentally different meanings, as may col2 and col4. The result keeps the column names of the first DataFrame, so nothing in the output hints that the rows came from differently named columns.
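To make the silent merge concrete, here is a minimal plain-Python model of resolving columns by position. It is not PySpark itself: the `(column_names, rows)` pair, the `union_by_position` helper, and the sample values are all hypothetical stand-ins chosen for illustration. Only the column names come from the example above.

```python
# Plain-Python model of positional column resolution (not PySpark itself).
# A "frame" is a hypothetical (column_names, rows) pair standing in for a DataFrame.

def union_by_position(left, right):
    """Append right's rows to left's, matching columns by position only."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if len(left_cols) != len(right_cols):
        raise ValueError("column counts differ")
    # The right-hand column names are ignored entirely; the left names win.
    return (left_cols, left_rows + right_rows)

# Same types (int, str) in the same positions, but unrelated column names.
frame1 = (["col1", "col2"], [(34, "Ana")])          # illustrative values
frame2 = (["col3", "col4"], [(9001, "shipped")])    # illustrative values

merged_cols, merged_rows = union_by_position(frame1, frame2)
print(merged_cols)
print(merged_rows)
```

No error is raised and the `col3`/`col4` values end up under `col1`/`col2`, which is the behavior the warning is trying to steer you away from.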

This is different from unionByName, for which the docs note:

The difference between this function and union() is that this function resolves columns by name (not by position)

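Because columns are matched by name, mismatched schemas cause an error instead of being merged silently. This makes unionByName the safer way to conduct a union in most cases, so it is preferred.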
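As a rough sketch of what resolving by name buys you, the plain-Python model below aligns columns by name and refuses to merge when the names differ. It is not PySpark itself: the `(column_names, rows)` pair, the `union_by_name` helper, and the sample values are hypothetical, and the model assumes both sides must have exactly the same set of column names.

```python
# Plain-Python model of by-name column resolution (not PySpark itself).
# A "frame" is a hypothetical (column_names, rows) pair standing in for a DataFrame.

def union_by_name(left, right):
    """Append right's rows to left's, matching columns by name."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if sorted(left_cols) != sorted(right_cols):
        raise ValueError(f"cannot resolve columns by name: {left_cols} vs {right_cols}")
    # Reorder each right-hand row into the left-hand column order.
    positions = [right_cols.index(name) for name in left_cols]
    reordered = [tuple(row[i] for i in positions) for row in right_rows]
    return (left_cols, left_rows + reordered)

# Same names in a different order: rows are realigned, not mixed up.
frame_a = (["col1", "col2"], [(1, "a")])
frame_b = (["col2", "col1"], [("b", 2)])
aligned = union_by_name(frame_a, frame_b)
print(aligned)

# Different names: the merge fails loudly instead of silently succeeding.
frame_c = (["col3", "col4"], [(3, "c")])
mismatch_error = None
try:
    union_by_name(frame_a, frame_c)
except ValueError as exc:
    mismatch_error = exc
print(mismatch_error)
```

In a real transform the equivalent call would be `df1.unionByName(df2)`, which fails at analysis time when the column names cannot be matched.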

Why is my Code Repo warning me about using withColumn in a for/while loop?

Why should I not use collect() in my Python Transforms?

How can I use functions defined in other Code Repository?

Code Repository - What exactly is CTX in pyspark for a code repo?

Why don't I see log lines in my PySpark code when I would expect them to appear?

How do I union two datasets in Palantir Foundry within a code workbook?

How do I use a local IDE for Java Transforms in Foundry Code Repositories?

How do I generate authentication tokens within Foundry Code Workbook to use as an argument to APIs?

How do I make my many-join / many-union datasets compute faster?

How do I improve the performance of my Code Assist in Code Repository?


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
