
How to avoid \r\n in Spark SQL jobs using Python?

I have hundreds of YAML files containing Hive queries that I am migrating to Spark SQL using a Python script I wrote. My goal is to keep the Spark SQL queries properly formatted, so I kept the tabs (\t), spaces, and newline (\n) characters in them.

The problem is that when I submit this code I get the following error (image). I can fix it by replacing \r\n with a space, but that ruins the formatting because the entire query ends up on a single line. I am looking for a robust way to deal with \r\n in my code without affecting the formatting.

[Image: screenshot of the error message]

My workarounds:

  1. When I replace the \r\n characters with a space, it works fine, but the query becomes unformatted.
  2. When I use tr -d '\r' < input > output, I then get an error for \n, as below:
 Parsing Error [line 5]: '(\n' [line 46]: ')\n'
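
A minimal sketch of a middle ground between the two workarounds, assuming the queries are available as Python strings: convert every \r\n (and lone \r) to a plain \n and leave tabs and spaces alone. The names normalize_newlines and unescape_newlines are hypothetical. The second helper only applies if the text holds the two-character escape sequences (a backslash followed by r or n) rather than real control characters, which is a guess at what '(\n' in the parsing error might mean.

```python
def normalize_newlines(sql: str) -> str:
    # Convert CRLF and lone CR line endings to LF; tabs and spaces are untouched.
    return sql.replace("\r\n", "\n").replace("\r", "\n")


def unescape_newlines(sql: str) -> str:
    # Assumption: the text was double-escaped upstream, so it contains a
    # literal backslash followed by 'r' or 'n' instead of real line breaks.
    return sql.replace("\\r\\n", "\n").replace("\\n", "\n").replace("\\r", "\n")


query = "SELECT a,\r\n\tb\r\nFROM t\r\n"
clean = normalize_newlines(query)
```

Here clean keeps the line structure and the tab before b, with no carriage returns left. Whether Spark then accepts the remaining \n characters depends on how the query is submitted, which the post does not show.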

I am spending a lot of time manually debugging each file and am looking for ideas to automate the process.
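
One way the clean-up could be automated across many files is sketched below. It assumes the YAML files sit under a single directory and end in .yaml; the function name fix_line_endings and both parameters are hypothetical. It works on raw bytes so that nothing except the line endings is altered.

```python
from pathlib import Path
from typing import List


def fix_line_endings(root: str, pattern: str = "*.yaml") -> List[str]:
    """Rewrite matching files under root with LF line endings; return the paths changed."""
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        raw = path.read_bytes()
        # Replace CRLF first, then any remaining lone CR.
        fixed = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
        if fixed != raw:
            path.write_bytes(fixed)
            changed.append(str(path))
    return changed
```

Files that already use \n are left untouched, and the returned list shows which files were rewritten. Running it on a copy of the directory first would be a sensible precaution.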

Use \ at the end of a line to show that the text continues on the next line.
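
The answer does not say where the backslash should go. One possible reading, assuming the query is embedded in Python source as an ordinary (non-raw, single-quoted or double-quoted) string literal, is sketched here:

```python
# A backslash at the very end of a line inside a regular string literal
# joins the source lines, so no newline character reaches the string.
query = "SELECT a, b \
FROM t \
WHERE a > 0"

print(query)
```

Under this reading the source stays split across lines, but the string that is submitted is a single line, so the layout is preserved only in the Python file and not in the query text itself.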
