
How to avoid \r\n in Spark SQL jobs using Python?

I have hundreds of YAML files containing Hive queries that I am migrating to Spark SQL using a Python script I wrote. My goal was to keep each Spark SQL query properly formatted, so I kept the tab (\t), space, and newline (\n) characters in the queries.

The problem is that when I submit this code, I get the following error (image). I can fix it by replacing \r\n with a space, but that ruins the formatting because the entire query ends up on a single line. I am looking for a robust way to deal with \r\n in my code without affecting the formatting.

[Image: screenshot of the error]

My workarounds:

  1. When I replace the \r\n characters with a space, it works fine, but the query becomes unformatted.
  2. When I use tr -d '\r' < input > output, I then get an error for \n, as below:
 Parsing Error [line 5]: '(\n' [line 46]: ')\n'
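
One way to handle the line endings inside the migration script itself, rather than with tr, is sketched below. It assumes each query is available as a Python string; the function name normalize_newlines and the unescape_literals flag are hypothetical, not part of the original script. Because the error above shows '(\n' as text, it may be that the files contain the two-character sequence backslash + n instead of a real newline. That is only a guess about the cause, so the handling for it is optional and off by default.

```python
def normalize_newlines(query: str, unescape_literals: bool = False) -> str:
    """Convert CRLF and lone CR line endings to LF, keeping tabs and spaces."""
    if unescape_literals:
        # Assumption: the text holds the two-character sequences "\r\n" or
        # "\n" (backslash + letter) instead of real control characters.
        # Caution: this also rewrites such sequences inside SQL string literals.
        query = (
            query.replace("\\r\\n", "\n")
            .replace("\\n", "\n")
            .replace("\\r", "\n")
        )
    # Real control characters: handle CRLF first, then any remaining lone CR.
    return query.replace("\r\n", "\n").replace("\r", "\n")


raw = "SELECT a,\r\n\tb\r\nFROM t\r\nWHERE (\r\n\ta > 1\r\n)"
clean = normalize_newlines(raw)
```

The result keeps one statement per original line, with tabs intact, so the query stays readable while no carriage returns reach the parser.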

I am spending a lot of time manually debugging each file and am looking for ideas that could automate the process.

Use \ at the end of a line to show that the text continues on the next line.
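
A minimal sketch of that suggestion, assuming the tool that receives the query treats a trailing backslash as a line continuation (the answer does not say which tool does, so check this for your submission path). The helper name add_line_continuations is hypothetical.

```python
def add_line_continuations(query: str) -> str:
    """End every line except the last with ' \\' so the query reads as one
    continued statement while keeping its visual line layout."""
    # Normalize CRLF and lone CR to LF before splitting into lines.
    lines = query.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    # Drop trailing blank lines so the last real line gets no dangling backslash.
    while lines and not lines[-1].strip():
        lines.pop()
    # Strip trailing whitespace, then join with space + backslash + newline.
    return " \\\n".join(line.rstrip() for line in lines)


continued = add_line_continuations("SELECT a\r\nFROM t\r\n")
print(continued)
```

Leading tabs and spaces on each line are preserved; only trailing whitespace and the carriage returns are removed.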
