简体   繁体   English

在Scala的Spark数据框中用另一个定界符替换

[英]Replace one delimiter with another in Spark data frame in Scala

Here is my Spark data frame output of dfMainOutput . 这是dfMainOutput的 Spark数据帧输出。

4295858898,177,SelfSourcedPublic,INC,Cost of sales,Umsatzkosten,,ECOR,false,,,,,false,False,,,,505096,505074,505074,505096,505096,,505074,False,,3014830,,I|!|

Now i want to replace , with |^| 现在我要替换,|^| and dropping one column DataPartition 并删除一列DataPartition

This is what I am doing: 这就是我在做什么:

val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))

val headerColumn = df.columns.filter(v => (!v.contains("^") && !v.contains("_c"))).toSeq

val header = headerColumn.dropRight(1).mkString("", "|^|", "|!|")

val dfMainOutputFinalWithoutNull = dfMainOutputFinal.withColumn("concatenated", regexp_replace(col("concatenated"), "null", "")).withColumnRenamed("concatenated", header)


dfMainOutputFinalWithoutNull.repartition(1).write.partitionBy("DataPartition","StatementTypeCode")
  .format("csv")
  .option("nullValue", "")
  .option("header", "true")
  .option("codec", "gzip")
  .save("s3://trfsmallfffile/FinancialLineItem/output")

With this code the output below is generated: 使用此代码,将生成以下输出:

4295858898|^|177|^|INC|^|Cost of sales|^|Umsatzkosten|^|ECOR|^|false|^|false|^|False|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|False|^|3014830|^|I|!|

Where empty elements are missing. 空元素缺失的地方。 I would like it to be: 我希望它是:

4295858898|^|177|^|INC|^|Cost of sales|^|Umsatzkosten|^||^|ECOR|^|False|^||^||^||^||^|False|^|False|^||^||^||^|505096|^|505074|^|505074|^|505096|^|505096|^||^|505074|^|False|^||^|3014830|^||^|I|!|

Also in the data frame output i am getting false where as we want False Please help me what II am missing .. 同样在数据帧输出中,我变得false ,因为我们要False

Here is my schema 这是我的图式

root
 |-- LineItem_organizationId: long (nullable = true)
 |-- LineItem_lineItemId: integer (nullable = true)
 |-- DataPartition: string (nullable = true)
 |-- StatementTypeCode: string (nullable = true)
 |-- LineItemName: string (nullable = true)
 |-- LocalLanguageLabel: string (nullable = true)
 |-- FinancialConceptLocal: string (nullable = true)
 |-- FinancialConceptGlobal: string (nullable = true)
 |-- IsDimensional: boolean (nullable = true)
 |-- InstrumentId: string (nullable = true)
 |-- LineItemSequence: string (nullable = true)
 |-- PhysicalMeasureId: string (nullable = true)
 |-- FinancialConceptCodeGlobalSecondary: string (nullable = true)
 |-- IsRangeAllowed: boolean (nullable = true)
 |-- IsSegmentedByOrigin: string (nullable = true)
 |-- SegmentGroupDescription: string (nullable = true)
 |-- SegmentChildDescription: string (nullable = true)
 |-- SegmentChildLocalLanguageLabel: string (nullable = true)
 |-- LocalLanguageLabel_languageId: string (nullable = true)
 |-- LineItemName_languageId: string (nullable = true)
 |-- SegmentChildDescription_languageId: string (nullable = true)
 |-- SegmentChildLocalLanguageLabel_languageId: string (nullable = true)
 |-- SegmentGroupDescription_languageId: string (nullable = true)
 |-- SegmentMultipleFundbDescription: string (nullable = true)
 |-- SegmentMultipleFundbDescription_languageId: string (nullable = true)
 |-- IsCredit: string (nullable = true)
 |-- FinancialConceptLocalId: string (nullable = true)
 |-- FinancialConceptGlobalId: string (nullable = true)
 |-- FinancialConceptCodeGlobalSecondaryId: string (nullable = true)
 |-- FFAction: string (nullable = true)

In the data DataPartition=SelfSourcePublic and StatementTypeCode=INC 在数据中DataPartition=SelfSourcePublic and StatementTypeCode=INC

Output from dfMaainOutput dfMaainOutput的输出

+-----------------------+-------------------+-----------------+-----------------+------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+---------------------+----------------------+-------------+------------+----------------+-----------------+-----------------------------------+--------------+-------------------+-----------------------+-----------------------+------------------------------+-----------------------------+-----------------------+----------------------------------+-----------------------------------------+----------------------------------+-------------------------------+------------------------------------------+--------+-----------------------+------------------------+-------------------------------------+--------+
|LineItem_organizationId|LineItem_lineItemId|DataPartition    |StatementTypeCode|LineItemName                                                                              |LocalLanguageLabel                                                                                    |FinancialConceptLocal|FinancialConceptGlobal|IsDimensional|InstrumentId|LineItemSequence|PhysicalMeasureId|FinancialConceptCodeGlobalSecondary|IsRangeAllowed|IsSegmentedByOrigin|SegmentGroupDescription|SegmentChildDescription|SegmentChildLocalLanguageLabel|LocalLanguageLabel_languageId|LineItemName_languageId|SegmentChildDescription_languageId|SegmentChildLocalLanguageLabel_languageId|SegmentGroupDescription_languageId|SegmentMultipleFundbDescription|SegmentMultipleFundbDescription_languageId|IsCredit|FinancialConceptLocalId|FinancialConceptGlobalId|FinancialConceptCodeGlobalSecondaryId|FFAction|
+-----------------------+-------------------+-----------------+-----------------+------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+---------------------+----------------------+-------------+------------+----------------+-----------------+-----------------------------------+--------------+-------------------+-----------------------+-----------------------+------------------------------+-----------------------------+-----------------------+----------------------------------+-----------------------------------------+----------------------------------+-------------------------------+------------------------------------------+--------+-----------------------+------------------------+-------------------------------------+--------+
|4295858898             |707                |SelfSourcedPublic|INC              |Revenue from long-term construction contracts                                             |Erlöse aus langfristigen Fertigungsaufträgen                                                          |null                 |ROBR                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3015278                 |null                                 |I|!|    |
|4295858898             |3289               |SelfSourcedPublic|INC              |Balancing Item - Net Income available to Controlling Interest                             |null                                                                                                  |null                 |IIII                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3014960                 |null                                 |I|!|    |
|4295858922             |808                |SelfSourcedPublic|INC              |Income Taxes - Total                                                                      |Ertragsteuern                                                                                         |null                 |XTAX                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |False   |null                   |3019589                 |null                                 |I|!|    |
|4295858922             |1507               |SelfSourcedPublic|INC              |Balancing Item - Operating Expenses                                                       |null                                                                                                  |null                 |IIII                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3014960                 |null                                 |I|!|    |
|4295858951             |1574               |SelfSourcedPublic|INC              |Admin/General Expenses                                                                    |null                                                                                                  |null                 |ESGA                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |False   |null                   |3018991                 |null                                 |I|!|    |
|4295859007             |1645               |SelfSourcedPublic|INC              |Exploration Expenses - Balancing value                                                    |null                                                                                                  |null                 |EEXP                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |False   |null                   |3018916                 |null                                 |I|!|    |
|4295859038             |954                |SelfSourcedPublic|INC              |Sale Investments                                                                          |null                                                                                                  |null                 |EGFA                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3018929                 |null                                 |I|!|    |
|4295859038             |1967               |SelfSourcedPublic|INC              |Restructuring Charges/Provisions                                                          |Ergebnis aus Umstrukturierungen                                                                       |null                 |ERES                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |False   |null                   |3018980                 |null                                 |I|!|    |
|4295859038             |1996               |SelfSourcedPublic|INC              |Diluted Weighted Average Shares on Instrument Level multiplied to its Participation Factor|null                                                                                                  |null                 |DWASEPFI              |false        |8590926849  |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |1002023919              |null                                 |I|!|    |
|4295859045             |864                |SelfSourcedPublic|INC              |Results of valuation gains/losses and disposals of non-current securities                 |Ergebnis aus Kursänderungen und Abgängen von Wertpapieren des langfristigen Finanzvermögens („@FVTPL“)|null                 |EGIT                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3018932                 |null                                 |I|!|    |
|4295859045             |1092               |SelfSourcedPublic|INC              |Excep. Depreciation                                                                       |null                                                                                                  |null                 |EGLO                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3018938                 |null                                 |I|!|    |
|4295859071             |1840               |SelfSourcedPublic|INC              |Other Operating Expense                                                                   |null                                                                                                  |null                 |EOOE                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |False   |null                   |3018974                 |null                                 |I|!|    |
|4295859078             |914                |SelfSourcedPublic|INC              |Balancing Item - Non Operating Income/(Expense), net                                      |null                                                                                                  |null                 |IIII                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505096                       |505074                 |505074                            |505096                                   |505096                            |null                           |505074                                    |True    |null                   |3014960                 |null                                 |I|!|    |
|4295859106             |514                |SelfSourcedPublic|INC              |Personnel Expenses                                                                        |null                                                                                                  |null                 |ELAS                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |False   |null                   |3018944                 |null                                 |I|!|    |
|4295859106             |903                |SelfSourcedPublic|INC              |Balancing Item - Non Operating Income/(Expense), net                                      |null                                                                                                  |null                 |IIII                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |True    |null                   |3014960                 |null                                 |I|!|    |
|4295859216             |499                |SelfSourcedPublic|INC              |BC - Depreciation of Fixed Assets                                                         |null                                                                                                  |null                 |BCDEP                 |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505084                       |505074                 |505074                            |505084                                   |505084                            |null                           |505074                                    |False   |null                   |1002023928              |null                                 |I|!|    |
|4295859236             |172                |SelfSourcedPublic|INC              |Total Revenue                                                                             |Ventes                                                                                                |null                 |XTLR                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |True    |null                   |3016345                 |null                                 |I|!|    |
|4295859241             |492                |SelfSourcedPublic|INC              |Diluted Net Income excluding Extra Items applicable to Common - (Instrument Level)        |null                                                                                                  |null                 |XNCNDI                |false        |8589989623  |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |True    |null                   |1001214357              |null                                 |I|!|    |
|4295859279             |124                |SelfSourcedPublic|INC              |Income Available to Com Excl ExtraOrd                                                     |Toerekenbaar aan de aandeelhouders van de moederonderneming                                           |null                 |XNCN                  |false        |null        |null            |null             |null                               |false         |False              |null                   |null                   |null                          |505084                       |505074                 |505074                            |505084                                   |505084                            |null                           |505074                                    |True    |null                   |3016316                 |null                                 |I|!|    |
|4295859298             |488                |SelfSourcedPublic|INC              |Other operating income/expenses                                                           |Other operating expenses                                                                              |null                 |EOIE                  |false        |null        |null            |null             |null                               |false         |null               |null                   |null                   |null                          |505074                       |505074                 |505074                            |505074                                   |505074                            |null                           |505074                                    |True    |null                   |3018969                 |null                                 |I|!|    |
+-----------------------+-------------------+-----------------+-----------------+------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+---------------------+----------------------+-------------+------------+----------------+-----------------+-----------------------------------+--------------+-------------------+-----------------------+-----------------------+------------------------------+-----------------------------+-----------------------+----------------------------------+-----------------------------------------+----------------------------------+-------------------------------+------------------------------------------+--------+-----------------------+------------------------+-------------------------------------+--------+

Output After Code 代码输出

val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))

Here is output 这是输出

+-----------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|DataPartition    |StatementTypeCode|concatenated                                                                                                                                                                                                                                                                                                            |
+-----------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|SelfSourcedPublic|INC              |4295858898|^|707|^|INC|^|Revenue from long-term construction contracts|^|Erlöse aus langfristigen Fertigungsaufträgen|^|ROBR|^|false|^|false|^|False|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|True|^|3015278|^|I|!|                                                                                      |
|SelfSourcedPublic|INC              |4295858898|^|3289|^|INC|^|Balancing Item - Net Income available to Controlling Interest|^|IIII|^|false|^|false|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|True|^|3014960|^|I|!|                                                                                                                            |
|SelfSourcedPublic|INC              |4295858922|^|808|^|INC|^|Income Taxes - Total|^|Ertragsteuern|^|XTAX|^|false|^|false|^|False|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|False|^|3019589|^|I|!|                                                                                                                                             |
|SelfSourcedPublic|INC              |4295858922|^|1507|^|INC|^|Balancing Item - Operating Expenses|^|IIII|^|false|^|false|^|505096|^|505074|^|505074|^|505096|^|505096|^|505074|^|True|^|3014960|^|I|!|                                                                                                                                                      |
|SelfSourcedPublic|INC              |4295859236|^|172|^|INC|^|Total Revenue |^|Ventes|^|XTLR|^|false|^|false|^|False|^|505074|^|505074|^|505074|^|505074|^|505074|^|505074|^|True|^|3016345|^|I|!|                                                                                                                                                           |
|SelfSourcedPublic|INC              |4295859241|^|492|^|INC|^|Diluted Net Income excluding Extra Items applicable to Common - (Instrument Level) |^|XNCNDI|^|false|^|8589989623|^|false|^|505074|^|505074|^|505074|^|505074|^|505074|^|505074|^|True|^|1001214357|^|I|!|                                                                                     |
|SelfSourcedPublic|INC              |4295859279|^|124|^|INC|^|Income Available to Com Excl ExtraOrd|^|Toerekenbaar aan de aandeelhouders van de moederonderneming|^|XNCN|^|false|^|false|^|False|^|505084|^|505074|^|505074|^|505084|^|505084|^|505074|^|True|^|3016316|^|I|!|                                                                               |
|SelfSourcedPublic|INC              |4295859298|^|488|^|INC|^|Other operating income/expenses|^|Other operating expenses|^|EOIE|^|false|^|false|^|505074|^|505074|^|505074|^|505074|^|505074|^|505074|^|True|^|3018969|^|I|!|                                                                                                                                |
+-----------------+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

So just be make clear for example 4295858898 |3289 null value in the column LocalLanguageLabel is gone 所以,仅仅是搞清楚例如4295858898 |3289列空值LocalLanguageLabel是走了

I dont know how it is missed ... 我不知道怎么错过...

Your main culprit is that you have null values in your dataframe and concat_ws is filtering out all the null values. 你的主要元凶就是你null在值dataframeconcat_ws被过滤掉所有空值。 So the solution is to replace all null values to "" which should solve your issue. 因此解决方案是将所有null值替换为"" ,这应该可以解决您的问题。 It won't be a problem as you have all the null datatypes in schema as string . 这不会有问题,因为模式中所有的null数据类型都为string

So, replacing the following 因此,替换以下内容

val dfMainOutputFinal = dfMainOutput.select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))

with

val dfMainOutputFinal = dfMainOutput.na.fill("").select($"DataPartition", $"StatementTypeCode",concat_ws("|^|", dfMainOutput.schema.fieldNames.filter(_ != "DataPartition").map(c => col(c)): _*).as("concatenated"))

should solve your issue 应该解决你的问题

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM