
How to generate an XML file with nested tags from a text file using Spark Scala

Below is a sample of my input file:

johnson|26|M|6|BTECH|Acollege|a

RICHARD|27|M|7|BTECH|bcollege|a

From this pipe-delimited input, I need to generate an XML file in the following format:

<details>
    <employee_details>
        <personal_details>
            <name>Johnson</name>
            <age>26</age>
            <gender>M</gender>
            <height>6</height>
        </personal_details>
        <education_details>
            <degree>BTECH</degree>
            <college_name>ACOLLEGE</college_name>
            <grade>a</grade>
        </education_details>
    </employee_details>
    <employee_details>
        <personal_details>
            <name>RICHARD</name>
            <age>27</age>
            <gender>M</gender>
            <height>7</height>
        </personal_details>
        <education_details>
            <degree>BTECH</degree>
            <college_name>BCOLLEGE</college_name>
            <grade>a</grade>
        </education_details>
    </employee_details>
</details>

Please help me with this.

You can use spark-xml to write the output in XML format. Below is a simple example for your case.

Dependency for Maven:

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-xml_2.11</artifactId>
    <version>0.4.1</version>
</dependency>

Dependency for SBT:

// https://mvnrepository.com/artifact/com.databricks/spark-xml
libraryDependencies += "com.databricks" %% "spark-xml" % "0.4.1"

Sample data and the write step:

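  import org.apache.spark.sql.functions.struct
  import spark.implicits._ // needed for toDF; assumes a SparkSession named `spark`

  val df = Seq(
    ("johnson", "26", "M", "BTECH", "Acollege", "a"),
    ("RICHARD", "27", "M", "BTECH", "bcollege", "a")
  ).toDF("name", "age", "gender", "degree", "college_name", "grade")

  // Group the flat columns into two struct columns; each struct becomes a nested XML element
  val resultDF = df.withColumn("personal_details", struct("name", "age", "gender"))
    .withColumn("education_details", struct("degree", "college_name", "grade"))
    .select("personal_details", "education_details") // keep the structs; selecting ".*" would flatten them again

  // rootTag wraps the whole document, rowTag wraps each row
  resultDF.write
    .format("com.databricks.spark.xml")
    .option("rootTag", "details")
    .option("rowTag", "employee_details")
    .save("outputtttttt/test.xml")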

Output:

<details>
    <employee_details>
        <personal_details>
            <name>johnson</name>
            <age>26</age>
            <gender>M</gender>
        </personal_details>
        <education_details>
            <degree>BTECH</degree>
            <college_name>Acollege</college_name>
            <grade>a</grade>
        </education_details>
    </employee_details>
    <employee_details>
        <personal_details>
            <name>RICHARD</name>
            <age>27</age>
            <gender>M</gender>
        </personal_details>
        <education_details>
            <degree>BTECH</degree>
            <college_name>bcollege</college_name>
            <grade>a</grade>
        </education_details>
    </employee_details>
</details>
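The example above hard-codes two rows and leaves out the height column from the question. A minimal sketch of reading the pipe-delimited file directly and including height might look like the following. The object name `EmployeesToXml`, the input path `input/employees.txt`, and the output path `output/employees_xml` are placeholders chosen for illustration, and the sketch assumes every line has exactly seven fields in the order shown in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

object EmployeesToXml {
  // Column names in the order the fields appear in the input file (assumed from the question)
  val columns: Seq[String] =
    Seq("name", "age", "gender", "height", "degree", "college_name", "grade")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("employees-to-xml")
      .master("local[*]") // local mode for a quick test; drop this when submitting to a cluster
      .getOrCreate()

    // Read the text file as CSV with "|" as the separator; all columns are read as strings
    val raw = spark.read
      .option("sep", "|")
      .csv("input/employees.txt") // hypothetical input path
      .toDF(columns: _*)

    // Build the two nested elements as struct columns
    val nested = raw.select(
      struct("name", "age", "gender", "height").as("personal_details"),
      struct("degree", "college_name", "grade").as("education_details")
    )

    nested.write
      .format("com.databricks.spark.xml")
      .option("rootTag", "details")
      .option("rowTag", "employee_details")
      .save("output/employees_xml") // hypothetical output path

    spark.stop()
  }
}
```

Note that `save` writes a directory containing one or more part files rather than a single XML file, so the result is found under the output path, not at it.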

Hope this helps!
