Extending values in a data frame by adding values from other cells and rows in R

Question

I am trying to analyze XML data in R with dplyr and ggplot2. My code can transform the XML data into a data frame, but unfortunately the hierarchical structure gets lost.

My XML document has the following structure, for example:

<?xml version="1.0" encoding="UTF-8"?>
<Budget price="1234" items="1234" year="1990">
<Account name="a" value="123" step="0">
<Account name="1" value="12" step="1"/>
<Account name="1.1" value="12" step="2"/>
<Account name="2" value="12" step="1"/>
<Account name="2.1" value="9" step="2"/>
<Account name="2.2" value="3" step="2"/>
<Account name="3" value="99" step="1"/>
<Account name="3.1" value="78" step="2"/>
<Account name="3.1.1" value="70" step="3"/>
<Account name="3.1.2" value="8" step="3"/>
<Account name="3.2" value="21" step="2"/>
</Account>
<Account name="b" value="234" step="0">
<Account name="1" value="200" step="1"/>

and so on

First I select all the nodes:

budget_values = xml_find_all(doc,"//Account",flatten=FALSE)

Afterwards I extract some of the attributes:

step_ids = purrr::map_chr(budget_values, ~xml_attr(.,"step"))
name_values = purrr::map_chr(budget_values, ~xml_attr(.,"name"))
values = purrr::map_chr(budget_values, ~xml_attr(.,"value"))

Save the attributes in a combined list:

values_list <- list((step_ids),(name_values),(values))

And convert it into a data frame:

budget_df <- data.frame(sapply(values_list, c))
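The same data frame can probably be built more directly, because `xml_attr()` from xml2 accepts a whole node set and returns one value per node. The following is only a sketch under assumptions: the inline XML is a shortened, made-up copy of the sample above, and the `value` column is converted to numeric for later analysis.

```r
library(xml2)

# Shortened sample document (assumed; mirrors the structure shown above)
doc <- read_xml('<Budget price="1234" items="1234" year="1990">
  <Account name="a" value="123" step="0">
    <Account name="1" value="12" step="1"/>
    <Account name="1.1" value="12" step="2"/>
  </Account>
  <Account name="b" value="234" step="0">
    <Account name="1" value="200" step="1"/>
  </Account>
</Budget>')

# All Account nodes, in document order
accounts <- xml_find_all(doc, "//Account")

# xml_attr() is vectorized over the node set, so no purrr::map_chr() is needed
budget_df <- data.frame(
  `Step-ID` = xml_attr(accounts, "step"),
  name      = xml_attr(accounts, "name"),
  value     = as.numeric(xml_attr(accounts, "value")),
  check.names = FALSE,       # keep the hyphen in "Step-ID"
  stringsAsFactors = FALSE
)
```

Setting `check.names = FALSE` keeps the column name `Step-ID` as written, which is why it has to be quoted with backticks later on.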

That works well. I get a data frame like this:

Step-ID	name	value
0	a	1234
1	1	12
2	1.1	12
1	2	12
2	2.1	9
2	2.2	3
1	3	99
0	b	234
1	1	200

and so on

As you can see from the example, some names are repeated, usually at steps 1 and 2; names at step 3 are usually unique.

My goal is the following data frame, so that I can analyze the data in a more structured way:

Step-ID	name	value
0	a	1234
1	a1	12
2	a1.1	12
1	a2	12
2	a2.1	9
2	a2.2	3
1	a3	99
0	b	234
1	b1	200

and so on

For example, say I want the values of all step 1 entries. Currently I can't tell which budget each one belongs to. With the new names I can see that this value is from budget a, this one from budget b, and so on.

I tried the following for loop and stored the result in a new data frame:

df<-for (rows in budget_df) {
  if (rows$`Step-ID` == "0") {
    saved_name <- rows$name
    print(saved_name)
  }
  else
    (rows$`Step-ID` == "1"){
      rows$Haushalt+saved_name
      saved_names<-saved_name+rows$name
      print(saved_names)
    }
  else(rows$`Step-ID`=="2"){
    rows$Haushalt+saved_name
  }
  else(rows$`Step-ID`=="3"){
    rows$name+saved_names
  }
}
View(df)

And I get the following error:

Error: unexpected '{' in:
"  else
    (rows$`Step-ID` == "1"){"
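The parser error comes from `else (condition) {`: in R an `else` takes no condition, so a chained test has to be written `else if (condition) {`. A minimal sketch of the corrected control flow follows, on a small made-up data frame (the column names `step` and `name` and the vector `labels` are assumptions for illustration). It also loops over row indices, because `for (rows in budget_df)` iterates over columns rather than rows, and it joins strings with `paste0()`, because `+` is not defined for character values.

```r
# Made-up sample with the same shape as the data frame in the question
budget_df <- data.frame(
  step = c("0", "1", "2", "0", "1"),
  name = c("a", "1", "1.1", "b", "1"),
  stringsAsFactors = FALSE
)

# A for loop returns NULL, so results are collected in a vector instead
labels <- character(nrow(budget_df))

for (i in seq_len(nrow(budget_df))) {
  step <- budget_df$step[i]
  if (step == "0") {
    # Top-level row: remember the budget name for the rows that follow
    saved_name <- budget_df$name[i]
    labels[i] <- saved_name
  } else if (step %in% c("1", "2", "3")) {
    # Child row: prefix the remembered budget name
    labels[i] <- paste0(saved_name, budget_df$name[i])
  } else {
    # Unexpected step value: leave a visible gap
    labels[i] <- NA_character_
  }
}

budget_df$name <- labels
```

This assumes the first row is a step 0 row, so that `saved_name` exists before any child row is reached.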

My question is: is there a better way to analyze the data or rename the values in name?

Thank you very much for your help!

Update:

Thanks again to @jpsmith. I tried the following code based on his recommendation:

df <- budget_df

    budget <- ""
    df <- for (row in df) {
      mutate(
        case_when (
          df$`Step-ID` == "0" ~ budget <- df$Haushalt,
          df$`Step-ID` == "2" ~ mutate(df, sturucture = paste(budget, df$Haushalt)),
          df$`Step-ID` == "2" ~ budget <- c(budget, df$Haushalt),
          df$`Step-ID` == "3" ~ mutate(df, sturucture = paste(values, df$Haushalt))
        )
      )
    }

This expresses logically what I want to do, but it doesn't work. I think it's because I'm trying to store a value with <- inside case_when? I couldn't find another way to store values in ?case_when. Another version of the code (which I have since overwritten) stored the value for a Step-ID and extended the Haushalt value of the same step, instead of adding the Step-ID 0 value to the Haushalt entries with Step-ID 1 beneath it, and the Step-ID 1 value to the Haushalt entries with Step-ID 2.
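One possible reading of the problem is that `case_when()` is a vectorized expression that returns a vector, so it cannot carry state from one row to the next through `<-`. The carried-forward budget name can instead be computed up front for all rows at once. The sketch below assumes a column called `name` (the update uses `Haushalt`), a made-up sample data frame, and that the first row is a step 0 row; the helper columns `budget` and `structure` are illustrative names.

```r
library(dplyr)

# Made-up sample; in the question this would be budget_df
df <- data.frame(
  `Step-ID` = c("0", "1", "2", "0", "1"),
  name      = c("a", "1", "1.1", "b", "1"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

df <- df %>%
  mutate(
    # cumsum() numbers the budgets 1, 1, 1, 2, 2; indexing the step 0 names
    # with it repeats each budget name over the rows that belong to it
    budget = name[`Step-ID` == "0"][cumsum(`Step-ID` == "0")],
    # case_when() only chooses a value per row, it does not assign anything
    structure = case_when(
      `Step-ID` == "0" ~ name,
      TRUE             ~ paste0(budget, name)
    )
  )
```

With the `budget` column in place, a filter such as `filter(df, `Step-ID` == "1")` still shows which budget each row came from.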

Answer 1

OK, thanks for all the help. I got it working with a for loop:

library(dplyr)  # for mutate()

# New column that will hold the full path of each account
df <- mutate(df, Structure = NA)

# Column 1 is the Step-ID, column 2 is the name
for (i in seq_len(nrow(df))) {
  if (df[i, 1] == "0") {
    # Top level: remember the budget name
    first_step <- df[i, 2]
    df$Structure[i] <- first_step
  } else if (df[i, 1] == "1") {
    second_step <- df[i, 2]
    df$Structure[i] <- paste(first_step, second_step)
  } else if (df[i, 1] == "2") {
    third_step <- df[i, 2]
    df$Structure[i] <- paste(first_step, second_step, third_step)
  } else if (df[i, 1] == "3") {
    fourth_step <- df[i, 2]
    df$Structure[i] <- paste(first_step, second_step, third_step, fourth_step)
  }
}
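The loop above hard-codes four levels. A possible generalization, sketched here in base R, keeps the current path in a vector and truncates it whenever the step goes back up. The function name `build_structure` and the sample vectors are made up for illustration, and the sketch assumes the data starts at step 0 and that the step never increases by more than one from row to row.

```r
# Hypothetical helper: builds the same space-separated path as the loop above,
# but for any depth
build_structure <- function(step, name, sep = " ") {
  step <- as.integer(step)
  path <- character(0)              # names of the current ancestors
  out  <- character(length(step))
  for (i in seq_along(step)) {
    depth <- step[i] + 1L           # step 0 is stored at position 1
    path <- path[seq_len(depth - 1L)]  # drop levels deeper than the parent
    path[depth] <- name[i]
    out[i] <- paste(path, collapse = sep)
  }
  out
}

# Sample values taken from the XML in the question
step <- c("0", "1", "2", "1", "2", "2", "1", "2", "3", "3", "2", "0", "1")
name <- c("a", "1", "1.1", "2", "2.1", "2.2", "3", "3.1", "3.1.1", "3.1.2",
          "3.2", "b", "1")

structure_col <- build_structure(step, name)
```

On a data frame this could be used as `df$Structure <- build_structure(df[[1]], df[[2]])`, with the same column positions the loop relies on.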

Answer 2

Consider XSLT, the special-purpose language designed to transform XML files, to prefix the parent @name attribute to the underlying child @name attributes. With R's xslt package (a companion to xml2) you can run XSLT 1.0 scripts:

XSLT (save as a .xsl file, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/Budget">
     <xsl:copy>
       <xsl:apply-templates select="Account"/>
     </xsl:copy>
    </xsl:template>
    
    <!-- MOVE ATTRIBUTES TO ELEMENTS -->
    <xsl:template match="Account">
     <xsl:copy>
       <stepid><xsl:value-of select="@step"/></stepid>
       <name><xsl:value-of select="@name"/></name>
       <value><xsl:value-of select="@value"/></value>
     </xsl:copy>
     <xsl:apply-templates select="*"/>
    </xsl:template>
    
    <!-- MOVE ATTRIBUTES TO ELEMENTS AND CONCATENATE PARENT @name ATTRIBUTE -->
    <xsl:template match="Account/*">
     <xsl:variable name="step">
       <xsl:value-of select="../@name"/>
     </xsl:variable>
     <xsl:copy>
       <stepid><xsl:value-of select="@step"/></stepid>
       <name><xsl:value-of select="concat($step, @name)"/></name>
       <value><xsl:value-of select="@value"/></value>
     </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Online Demo

R

library(xml2)
library(xslt)

# LOAD XML AND XSLT
doc <- read_xml("inputF.xml")
style <- read_xml("style.xsl")

# RUN TRANSFORMATION AND SEE OUTPUT
new_xml <- xml_xslt(doc, style)

# RETRIEVE ALL NODES
recs <- xml2::xml_find_all(new_xml, "//Account")

# BIND EACH CHILD TEXT AND NAME
df_list <- lapply(recs, function(r) {
  vals <- xml2::xml_children(r)
  
  df <- setNames(
    c(xml2::xml_text(vals)), 
    c(xml2::xml_name(vals))
  ) |> rbind() |> data.frame()
})

# COMBINE ALL DFS
accounts_df <- do.call(rbind.data.frame, df_list)

Output

accounts_df

#    stepid   name value
# 1       0      a   123
# 2       1     a1    12
# 3       2   a1.1    12
# 4       1     a2    12
# 5       2   a2.1     9
# 6       2   a2.2     3
# 7       1     a3    99
# 8       2   a3.1    78
# 9       3 a3.1.1    70
# 10      3 a3.1.2     8
# 11      2   a3.2    21
# 12      0      b   234
# 13      1     b1   200
