Extend a value in a data frame by adding the value of another cell and row in R
I am trying to analyze XML data in R with dplyr and ggplot2. My code is able to transform the XML data into a data frame. Unfortunately, the structure gets lost.
My XML document has the following structure, for example:
<?xml version="1.0" encoding="UTF-8"?>
<Budget price="1234" items="1234" year="1990">
  <Account name="a" value="123" step="0">
    <Account name="1" value="12" step="1"/>
    <Account name="1.1" value="12" step="2"/>
    <Account name="2" value="12" step="1"/>
    <Account name="2.1" value="9" step="2"/>
    <Account name="2.2" value="3" step="2"/>
    <Account name="3" value="99" step="1"/>
    <Account name="3.1" value="78" step="2"/>
    <Account name="3.1.1" value="70" step="3"/>
    <Account name="3.1.2" value="8" step="3"/>
    <Account name="3.2" value="21" step="2"/>
  </Account>
  <Account name="b" value="234" step="0">
    <Account name="1" value="200" step="1"/>
    <!-- and so on -->
First I select all Account nodes:
library(xml2)
budget_values = xml_find_all(doc, "//Account")
Afterwards I select some of the attribute values:
step_ids = purrr::map_chr(budget_values, ~xml_attr(.,"step"))
name_values = purrr::map_chr(budget_values, ~xml_attr(.,"name"))
values = purrr::map_chr(budget_values, ~xml_attr(.,"value"))
Save the attributes in a combined list:
values_list <- list(step_ids, name_values, values)
And convert it into a data frame (naming the columns, which would otherwise be X1, X2, X3):
budget_df <- data.frame(sapply(values_list, c))
colnames(budget_df) <- c("Step-ID", "name", "value")
That works great. I get a data frame like this:
Step-ID | name | value
---|---|---
0 | a | 123
1 | 1 | 12
2 | 1.1 | 12
1 | 2 | 12
2 | 2.1 | 9
2 | 2.2 | 3
1 | 3 | 99
0 | b | 234
1 | 1 | 200
and so on
As you can see from the example, some names are repeated, usually at steps 1 and 2; step-3 names are usually unique.
My aim is the following data frame, so that I can analyze the data in a more structured way:
Step-ID | name | value
---|---|---
0 | a | 123
1 | a1 | 12
2 | a1.1 | 12
1 | a2 | 12
2 | a2.1 | 9
2 | a2.2 | 3
1 | a3 | 99
0 | b | 234
1 | b1 | 200
and so on
For example, I want the values of all step-1 rows. Right now I can't tell which budget each one comes from. With the new names I can see that this value comes from budget a, that one from budget b, and so on.
I tried the following for loop and stored the result in a new data frame:
df<-for (rows in budget_df) {
  if (rows$`Step-ID` == "0") {
    saved_name <- rows$name
    print(saved_name)
  }
  else
  (rows$`Step-ID` == "1"){
    rows$Haushalt+saved_name
    saved_names<-saved_name+rows$name
    print(saved_names)
  }
  else(rows$`Step-ID`=="2"){
    rows$Haushalt+saved_name
  }
  else(rows$`Step-ID`=="3"){
    rows$name+saved_names
  }
}
View(df)
And I get the following error:
Error: unexpected '{' in:
" else
(rows$`Step-ID` == "1"){"
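The parse error comes from the `else (condition) {` lines: R reads `else` followed by `(rows$`Step-ID` == "1")` as an else branch whose whole body is that parenthesised expression, so the `{` after it is unexpected. The form R accepts is `else if (condition) {`. Two further problems would surface once the loop parses: `for (rows in budget_df)` visits the columns of a data frame, not its rows, and `+` does not concatenate strings in R (`paste0()` does). The following is a minimal corrected sketch, assuming the columns are named `Step-ID` and `name` as in the table above and that the first row is a step-0 row; `new_name` is a hypothetical helper vector.

```r
# Sample of the flat data frame from the question (character columns)
budget_df <- data.frame(
  `Step-ID` = c("0", "1", "2", "1", "2", "2", "1", "0", "1"),
  name      = c("a", "1", "1.1", "2", "2.1", "2.2", "3", "b", "1"),
  value     = c("123", "12", "12", "12", "9", "3", "99", "234", "200"),
  check.names = FALSE,        # keep the hyphen in "Step-ID"
  stringsAsFactors = FALSE
)

saved_name <- NA_character_               # name of the current step-0 budget
new_name   <- character(nrow(budget_df))  # hypothetical result vector

# Loop over row indices; looping over the data frame itself would visit columns
for (i in seq_len(nrow(budget_df))) {
  if (budget_df$`Step-ID`[i] == "0") {
    saved_name  <- budget_df$name[i]
    new_name[i] <- saved_name
  } else {
    # Lower-level names such as "1.1" already contain their step-1 part,
    # so only the budget name has to be prepended
    new_name[i] <- paste0(saved_name, budget_df$name[i])
  }
}

budget_df$name <- new_name
budget_df
```

Because names like `2.1` already carry their parent number, one `else` branch is enough for steps 1, 2 and 3.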
My question is: is there a better way to analyze the data or to rename the values in `name`?
Thank you very much for your help!
Update:
Thanks again to @jpsmith. Following his recommendation, I tried this code:
df <- budget_df
budget <- ""
df <- for (row in df) {
  mutate(
    case_when (
      df$`Step-ID` == "0" ~ budget <- df$Haushalt,
      df$`Step-ID` == "2" ~ mutate(df, sturucture = paste(budget, df$Haushalt)),
      df$`Step-ID` == "2" ~ budget <- c(budget, df$Haushalt),
      df$`Step-ID` == "3" ~ mutate(df, sturucture = paste(values, df$Haushalt))
    )
  )
}
This explains logically what I want to do, but it doesn't work. I think it's because I'm trying to store the value with `<-`? I couldn't find another way in `?case_when` to store values. Another version of the code (which I have since overwritten) stored the value of `Step-ID` and extended the value of `Haushalt` within the same step, instead of prepending the Step-ID 0 name to the `Haushalt` of the Step-ID 1 rows beneath it, and the Step-ID 1 name to the `Haushalt` of the Step-ID 2 rows.
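`case_when()` is vectorised: it returns one value per row and cannot carry a variable such as `budget` from one row to the next, so `<-` assignments inside it do not behave like loop state. One way around this in dplyr, sketched here under the assumption that every budget starts with a step-0 row, is to number the budgets with `cumsum()` first and then take each group's first name. `budget_id` and `Structure` are hypothetical column names; `Haushalt` stands for the name column as in the attempt above, and the sample values are taken from the question.

```r
library(dplyr)

# Small sample in the shape of the question's data frame
budget_df <- data.frame(
  `Step-ID` = c("0", "1", "2", "1", "0", "1"),
  Haushalt  = c("a", "1", "1.1", "2", "b", "1"),
  value     = c("123", "12", "12", "12", "234", "200"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

structured <- budget_df |>
  # budget_id goes up by one at every step-0 row, so each budget gets its own id
  mutate(budget_id = cumsum(`Step-ID` == "0")) |>
  group_by(budget_id) |>
  # within a budget, first(Haushalt) is the step-0 name (e.g. "a")
  mutate(Structure = case_when(
    `Step-ID` == "0" ~ Haushalt,
    TRUE             ~ paste0(first(Haushalt), Haushalt)
  )) |>
  ungroup() |>
  select(-budget_id)

structured
```

The running state that the loop kept in a variable is replaced here by the grouping column, which is what lets `case_when()` stay row-wise.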
OK, thanks for all the help. I got it working with a for loop:
library(dplyr)  # for mutate()

df <- mutate(df, Structure = NA)
for (i in seq_len(nrow(df))) {
  if (df[i, 1] == "0") {
    # step 0: remember the budget name
    first_step <- df[i, 2]
    values <- first_step
    df$Structure[i] <- values
  }
  else if (df[i, 1] == "1") {
    # step 1: budget name + step-1 name
    second_step <- df[i, 2]
    values <- paste(first_step, second_step)
    df$Structure[i] <- values
  }
  else if (df[i, 1] == "2") {
    # step 2: reuse the names remembered for steps 0 and 1
    third_step <- df[i, 2]
    values <- paste(first_step, second_step, third_step)
    df$Structure[i] <- values
  }
  else if (df[i, 1] == "3") {
    # step 3: reuse the names remembered for steps 0, 1 and 2
    fourth_step <- df[i, 2]
    values <- paste(first_step, second_step, third_step, fourth_step)
    df$Structure[i] <- values
  }
}
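Note that `paste()` separates its arguments with a space by default, so the loop above produces names such as `"a 1 1.1"` rather than the `a1.1` shown in the target table. Because the step-2 and step-3 names already contain their parent's number, prefixing only the step-0 name with `paste0()` is enough. A small illustration with the sample values `a`, `1` and `1.1`:

```r
first_step  <- "a"
second_step <- "1"
third_step  <- "1.1"

# paste() uses sep = " " by default
with_spaces <- paste(first_step, second_step, third_step)
with_spaces   # "a 1 1.1"

# paste0() uses no separator; only the budget name needs to be prepended
target <- paste0(first_step, third_step)
target        # "a1.1"
```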
Consider XSLT, the special-purpose language designed to transform XML files, to prefix the parent `@name` attribute to the underlying child `@name` attributes. With R's `xslt` package (a complement to `xml2`), you can run XSLT 1.0 scripts:
XSLT (save as a .xsl file, which is a special kind of .xml file)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/Budget">
    <xsl:copy>
      <xsl:apply-templates select="Account"/>
    </xsl:copy>
  </xsl:template>

  <!-- MOVE ATTRIBUTES TO ELEMENTS -->
  <xsl:template match="Account">
    <xsl:copy>
      <stepid><xsl:value-of select="@step"/></stepid>
      <name><xsl:value-of select="@name"/></name>
      <value><xsl:value-of select="@value"/></value>
    </xsl:copy>
    <xsl:apply-templates select="*"/>
  </xsl:template>

  <!-- MOVE ATTRIBUTES TO ELEMENTS AND CONCATENATE PARENT @name ATTRIBUTE -->
  <xsl:template match="Account/*">
    <xsl:variable name="step">
      <xsl:value-of select="../@name"/>
    </xsl:variable>
    <xsl:copy>
      <stepid><xsl:value-of select="@step"/></stepid>
      <name><xsl:value-of select="concat($step, @name)"/></name>
      <value><xsl:value-of select="@value"/></value>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
R
library(xml2)
library(xslt)
# LOAD XML AND XSLT
doc <- read_xml("inputF.xml")
style <- read_xml("style.xsl")
# RUN TRANSFORMATION AND SEE OUTPUT
new_xml <- xml_xslt(doc, style)
# RETRIEVE ALL NODES
recs <- xml2::xml_find_all(new_xml, "//Account")
# BIND EACH CHILD TEXT AND NAME
df_list <- lapply(recs, function(r) {
vals <- xml2::xml_children(r)
df <- setNames(
c(xml2::xml_text(vals)),
c(xml2::xml_name(vals))
) |> rbind() |> data.frame()
})
# COMBINE ALL DFS
accounts_df <- do.call(rbind.data.frame, df_list)
Output
accounts_df
# stepid name value
# 1 0 a 123
# 2 1 a1 12
# 3 2 a1.1 12
# 4 1 a2 12
# 5 2 a2.1 9
# 6 2 a2.2 3
# 7 1 a3 99
# 8 2 a3.1 78
# 9 3 a3.1.1 70
# 10 3 a3.1.2 8
# 11 2 a3.2 21
# 12 0 b 234
# 13 1 b1 200
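If adding the `xslt` dependency is not desirable, a similar result can probably be obtained with `xml2` alone by reading each `Account` node's parent with `xml_parent()`. This is a sketch, not part of the answer above: like the XSLT, it assumes accounts are nested at most one level below a step-0 `Account`, and the inline XML is an abbreviated copy of the question's sample.

```r
library(xml2)

# Abbreviated copy of the sample XML from the question
doc <- read_xml('<Budget price="1234" items="1234" year="1990">
  <Account name="a" value="123" step="0">
    <Account name="1" value="12" step="1"/>
    <Account name="1.1" value="12" step="2"/>
  </Account>
  <Account name="b" value="234" step="0">
    <Account name="1" value="200" step="1"/>
  </Account>
</Budget>')

# All Account nodes, top-level and nested, in document order
accounts <- xml_find_all(doc, "//Account")

# Prefix = the parent's @name for nested accounts, "" for top-level ones
prefix <- vapply(accounts, function(node) {
  parent <- xml_parent(node)
  if (xml_name(parent) == "Account") xml_attr(parent, "name") else ""
}, character(1))

accounts_df <- data.frame(
  stepid = xml_attr(accounts, "step"),
  name   = paste0(prefix, xml_attr(accounts, "name")),
  value  = xml_attr(accounts, "value"),
  stringsAsFactors = FALSE
)
accounts_df
```

The parent is looked up node by node with `vapply()` rather than by calling `xml_parent()` on the whole node set, so each prefix stays aligned with its own account.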