[英]Import Infopath .XML forms into data frame in R
在R中導入Infopath .XML表單並轉換為數據幀的最佳方法是什么? 如果我在Excel中打開Infopath .XML文件,則數據框的行和列會正確顯示。
這是我在R中使用XML包嘗試的內容:
但是,在第2步中,我收到以下錯誤:
Error in `[<-.data.frame`(`*tmp*`, i, names(nodes[[i]]), value = c("touch your head13011000", :
duplicate subscripts for columns
但是,當我在Excel中打開XML文件時,似乎沒有重復的列。 如何將此XML文件從Infopath轉換為R中的數據框? 預期的列應該是(因為它們出現在Excel中):
TCID, DateCoded, tcAge, T1_B3, T1_B2, T1_B1, T1_B0, T1_A3, T1_A2, T1_A1, T1_A0, T1_DelayTotal, T2_A3, T2_A2, T2_A1, T2_A, T2_B3, T2_B2, T2_B1, T2_B0, T2_DelayTotal, Coder, notes_t1, note_t2, bachildpres30, baparpres30, bapassptgo, bapassptnogo, bamissgame, P1_B3, P1_B2, P1_B1, P1_B0, P1_A3, P1_A2, P1_A1, P1_A0, P1_DelayTotal, P1_action, P1_go-nogo, P1_score, P1_delay, P1_trial, P1_Ecommand, P1_imitation, P1_restraint, P1_ruleswitch, P1_trials, P1_gotrials, P1_nogotrials, T1_gotrials, T1_nogotrials, T1_trials, T2_gotrials, T2_nogotrials, T2_trials, P1_notplay, T1_trial, T1_go-nogo, T1_score, T1_delay, T1_action, T2_trial, T2_go-nogo, T2_score, T2_delay, T2_action
對於在XML文件中多次出現的變量,我希望它們以長格式存在於數據框中(即,同一變量的多行)。 我對XML文件沒有太多經驗,所以非常感謝您的幫助。
下面是我使用xmlParse時解析的XML文件在R中的樣子:
<my:myFields lang="en-us" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:st="urn:schemas-microsoft-com:office:smarttags" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2009-07-01T18:12:59" xmlns:xd="http://schemas.microsoft.com/office/infopath/2003">
<my:SPSS>
<my:TCID>10</my:TCID>
<my:DateCoded>2013-04-01</my:DateCoded>
<my:tcAge>30</my:tcAge>
<my:T1_B3>6</my:T1_B3>
<my:T1_B2>0</my:T1_B2>
<my:T1_B1>0</my:T1_B1>
<my:T1_B0>0</my:T1_B0>
<my:T1_A3>0</my:T1_A3>
<my:T1_A2>0</my:T1_A2>
<my:T1_A1>1</my:T1_A1>
<my:T1_A0>5</my:T1_A0>
<my:T1_DelayTotal>1</my:T1_DelayTotal>
<my:T2_A3 nil="true"/>
<my:T2_A2 nil="true"/>
<my:T2_A1 nil="true"/>
<my:T2_A0 nil="true"/>
<my:T2_B3 nil="true"/>
<my:T2_B2 nil="true"/>
<my:T2_B1 nil="true"/>
<my:T2_B0 nil="true"/>
<my:T2_DelayTotal nil="true"/>
<my:Coder>Name</my:Coder>
</my:SPSS>
<my:notes_t1/>
<my:note_t2/>
<my:bachildpres30>0</my:bachildpres30>
<my:baparpres30>0</my:baparpres30>
<my:bapassptgo>1</my:bapassptgo>
<my:bapassptnogo>0</my:bapassptnogo>
<my:bamissgame>0</my:bamissgame>
<my:P1_B3>4</my:P1_B3>
<my:P1_B2>0</my:P1_B2>
<my:P1_B1>0</my:P1_B1>
<my:P1_B0>1</my:P1_B0>
<my:P1_A3>0</my:P1_A3>
<my:P1_A2>0</my:P1_A2>
<my:P1_A1>1</my:P1_A1>
<my:P1_A0>3</my:P1_A0>
<my:P1_DelayTotal>0</my:P1_DelayTotal>
<my:group2>
<my:group3>
<my:P1_action>touch your head</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>3</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>1</my:P1_trial>
<my:P1_Ecommand>1</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your nose</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>3</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>2</my:P1_trial>
<my:P1_Ecommand>1</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your tummy</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>3</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>3</my:P1_trial>
<my:P1_Ecommand>1</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your head</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>0</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>4</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your head</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>3</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>5</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your nose</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>3</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>6</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>clap your hands</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>3</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>7</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your nose</my:P1_action>
<my:P1_go-nogo>0</my:P1_go-nogo>
<my:P1_score>0</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>8</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your ears</my:P1_action>
<my:P1_go-nogo>0</my:P1_go-nogo>
<my:P1_score>0</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>9</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your tummy</my:P1_action>
<my:P1_go-nogo>0</my:P1_go-nogo>
<my:P1_score>0</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>10</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your eyes</my:P1_action>
<my:P1_go-nogo>0</my:P1_go-nogo>
<my:P1_score>1</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>11</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>1</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
<my:group3>
<my:P1_action>touch your eyes</my:P1_action>
<my:P1_go-nogo>1</my:P1_go-nogo>
<my:P1_score>3</my:P1_score>
<my:P1_delay>0</my:P1_delay>
<my:P1_trial>12</my:P1_trial>
<my:P1_Ecommand>0</my:P1_Ecommand>
<my:P1_imitation>0</my:P1_imitation>
<my:P1_restraint>0</my:P1_restraint>
<my:P1_ruleswitch>0</my:P1_ruleswitch>
</my:group3>
</my:group2>
<my:P1_trials>9</my:P1_trials>
<my:P1_gotrials>5</my:P1_gotrials>
<my:P1_nogotrials>4</my:P1_nogotrials>
<my:T1_gotrials>6</my:T1_gotrials>
<my:T1_nogotrials>6</my:T1_nogotrials>
<my:T1_trials>12</my:T1_trials>
<my:T2_gotrials>0</my:T2_gotrials>
<my:T2_nogotrials>0</my:T2_nogotrials>
<my:T2_trials>0</my:T2_trials>
<my:P1_notplay/>
<my:group4>
<my:group5>
<my:T1_trial>1</my:T1_trial>
<my:T1_go-nogo>1</my:T1_go-nogo>
<my:T1_score>3</my:T1_score>
<my:T1_delay>1</my:T1_delay>
<my:T1_action>Touch your tongue</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>2</my:T1_trial>
<my:T1_go-nogo>1</my:T1_go-nogo>
<my:T1_score>3</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your teeth</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>3</my:T1_trial>
<my:T1_go-nogo>0</my:T1_go-nogo>
<my:T1_score>0</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your ear</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>4</my:T1_trial>
<my:T1_go-nogo>1</my:T1_go-nogo>
<my:T1_score>3</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Clap your hands</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>5</my:T1_trial>
<my:T1_go-nogo>0</my:T1_go-nogo>
<my:T1_score>0</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Clap your hands</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>6</my:T1_trial>
<my:T1_go-nogo>0</my:T1_go-nogo>
<my:T1_score>0</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your eyes</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>7</my:T1_trial>
<my:T1_go-nogo>0</my:T1_go-nogo>
<my:T1_score>0</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your feet</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>8</my:T1_trial>
<my:T1_go-nogo>1</my:T1_go-nogo>
<my:T1_score>3</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your nose</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>9</my:T1_trial>
<my:T1_go-nogo>0</my:T1_go-nogo>
<my:T1_score>1</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your nose</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>10</my:T1_trial>
<my:T1_go-nogo>1</my:T1_go-nogo>
<my:T1_score>3</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your tummy</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>11</my:T1_trial>
<my:T1_go-nogo>0</my:T1_go-nogo>
<my:T1_score>0</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Wave your hand</my:T1_action>
</my:group5>
<my:group5>
<my:T1_trial>12</my:T1_trial>
<my:T1_go-nogo>1</my:T1_go-nogo>
<my:T1_score>3</my:T1_score>
<my:T1_delay>0</my:T1_delay>
<my:T1_action>Touch your head</my:T1_action>
</my:group5>
</my:group4>
<my:group6>
<my:group7>
<my:T2_trial>1</my:T2_trial>
<my:T2_go-nogo>0</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your tongue</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>2</my:T2_trial>
<my:T2_go-nogo>0</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your teeth</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>3</my:T2_trial>
<my:T2_go-nogo>1</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your ear</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>4</my:T2_trial>
<my:T2_go-nogo>0</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Clap your hands</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>5</my:T2_trial>
<my:T2_go-nogo>1</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Clap your hands</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>6</my:T2_trial>
<my:T2_go-nogo>1</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your eyes</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>7</my:T2_trial>
<my:T2_go-nogo>1</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your feet</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>8</my:T2_trial>
<my:T2_go-nogo>0</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your nose</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>9</my:T2_trial>
<my:T2_go-nogo>1</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your nose</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>10</my:T2_trial>
<my:T2_go-nogo>0</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your tummy</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>11</my:T2_trial>
<my:T2_go-nogo>1</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Wave your hand</my:T2_action>
</my:group7>
<my:group7>
<my:T2_trial>12</my:T2_trial>
<my:T2_go-nogo>0</my:T2_go-nogo>
<my:T2_score/>
<my:T2_delay>0</my:T2_delay>
<my:T2_action>Touch your head</my:T2_action>
</my:group7>
</my:group6>
</my:myFields>
根據我的經驗,只有當XML已經以非常一致的方式構建時, xmlToDataFrame
才有效。 您正在使用的數據以多種不同方式構建:
# Assuming you've already read your data into a character vector called `xml_file
xml_file <- xmlParse(xml_file)
xml_file <- xmlToList(xml_file)
stack(sapply(xml_file, length))
values ind
1 22 SPSS
2 0 notes_t1
3 0 note_t2
4 1 bachildpres30
5 1 baparpres30
6 1 bapassptgo
7 1 bapassptnogo
8 1 bamissgame
9 1 P1_B3
10 1 P1_B2
11 1 P1_B1
12 1 P1_B0
13 1 P1_A3
14 1 P1_A2
15 1 P1_A1
16 1 P1_A0
17 1 P1_DelayTotal
18 12 group2
19 1 P1_trials
20 1 P1_gotrials
21 1 P1_nogotrials
22 1 T1_gotrials
23 1 T1_nogotrials
24 1 T1_trials
25 1 T2_gotrials
26 1 T2_nogotrials
27 1 T2_trials
28 0 P1_notplay
29 12 group4
30 12 group6
31 1 .attrs
因此,大多數節點都包含單個值。 有些是空的。 “SPSS”節點包含22個值,所有值都具有不同的名稱,“group2”,“group4”和“group6”都包含12個節點,每個節點包含多個值,但其值在節點之間相似。 當我查看Excel在導入文件時所執行的操作時,它將12個節點的組件堆疊在一起,然后將所有22個“SPSS”組件與所有單值節點串起來並重復該字符串通過堆疊12節點組件創建的行數,然后將這兩個塊綁定在一起。
為此,將長字符串與12節點塊分開:
xml_file_singles <- xml_file[sapply(xml_file, length) != 12]
xml_file_singles[sapply(xml_file_singles, length) == 0] <- NA
xml_file_singles <- unlist(xml_file_singles)
xml_file_multiples <- xml_file[sapply(xml_file, length) == 12]
現在取12節點塊並將每個塊轉換為數據幀:
xml_file_multiples <- lapply(1:length(xml_file_multiples), function(i) {
x <- lapply(xml_file_multiples[[i]], function(y) {
data.frame(as.list(unlist(y)), stringsAsFactors = FALSE)})
x <- do.call("rbind", x)
cbind("group" = names(xml_file_multiples)[i], x)
})
現在使用plyr
包的rbind.fill
函數將所有新數據幀放在一起:
require(plyr)
xml_file_multiples <- do.call("rbind.fill", xml_file_multiples)
現在cbind
您的長字符串cbind
綁定到綁定的數據幀:
xml_final <- cbind(as.list(xml_file_singles), xml_file_multiples,
stringsAsFactors = FALSE)
這種方法與Excel一樣,引入了大量的NA,因為不同12節點塊的列名略有不同。 如果你在調用rbind.fill
之前這樣rbind.fill
:
xml_file_multiples <- lapply(1:length(xml_file_multiples), function(i) {
x <- lapply(xml_file_multiples[[i]], function(y) {
data.frame(as.list(unlist(y)), stringsAsFactors = FALSE)})
x <- do.call("rbind", x)
x <- cbind("group" = names(xml_file_multiples)[i], x)
colnames(x) <- gsub("^\\w\\d_", "", colnames(x))
x
})
您將生成更少的NA,因為您將生成更少的冗余列,但是您必須依賴“組”列中的值來跟蹤哪些行最初出現在哪個節點中。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.