Parsing XML data from the BGG XML API with R
This question is the second part of this one: How to parse xml lists and tables in R for BGG API.
I want to build a data frame from this table:
<marketplacelistings>
<listing>
<listdate>Thu, 19 Jan 2006 22:08:15 +0000</listdate>
<price currency="EUR">90.00</price>
<condition>likenew</condition>
<notes>Siedler von Catan / Settlers of Catan-Set (Basisspiel/basic game + Erweiterungen Die Seefahrer/ Städte und Ritter/ 5-6 Spieler / extensions The Seafarers/ Cities and Knights/ 5-6 players); 3 x gespielt (Neuwertig; lediglich alle Bestandteile in EINER der Originalboxen verstaut) / 3 times played (like new; only all items in ONE original box stored); Abgabe nur komplett / selling only all together; KEIN Festpreis (nur um überhaupt etwas einzugeben) – erwarte Angebot! / no fixed price (just to complete the entries)– make an offer; Versand weltweit zu Lasten Käufer / shipping worldwide, paid by buyer</notes>
<link href="https://boardgamegeek.com/market/product/40605" title="marketlisting"/>
</listing>
<listing>
<listdate>Mon, 29 Sep 2008 15:25:32 +0000</listdate>
<price currency="USD">34.95</price>
<condition>new</condition>
<notes>Brand New Sealed Board Game. Released from MayFair Games. Price is in USD. If you wish to pay in CAD...then we will convert at market rate. Shipping is $10.95 USD. We also carry the 5-6 Player Expansion that goes with this for $24.95 USD. We have sold thousands of board games across Canada. Please feel free to buy with confidence.</notes>
<link href="https://boardgamegeek.com/market/product/116347" title="marketlisting"/>
</listing>
This is where I'm stuck. This game has about 100 listings, and I want to create a data frame from them. Where do I start? The code below doesn't work; it returns NULL.
listings_df <- do.call(rbind, lapply(
  getNodeSet(xmltop, '//marketplacelistings'),
  function(x) data.frame(
    XML:::xmlAttrsToDataFrame(xmlChildren(x)),
    row.names = NULL
  )
))
EDIT: Here is my final solution. It may not be elegant, but it works.
library(XML)

marketplace_df_func <- function(xmltop){
  # Each column is extracted separately with XPath, one value per <listing>
  marketplace_df <- data.frame(
    listdate  = xmlSApply(getNodeSet(xmltop, "//marketplacelistings//listing//listdate"), xmlValue),
    currency  = xmlSApply(getNodeSet(xmltop, "//marketplacelistings//listing//price[@currency]"), xmlAttrs),  # the currency attribute
    price     = xmlSApply(getNodeSet(xmltop, "//marketplacelistings//listing//price"), xmlValue),
    condition = xmlSApply(getNodeSet(xmltop, "//marketplacelistings//listing//condition"), xmlValue))
  # Keep the first 25 characters ("Thu, 19 Jan 2006 22:08:15"), dropping the " +0000" offset
  marketplace_df$listdate <- substr(marketplace_df$listdate, 1, 25)
  return(marketplace_df)
}
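The same column-by-column idea can be written a little more compactly with xpathSApply(), which applies a function to every node an XPath query matches. The sketch below is not the asker's code: it runs on a trimmed copy of the two sample listings (notes shortened, closing </marketplacelistings> tag added) so it needs no network access, and it swaps xmlAttrs() for xmlGetAttr() so the currency column comes out as a plain character vector.

```r
library(XML)

# Trimmed copy of the two listings from the question (notes shortened,
# closing tag added). No whitespace between tags, so no blank text nodes.
xml_text <- paste0(
  '<marketplacelistings>',
  '<listing>',
  '<listdate>Thu, 19 Jan 2006 22:08:15 +0000</listdate>',
  '<price currency="EUR">90.00</price>',
  '<condition>likenew</condition>',
  '<notes>Siedler von Catan / Settlers of Catan-Set</notes>',
  '<link href="https://boardgamegeek.com/market/product/40605" title="marketlisting"/>',
  '</listing>',
  '<listing>',
  '<listdate>Mon, 29 Sep 2008 15:25:32 +0000</listdate>',
  '<price currency="USD">34.95</price>',
  '<condition>new</condition>',
  '<notes>Brand New Sealed Board Game.</notes>',
  '<link href="https://boardgamegeek.com/market/product/116347" title="marketlisting"/>',
  '</listing>',
  '</marketplacelistings>'
)
doc <- xmlParse(xml_text, asText = TRUE)

# xpathSApply() runs the XPath query and applies the function to each match;
# extra arguments (here "currency") are passed on to that function.
market_df <- data.frame(
  listdate  = xpathSApply(doc, "//listing/listdate", xmlValue),
  currency  = xpathSApply(doc, "//listing/price", xmlGetAttr, "currency"),
  price     = as.numeric(xpathSApply(doc, "//listing/price", xmlValue)),
  condition = xpathSApply(doc, "//listing/condition", xmlValue),
  stringsAsFactors = FALSE
)

# Same trimming as above: keep "Thu, 19 Jan 2006 22:08:15", drop " +0000"
market_df$listdate <- substr(market_df$listdate, 1, 25)
print(market_df)
```

Because every column is built by its own query, this approach assumes each <listing> has exactly one of each element; if one listing lacked, say, a <condition>, that column would be shorter and data.frame() would fail or misalign rows.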
Since this XML holds most of its data in element text rather than in attributes, simply run xmlToDataFrame directly, without the lapply loop:
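library(XML)
url <- "..."
# Download the response as text, then parse it
doc <- xmlParse(readLines(url))
# One row per <listing>; each child element (listdate, price, ...) becomes a column
listings_df <- xmlToDataFrame(doc, nodes = getNodeSet(doc, "//listing"))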
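Since url is elided above, here is a self-contained sketch of the same xmlToDataFrame() call run on an inline copy of the two sample listings (notes shortened, closing </marketplacelistings> tag added). Passing stringsAsFactors = FALSE and converting price with as.numeric() are additions for illustration, not part of the original answer.

```r
library(XML)

# Trimmed copy of the two listings from the question. No whitespace between
# tags, so the parser creates no blank text nodes.
xml_text <- paste0(
  '<marketplacelistings>',
  '<listing>',
  '<listdate>Thu, 19 Jan 2006 22:08:15 +0000</listdate>',
  '<price currency="EUR">90.00</price>',
  '<condition>likenew</condition>',
  '<notes>Siedler von Catan / Settlers of Catan-Set</notes>',
  '<link href="https://boardgamegeek.com/market/product/40605" title="marketlisting"/>',
  '</listing>',
  '<listing>',
  '<listdate>Mon, 29 Sep 2008 15:25:32 +0000</listdate>',
  '<price currency="USD">34.95</price>',
  '<condition>new</condition>',
  '<notes>Brand New Sealed Board Game.</notes>',
  '<link href="https://boardgamegeek.com/market/product/116347" title="marketlisting"/>',
  '</listing>',
  '</marketplacelistings>'
)
doc <- xmlParse(xml_text, asText = TRUE)

# One row per <listing>; each child element's text becomes a character column
listings_df <- xmlToDataFrame(doc, nodes = getNodeSet(doc, "//listing"),
                              stringsAsFactors = FALSE)

# Element text is always character, so convert numeric fields explicitly
listings_df$price <- as.numeric(listings_df$price)
print(listings_df[, c("listdate", "price", "condition")])
```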
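To also bind the attributes underneath (the currency of <price>, and the href and title of <link>), use the package's hidden (unexported) method XML:::xmlAttrsToDataFrame: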
# Element text (as above) plus the attributes of <price> and <link>, side by side
listings_df <- data.frame(
  xmlToDataFrame(doc, nodes = getNodeSet(doc, "//listing")),
  XML:::xmlAttrsToDataFrame(getNodeSet(doc, "//listing/price")),  # currency
  XML:::xmlAttrsToDataFrame(getNodeSet(doc, "//listing/link")),   # href, title
  row.names = NULL
)
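The data.frame() call above relies on the three pieces having the same number of rows in the same order. An alternative, not the answer's method, is to build one row per <listing> and read the attributes with the exported xmlGetAttr(), which avoids the ::: call and fills NA when a listing lacks an element. A minimal sketch on an inline copy of the two sample listings (notes shortened, closing tag added); the helper names child_text, child_attr and one_listing are hypothetical.

```r
library(XML)

# Trimmed copy of the two listings from the question (no whitespace between tags)
xml_text <- paste0(
  '<marketplacelistings>',
  '<listing>',
  '<listdate>Thu, 19 Jan 2006 22:08:15 +0000</listdate>',
  '<price currency="EUR">90.00</price>',
  '<condition>likenew</condition>',
  '<notes>Siedler von Catan / Settlers of Catan-Set</notes>',
  '<link href="https://boardgamegeek.com/market/product/40605" title="marketlisting"/>',
  '</listing>',
  '<listing>',
  '<listdate>Mon, 29 Sep 2008 15:25:32 +0000</listdate>',
  '<price currency="USD">34.95</price>',
  '<condition>new</condition>',
  '<notes>Brand New Sealed Board Game.</notes>',
  '<link href="https://boardgamegeek.com/market/product/116347" title="marketlisting"/>',
  '</listing>',
  '</marketplacelistings>'
)
doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: text of the first direct child called `name`, or NA
child_text <- function(node, name) {
  hits <- getNodeSet(node, paste0("./", name))
  if (length(hits) == 0) NA_character_ else xmlValue(hits[[1]])
}

# Hypothetical helper: attribute `attr` of the first child called `name`, or NA
child_attr <- function(node, name, attr) {
  hits <- getNodeSet(node, paste0("./", name))
  if (length(hits) == 0) NA_character_ else xmlGetAttr(hits[[1]], attr, default = NA_character_)
}

# Hypothetical helper: one data-frame row built from a single <listing> node
one_listing <- function(node) {
  data.frame(
    listdate  = child_text(node, "listdate"),
    price     = as.numeric(child_text(node, "price")),
    currency  = child_attr(node, "price", "currency"),
    condition = child_text(node, "condition"),
    href      = child_attr(node, "link", "href"),
    stringsAsFactors = FALSE
  )
}

listings_df <- do.call(rbind, lapply(getNodeSet(doc, "//listing"), one_listing))
print(listings_df)
```

Building each row from its own <listing> keeps the columns aligned even when an element is missing, at the cost of a few more lines than xmlToDataFrame().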