Allow the user to select a range of the data file to be analyzed?

I have the following XML file:

<Company >
    <shareprice>
        <timeStamp> 12:00:00.01</timeStamp>
        <Price>  25.02</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:00.02</timeStamp>
        <Price>  15</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:01.025</timeStamp>
        <Price>  15.02</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:01.031</timeStamp>
        <Price>  18.25</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:01.039</timeStamp>
        <Price>  18.54</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:01.050</timeStamp>
        <Price> 16.52</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:02.01</timeStamp>
        <Price>  17.50</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:03.01</timeStamp>
        <Price>  25.02</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:05.02</timeStamp>
        <Price>  30</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:11.025</timeStamp>
        <Price>  32.25</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:12.031</timeStamp>
        <Price>  26.05</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:15.039</timeStamp>
        <Price>  18.54</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:19.050</timeStamp>
        <Price> 16.52</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:01:02.01</timeStamp>
        <Price>  17.50</Price>
    </shareprice>
</Company>

I also have the following R code:

library (ggplot2)
library (XML)
df <- xmlToDataFrame(file.choose()) 
df$timeStamp <- strptime(as.character(df$timeStamp), "%H:%M:%OS")
df$Price <- as.numeric(as.character(df$Price))
sapply(df, class)          
options("digits.secs"=3)   
summary (df)              
df$timeStamp <- df[1,"timeStamp"] + cumsum(runif(1:length(df$timeStamp))*60)
summary(df)
diff1 = 0
diff <- append(diff1,diff(df$Price))
summary (df$Price)
Ymin <- min(df$Price)
Ymax <- max(df$Price)
Ymedian <- median (df$Price)
Ymean <- mean(df$Price)
Ysd <- sd (df$Price)
sink (file="c:/xampp/htdocs/Sharedata.xml", type="output",split=FALSE)
cat("<graph caption=\"Share Data Wave\" subcaption=\"For Person's Name\"   xAxisName=\"Time\" yAxisMinValue=\"-0.025\" yAxisName=\"Voltage\" decimalPrecision=\"5\"  formatNumberScale=\"0\" numberPrefix=\"\" showNames=\"1\" showValues=\"0\" showAlternateHGridColor=\"1\" AlternateHGridColor=\"ff5904\" divLineColor=\"ff5904\" divLineAlpha=\"20\" alternateHGridAlpha=\"5\">\n")
cat(sprintf("    <set name=\"%s\" value=\"%f\" hoverText = \"The difference from last value: %s\" ></set>\n", df$timeStamp, df$Price, diff))
cat ("</graph>\n")
unlink("data.xml")
sink (file="c:/xampp/htdocs/Sharesstatistics.xml", type="output",split=FALSE)
cat ("  <statistics>\n")
cat (sprintf("    <mean>%s</mean>\n", Ymean))
cat (sprintf("    <sd>%s</sd>\n",Ysd))
cat (sprintf("    <min>%s</min>\n", Ymin))
cat (sprintf("    <median>%s</median>\n",Ymedian))
cat (sprintf("    <max>%s</max>\n", Ymax))
cat ("  </statistics>\n")
unlink("statistics.xml")
quit()

The R code does everything I need on the full file. My question is how to let the user select a range of the input file to analyse instead of the full file. For example, the user might want just the 2nd to 5th entries of the input XML file, with the output still in the format defined by the cat statements.

    <shareprice>
        <timeStamp> 12:00:00.02</timeStamp>
        <Price>  15</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:01.025</timeStamp>
        <Price>  15.02</Price>
    </shareprice>

    <shareprice>
        <timeStamp> 12:00:01.031</timeStamp>
        <Price>  18.25</Price>
    </shareprice>

All help greatly appreciated.

Regards,

Anthony.

This can be solved by reading the whole file into a data frame and then asking the user for the lower and upper limits of the records to keep, e.g. with scan(n=2). See also ?scan. It takes input interactively, so the user can choose what to do; here that input is the range of records to use.

x <- scan(n=2)
id <- min(x):max(x)

df2 <- df[id,]
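
The range typed by the user may run past the end of the data, and the statistics then have to be computed on the subset rather than on the full data frame. A minimal sketch of both steps follows. The helpers select_range and stats_xml are hypothetical names introduced here for illustration, and the small data frame stands in for the parsed file. In an interactive session the bounds would come from x <- scan(n=2), passed as select_range(df, x[1], x[2]).

```r
# Hypothetical helper (not part of the original answer): keep rows
# lower..upper, clamped to the rows that actually exist in the data frame.
select_range <- function(df, lower, upper) {
  bounds <- sort(c(lower, upper))
  first  <- max(1, bounds[1])
  last   <- min(nrow(df), bounds[2])
  if (first > last) return(df[0, , drop = FALSE])  # range lies outside the data
  df[first:last, , drop = FALSE]
}

# Hypothetical helper: build the <statistics> block as a character vector,
# one element per output line, mirroring the cat() calls in the question.
stats_xml <- function(price) {
  c("  <statistics>",
    sprintf("    <mean>%s</mean>", mean(price)),
    sprintf("    <sd>%s</sd>", sd(price)),
    sprintf("    <min>%s</min>", min(price)),
    sprintf("    <median>%s</median>", median(price)),
    sprintf("    <max>%s</max>", max(price)),
    "  </statistics>")
}

# Stand-in for the parsed file: the first five prices from the sample XML
prices <- data.frame(Price = c(25.02, 15, 15.02, 18.25, 18.54))
df2 <- select_range(prices, 2, 5)
writeLines(stats_xml(df2$Price))
```

writeLines also accepts a file path as its second argument, which would be one way to write the block to disk without sink.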

If you want to read only the required records from a very big XML file, that's another story. I couldn't think of a built-in function to do that, so you would have to do something along the lines of:

# Read a subset of an XML file, assuming a blank line
# separates the individual records.
# n is a vector containing the record numbers wanted

subset.xml <- function(x,n,...){
    # set a range if n is just a number
    if (length(n)==1) n <- 1:n

    # initialise variables
    skp <- 0 # the number of lines to skip by scan
    count <- 1
    out <- character(1)

    repeat{
        # read exactly one line, keeping blank lines so records can be counted
        tmp <- scan(x,what=character(0),n=1,skip=skp,blank.lines.skip=FALSE,sep="\n")
        skp <- skp+1
        if(length(tmp)==0) {break} # no more input

        if((count %in% n) && (tmp != "")) out <- paste(out,tmp,sep="\n")
        if(tmp=="") count <- count+1 # a blank line separates records
    }
    out <- substring(out,3)
    # wrap the selected records in a single root element so they parse as XML
    out <- paste("<Data>",out,"</Data>",sep="\n")
    return(xmlToDataFrame(xmlParse(out)))
}

df <- subset.xml("test.xml",2:4)
> df
      timeStamp   Price
1   12:00:00.02      15
2  12:00:01.025   15.02
3  12:00:01.031   18.25
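
The function above calls scan with a growing skip value, so the file is reopened and read from the top for every line, and it depends on blank lines between records. As an alternative sketch in base R, the lines can be read once and the records located by their opening and closing tags. The helper read_records is a hypothetical name, and the code assumes each record starts on a line containing <shareprice> and ends on a line containing </shareprice>, as in the sample file. Note that readLines still loads every line into memory; only the XML parsing is limited to the selected records.

```r
# Hypothetical helper: return the requested records as one XML string,
# wrapped in a <Data> root element. n holds the record numbers wanted.
read_records <- function(path, n, tag = "shareprice") {
  lines  <- readLines(path, warn = FALSE)
  opens  <- grep(paste0("<", tag, ">"), lines, fixed = TRUE)   # record start lines
  closes <- grep(paste0("</", tag, ">"), lines, fixed = TRUE)  # record end lines
  stopifnot(length(opens) == length(closes))
  n <- n[n >= 1 & n <= length(opens)]       # ignore out-of-range record numbers
  keep <- unlist(Map(seq, opens[n], closes[n]))
  paste(c("<Data>", lines[keep], "</Data>"), collapse = "\n")
}

# Small demonstration file with two records from the sample data
sample_xml <- c(
  "<Company >",
  "    <shareprice>",
  "        <timeStamp> 12:00:00.01</timeStamp>",
  "        <Price>  25.02</Price>",
  "    </shareprice>",
  "",
  "    <shareprice>",
  "        <timeStamp> 12:00:00.02</timeStamp>",
  "        <Price>  15</Price>",
  "    </shareprice>",
  "</Company>")
path <- tempfile(fileext = ".xml")
writeLines(sample_xml, path)

txt <- read_records(path, 2)
cat(txt, "\n")
```

With the XML package loaded, the returned string can be turned into a data frame with xmlToDataFrame(xmlParse(txt, asText = TRUE)). Because records are matched by tag, the enclosing Company element never ends up inside the first or last record.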
