R scripts in Microsoft SQL Server Management Studio

My problem is that I cannot understand the error messages this environment gives me. They are very vague, so I cannot tell where the problem is.

EXEC sp_execute_external_script
  @language = N'R',
  @script = N'
    count = 0; x=1; y=2; m="that is good until here"
    data = as.vector(data);
    for(i in data){
        if(data[y]>data[x]){count=count+1; x=x+1; y=y+1}
        else{x=x+1; y=y+1}};
    count <- data.frame(count)',
    @output_data_1_name = N'count',
    @input_data_1_name = N'data',
    @input_data_1 = N'SELECT alcohol FROM [wine].[dbo].[wineT]'

Untested, try this:

EXEC sp_execute_external_script
  @language = N'R',
  @script = N'
    data = unlist(data);
    count = data.frame(count = sum(data[-length(data)] > data[-1]));',
  @output_data_1_name = N'count',
  @input_data_1_name = N'data',
  @input_data_1 = N'SELECT alcohol FROM [wine].[dbo].[wineT]'

Issues:

  1. as.vector does not do much to a data.frame, hence the switch to unlist(data), which flattens the single column into a plain vector.

  2. Your "missing value" error occurs because y is incremented past the end of data. For instance, I can reproduce the error on the R console with this:

     for (i in data) {
       if (data[y] > data[x]) { count=count+1; x=x+1; y=y+1 } else { x=x+1; y=y+1 }
     }
     # Error in if (data[y] > data[x]) { (from #1) :
     #   missing value where TRUE/FALSE needed
     count
     # [1] 4
     x
     # [1] 10
     y
     # [1] 11

    for (i in data) runs once per element (10 times), but there are only 9 adjacent pairs, so the last pass always reads past the end. Since length(data) is 10, data[y] is data[11], which is NA. The condition becomes NA > 3, which evaluates to NA, and if() cannot branch on NA. (FYI, the condition of an if must always be of length 1 and clearly "truthy": TRUE or FALSE, or a number where 0 is false and anything else is true.)

  3. An alternative is to make i an index into data, starting at 2:

     count <- 0
     for (i in seq_along(data)[-1]) {
       if (data[i-1] > data[i]) { count = count + 1 }
     }
     count
     # [1] 4

    Here seq_along(data) produces 1:10 (in this example), and [-1] drops the first element, so i runs safely from 2 to length(data) and the x and y counters are no longer needed.

  4. Better yet, we don't need a loop at all: all you want to do is compare each value (except the first) with the preceding value and count how many times the previous number is greater. R vectorizes very well, so a single expression finds every pair that meets the condition, and sum() counts them just as quickly (TRUE counts as 1, FALSE as 0).

     data
     #  a1  a2  a3  a4  a5  a6  a7  a8  a9 a10
     #   1   5  10   8   2   4   6   9   7   3
     data[-length(data)] > data[-1]
     #    a1    a2    a3    a4    a5    a6    a7    a8    a9
     # FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE

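    and sum(..) that up to get the needed result (4 for this data). Note that the loop in the question tests data[y] > data[x] with y = x + 1, i.e. whether each value is greater than the one before it. To count in that direction, swap the operands: sum(data[-1] > data[-length(data)]), which gives 5 for this data.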
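To check the logic on the R console before embedding it in sp_execute_external_script, here is a minimal standalone sketch. It assumes SQL Server passes the query result to R as a one-column data frame, simulated here with the example values above; the variable names v, count_loop, count_down and count_up are illustrative only.

```r
# Simulate what sp_execute_external_script hands to R: a one-column data frame
# (values taken from the example above).
data <- data.frame(alcohol = c(1, 5, 10, 8, 2, 4, 6, 9, 7, 3))

# Flatten the single column to a plain numeric vector.
v <- unlist(data, use.names = FALSE)

# Why the original loop fails: reading past the end yields NA, and NA > 3 is NA.
print(v[length(v) + 1])                  # NA
print(v[length(v) + 1] > v[length(v)])   # NA

# Loop version: i starts at 2, so v[i - 1] and v[i] are always in range.
count_loop <- 0
for (i in seq_along(v)[-1]) {
  if (v[i - 1] > v[i]) count_loop <- count_loop + 1
}

# Vectorized version: one comparison over all adjacent pairs; sum() counts TRUEs.
count_down <- sum(v[-length(v)] > v[-1])   # previous value greater
count_up   <- sum(v[-1] > v[-length(v)])   # next value greater (the question's test)

# The result returned through @output_data_1_name must be a data frame.
count <- data.frame(count = count_down)
print(count_loop)  # 4
print(count_down)  # 4
print(count_up)    # 5
```

The same two vectorized lines can be pasted into @script in place of the loop, as long as the result is wrapped in data.frame() before it is returned.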

I know it is not a tidy or efficient answer, but I get the right answer with this code.

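  EXEC sp_execute_external_script
      @language = N'R',
      @script = N'
        count = 0; x = 1; y = 2; z = NA;
        data = unlist(data);
        # for () runs length(data) times but there are only length(data) - 1 pairs,
        # so the first pass only clears the flag z and skips the comparison
        for (i in data) {
            if (is.na(z)) { z = FALSE } else {
                if (data[y] > data[x]) { count = count + 1; x = x + 1; y = y + 1 }
                else { x = x + 1; y = y + 1 } } };
        count <- data.frame(count)',
      @output_data_1_name = N'count',
      @input_data_1_name = N'data',
      @input_data_1 = N'SELECT column1 FROM [wine].[dbo].[data]'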
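The z flag works because for (i in data) runs once per element while there are only length(data) - 1 adjacent pairs: the first pass only clears the flag, so y stops at length(data) instead of running past it. A standalone sketch that checks the flag loop against a vectorized count of increases; the function names count_with_flag and count_vectorized are hypothetical.

```r
# Hypothetical helper: the loop body mirrors the script above.
count_with_flag <- function(data) {
  count <- 0; x <- 1; y <- 2; z <- NA
  for (i in data) {
    if (is.na(z)) {
      z <- FALSE                 # first pass: clear the flag, compare nothing
    } else {
      if (data[y] > data[x]) count <- count + 1
      x <- x + 1; y <- y + 1     # move on to the next adjacent pair
    }
  }
  count
}

# Hypothetical helper: same count without a loop (next value greater than previous).
count_vectorized <- function(data) sum(data[-1] > data[-length(data)])

example <- c(1, 5, 10, 8, 2, 4, 6, 9, 7, 3)
print(count_with_flag(example))   # 5
print(count_vectorized(example))  # 5

# Compare the two on random data (with ties) as well.
set.seed(42)
r <- sample(1:100, 25, replace = TRUE)
print(count_with_flag(r) == count_vectorized(r))  # TRUE
```

Both use a strict >, so equal neighbours are not counted by either version.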

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM