SAS: calculating new variables for a new dataset from an existing dataset

This is a portion of the given dataset, projet.details_etest:

"survey_instance_id"  "user_id"  "question_id"  "Item_correct"
"2008"                "14389"    "4243"         "0"
"2008"                "14489"    "4243"         "1"
"2008"                "14499"    "4253"         "0"
"2008"                "1669"     "4253"         "1"
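
For anyone who wants to experiment with these rows, here is a minimal sketch that rebuilds them as a small test table. The dataset name work.details_sample is made up for illustration, and the step assumes all four columns are numeric, which the excerpt above does not confirm.

```sas
/* Hypothetical test table holding the four sample rows shown above.
   Assumption: every column is numeric. The name work.details_sample
   is illustrative and does not come from the original data. */
data work.details_sample;
    input survey_instance_id user_id question_id item_correct;
    datalines;
2008 14389 4243 0
2008 14489 4243 1
2008 14499 4253 0
2008 1669 4253 1
;
run;

/* Display the rows to confirm they were read as expected. */
proc print data=work.details_sample;
run;
```

Writing to the WORK library keeps the real projet.details_etest untouched while testing.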

I want to create a new dataset called projet.resume_question, which contains the details data sorted by question_id, with the following variables:

  • survey_instance_id
  • question_id
  • nb_correct_answers
  • nb_incorrect_answers
  • nb_omitted_answers
  • nb_total_with_omitted_answers
  • nb_total_without_omitted_answers

The variable nb_omitted_answers is the total number of participants minus nb_correct_answers (the number of correct answers per question) minus nb_incorrect_answers (the number of incorrect answers per question).

The variable nb_total_with_omitted_answers is the total number of participants who have participated in the test.

The variable nb_total_without_omitted_answers is the total number of participants who have answered each question.

Here is what I did:

data projet.resume_question;
    set projet.details_etest;
    by question_id;
    keep survey_instance_id question_id nb_correct_answers nb_incorrect_answers;
    retain nb_correct_answers 0 nb_incorrect_answers 0;
    if Item_correct = 1 then correct_answers = Item_correct;
    else if Item_correct = 0 then incorrect_answers = Item_correct;
    nb_correct_answers = sum(correct_answers);
    nb_incorrect_answers = sum(incorrect_answers);
run;

proc print data=projet.resume_question;
run;

I started this way, but the printed output looks wrong to me. Can someone help me, please?

First, sort the dataset by survey, question, and participant.

proc sort data = projet.details_etest out = details;
    by survey_instance_id question_id user_id;
run;

Now get the number of participants for each survey.

proc sql;
    create table participated as
    select survey_instance_id,
        count(distinct user_id) as nb_total_with_omitted_answers
    from details
    group by survey_instance_id;
quit;

Compute the aggregates for each survey and question.

data aggregated;
    set details;
    by survey_instance_id question_id;

    retain nb_total_without_omitted_answers
           nb_correct_answers nb_incorrect_answers 0;

    if first.question_id then do;
        nb_total_without_omitted_answers = 0;
        nb_correct_answers = 0;
        nb_incorrect_answers = 0;
    end;

    if item_correct in (0, 1) then nb_total_without_omitted_answers + 1;

    if item_correct = 1 then nb_correct_answers + 1;
    else if item_correct = 0 then nb_incorrect_answers + 1;

    if last.question_id then output;

    drop user_id item_correct;
run;

Lastly, compute the number of omitted answers per question.

data projet.resume_question;
    merge participated aggregated;
    by survey_instance_id;

    nb_omitted_answers = nb_total_with_omitted_answers -
        nb_correct_answers - nb_incorrect_answers;
run;

This should get you what you need.
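
For comparison, the same summary could probably be produced in a single PROC SQL step. This is only a sketch of an alternative, not part of the steps above: the output name work.resume_question_sql is made up, and it assumes item_correct is numeric with a missing value (or anything other than 0 or 1) when a question was omitted. It relies on PROC SQL evaluating a comparison to 1 or 0, so summing a comparison counts the rows where it holds.

```sas
proc sql;
    /* Hypothetical output name; switch to projet.resume_question if wanted. */
    create table work.resume_question_sql as
    select d.survey_instance_id,
           d.question_id,
           /* A true comparison evaluates to 1, so SUM counts matches. */
           sum(d.item_correct = 1) as nb_correct_answers,
           sum(d.item_correct = 0) as nb_incorrect_answers,
           sum(d.item_correct in (0, 1)) as nb_total_without_omitted_answers,
           p.nb_total_with_omitted_answers,
           p.nb_total_with_omitted_answers
               - sum(d.item_correct in (0, 1)) as nb_omitted_answers
    from projet.details_etest as d
         inner join
         /* Distinct participants per survey, as in the second step. */
         (select survey_instance_id,
                 count(distinct user_id) as nb_total_with_omitted_answers
          from projet.details_etest
          group by survey_instance_id) as p
         on d.survey_instance_id = p.survey_instance_id
    group by d.survey_instance_id, d.question_id,
             p.nb_total_with_omitted_answers
    order by d.survey_instance_id, d.question_id;
quit;
```

The DATA step version makes the per-group counters explicit, while the SQL version trades that for fewer intermediate tables; either should give one row per survey and question.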
