
Create variable for each unique value over two lists

Apologies in advance for the lengthy post. I am only nominally familiar with Python, but I think it might be able to accomplish this task easily. Some background: I have survey data where respondents were asked to select the two schools they're considering applying to out of a list of 1500 or so. The data are stored as two variables (one per institution selected, with variable names "Institution_1" and "Institution_2") where each value uniquely identifies a particular institution.

Later on, respondents rate the institutions they selected on a 1 to 6 scale on a series of attributes. Each of these ratings is stored as a separate scale variable in the data, and I have two of them per attribute, one for each position in which the institution was selected. If, for example, Adelphi University is "Institution_1", then the rating on "Core academics" is stored in variable "Q.32_combined_1"; if Adelphi University is "Institution_2", then the rating on "Core academics" is stored in variable "Q.36_combined_1".

I want to combine the ratings for each institution. Here's the SPSS syntax for doing so for this one institution (Adelphi is uniquely identified by the value 188429):

DO IF (Institution_1 = 188429).
COMPUTE Adelphi_CoreAcad = Q.32_combined_1.
ELSE IF (Institution_2 = 188429).
COMPUTE Adelphi_CoreAcad = Q.36_combined_1.
END IF.
EXECUTE.

But we have 1,000+ institutions in our data. How can we create a variable for each unique value over these two lists (Institution_1 and Institution_2)? Is there a way to use Python to create these variables and/or build SPSS syntax that would work?

Thanks!

Try this. It's rough, since I don't have SPSS, but I think it's what you're asking for. (Note: I'm not sure that what you're asking for is the right thing, but see if it works, and maybe we'll go from there.)

This creates a set of variables named U188429_CoreAcad, etc., where U is just a leading prefix ("U" for "Unit ID"), 188429 is the unit id, and "CoreAcad" is a made-up string you can change.

I used categories 'CoreAcad', 'PrettyCoeds', 'FootballTeam' and 'Drinking', because if I had it all to do over again, that's how I would have rated schools. (Except for 'CoreAcad', which was your thing.)

I assumed that your categories were questions 32-35 for institution 1, and 36-39 for institution 2. You can change those below as well.

I assumed that you can pass a bunch of lines to spss.Submit together. If not, split the string up and submit the lines one at a time.

I commented out "BEGIN PROGRAM", "import spss", "END PROGRAM" because I'm just feeding stuff into a command-line python2.7. Uncomment those for your use.

#BEGIN PROGRAM.
#import spss, spssaux

# According to the internet, unitids are sparse values.
Unit_ids = [
        188429, # Adelphi
        188430, # Random #s
        171204,
        100001,
]

Categories = {
    'CoreAcad' : ('Q.32_combined_1', 'Q.36_combined_1'),
    'PrettyCoeds' : ('Q.33_combined_1', 'Q.37_combined_1'),
    'FootballTeam' : ('Q.34_combined_1', 'Q.38_combined_1'),
    'Drinking' : ('Q.35_combined_1', 'Q.39_combined_1'),
}


code = """
DO IF (Institution_1 = %(unitid)d).
COMPUTE U%(unitid)d_%(category)s = %(answer1)s.
ELSE IF (Institution_2 = %(unitid)d).
COMPUTE U%(unitid)d_%(category)s = %(answer2)s.
END IF.
EXECUTE.
"""
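for unitid in Unit_ids:
    # .items() works on both Python 2 and 3 (.iteritems() is Python 2 only).
    for category, answers in Categories.items():
        answer1, answer2 = answers
        # locals() supplies unitid, category, answer1 and answer2 to the template.
        print(code % locals())
        #spss.Submit(code % locals())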


#END PROGRAM.
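
The hard-coded Unit_ids list above could instead be derived from the data. Here is a minimal sketch, assuming the two institution columns have already been read into plain Python lists (reading them out of SPSS is not shown). The helper names `unique_ids` and `build_syntax` are hypothetical, and the sketch emits a single EXECUTE at the end rather than one per block, which is my own choice and not part of the answer above.

```python
SYNTAX_TEMPLATE = """DO IF (Institution_1 = {unitid}).
COMPUTE U{unitid}_{category} = {answer1}.
ELSE IF (Institution_2 = {unitid}).
COMPUTE U{unitid}_{category} = {answer2}.
END IF.
"""


def unique_ids(institution_1, institution_2):
    """Return the sorted unique ids seen in either column, skipping missing values."""
    combined = list(institution_1) + list(institution_2)
    # int() guards against ids arriving as floats such as 188429.0.
    return sorted({int(value) for value in combined if value is not None})


def build_syntax(unit_ids, categories):
    """Build one DO IF block per (unit id, category) pair, then a single EXECUTE."""
    blocks = []
    for unitid in unit_ids:
        for category, (answer1, answer2) in sorted(categories.items()):
            blocks.append(SYNTAX_TEMPLATE.format(
                unitid=unitid, category=category,
                answer1=answer1, answer2=answer2))
    blocks.append("EXECUTE.\n")
    return "".join(blocks)


# Made-up example values standing in for the two survey columns.
inst_1 = [188429, 171204, None]
inst_2 = [171204, 188429, 100001]
ids = unique_ids(inst_1, inst_2)
syntax = build_syntax(ids, {"CoreAcad": ("Q.32_combined_1", "Q.36_combined_1")})
print(syntax)
```

The resulting string can be inspected first and then passed to spss.Submit inside a BEGIN PROGRAM block, as in the answer above.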

I suggest a different solution, based on restructuring the data:
First, separate the two institutions into two rows, each with its corresponding ratings:

varstocases /make institution from Institution_1 Institution_2 
  /make CoreAcad from Q.32_combined_1 Q.36_combined_1
  /make otherRting from inst1var inst2var.
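
To make the effect of this restructure concrete, here is a rough plain-Python sketch of the same wide-to-long reshaping on two made-up respondents. It only illustrates the idea and is not how SPSS implements it; the function name `vars_to_cases` and the sample values are hypothetical. Rows in which every new variable is missing are skipped, which is an assumption of this sketch.

```python
def vars_to_cases(rows, make):
    """Turn each wide row into one long row per institution position.

    `make` maps each new variable name to its (position 1, position 2) sources.
    """
    long_rows = []
    for row in rows:
        for position in (0, 1):
            new_row = {target: row[sources[position]]
                       for target, sources in make.items()}
            # Assumption: skip rows where every new variable is missing.
            if all(value is None for value in new_row.values()):
                continue
            long_rows.append(new_row)
    return long_rows


# Two made-up respondents who picked the same schools in opposite order.
rows = [
    {"Institution_1": 188429, "Institution_2": 171204,
     "Q.32_combined_1": 5, "Q.36_combined_1": 3},
    {"Institution_1": 171204, "Institution_2": 188429,
     "Q.32_combined_1": 4, "Q.36_combined_1": 6},
]
make = {
    "institution": ("Institution_1", "Institution_2"),
    "CoreAcad": ("Q.32_combined_1", "Q.36_combined_1"),
}
long_rows = vars_to_cases(rows, make)
for long_row in long_rows:
    print(long_row)
```

Each respondent contributes two rows, so a school's rating ends up in the same CoreAcad column regardless of whether it was picked first or second.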

You can add another make subcommand for each additional rating that corresponds to each of the two institutions.
At this point your data has one row per institution selected, along with its ratings. You can now analyze them, e.g.:

means CoreAcad otherRting by institution.

Or you can aggregate by institution to analyze their ratings. For example:

DATASET DECLARE AggByInst.
AGGREGATE  /OUTFILE='AggByInst' /BREAK=institution 
    /MCoreAcad MotherRting =MEAN(CoreAcad otherRting).
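
For readers more comfortable with Python, the per-institution mean that this aggregation produces can be sketched as follows. This is only an illustration on made-up long-format rows (one row per institution selected); the function name `mean_by_institution` and the sample values are hypothetical, and missing ratings are simply skipped.

```python
from collections import defaultdict


def mean_by_institution(long_rows, rating):
    """Return {institution: mean of `rating`}, ignoring missing values."""
    totals = defaultdict(lambda: [0.0, 0])  # institution -> [sum, count]
    for row in long_rows:
        value = row[rating]
        if value is None:
            continue
        entry = totals[row["institution"]]
        entry[0] += value
        entry[1] += 1
    return {inst: total / count for inst, (total, count) in totals.items()}


# Made-up long-format data: one row per institution a respondent selected.
long_rows = [
    {"institution": 188429, "CoreAcad": 5},
    {"institution": 171204, "CoreAcad": 3},
    {"institution": 171204, "CoreAcad": 4},
    {"institution": 188429, "CoreAcad": 6},
    {"institution": 100001, "CoreAcad": None},
]
means = mean_by_institution(long_rows, "CoreAcad")
print(means)
```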
