
Dropping various string variables in a loop in Stata

I want to drop a great number of string variables that contain the word "Other" in their observations. As such, I tried the following loop to drop all the variables:

foreach var of varlist v1-v240 {
    drop `var' if `var'=="Other"
}

What I get in return is the message "syntax error". I would like to know not only a way to drop all the variables that contain the word "Other", but also why the code that I've entered returns an error.

The short answer on why your syntax is illegal, which @Dimitriy Masterov doesn't quite spell out, is that drop supports just two syntaxes, which can't be mixed: dropping variables and dropping observations. This is documented: see e.g. http://www.stata.com/help.cgi?drop and the corresponding on-line help and manual entry within Stata.
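As a minimal sketch of the two forms, using the auto dataset that ships with Stata (the choice of make and foreign is only for illustration):

```stata
sysuse auto, clear

* Form 1: drop variables (columns); no if or in qualifier is allowed here
drop make

* Form 2: drop observations (rows) for which the expression is true
drop if foreign == 1
```

Writing drop make if foreign == 1 mixes the two forms, which is what triggers the syntax error.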

In addition to other solutions, findname from the Stata Journal would allow this solution:

findname, any(@ == "Other") 
drop `r(varlist)' 

Your interpretation of "contain" is evidently 'is equal to', judging by your use of == as an operator, echoed above. If "contain" really means 'includes as substring', then you need a syntax such as

any(strpos(@, "Other"))  

or

any(regexm(@, "Other"))  

as @Dimitriy also explains.

If they are actual strings, this should work:

sysuse auto, clear

ds, has(type string) // get a list of string variables

// loop over each string variable, count observations that contain Buick anywhere, and drop the variable if N>0
foreach var of varlist `r(varlist)' {
    count if regexm(`var',"Buick") 
    if r(N)>0 {
        drop `var'
    }
}

If "contains" means that the whole value is exactly "Buick", then you need to use "^Buick$" as the pattern instead, or

count if `var'=="Buick"

Beware of leading/trailing spaces.
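One hedged way to guard against stray blanks is to trim the value before comparing. The sketch below assumes strtrim() is available (older Stata releases call the same function trim()):

```stata
sysuse auto, clear
ds, has(type string)

foreach var of varlist `r(varlist)' {
    * strtrim() strips leading and trailing blanks before the exact comparison
    quietly count if strtrim(`var') == "Buick"
    if r(N) > 0 {
        drop `var'
    }
}
```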

The if qualifier restricts the scope of a command to those observations for which the value of the expression is true. Your code errors because you are asking Stata to drop a variable (a column) if some observations (rows) satisfy a condition. You could use the if qualifier to drop those observations, or you can drop a variable, but not both simultaneously. My code uses the if command (a different beast) to verify the condition, and then drops the variable if that condition is satisfied.
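Applied to the question's own setting, the same pattern might look like the sketch below. The three-variable toy dataset is made up purely for illustration and stands in for v1-v240:

```stata
clear
* Hypothetical toy data: only v2 has an observation equal to "Other"
input str10 v1 str10 v2 str10 v3
"Yes" "Other" "No"
"No"  "Yes"   "No"
end

* Restrict the loop to string variables in the range
ds v1-v3, has(type string)

foreach var of varlist `r(varlist)' {
    * if qualifier: count the observations that match
    quietly count if `var' == "Other"
    * if command: act on the whole variable when any observation matched
    if r(N) > 0 {
        drop `var'
    }
}

describe
```

After the loop, v2 is gone while v1 and v3 remain.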

You might be tempted to do something like

if `var'=="Other" {
 drop `var'
}

but that will usually not work as expected (it would drop the variable only if the first observation was "Other").
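A small sketch of that behaviour with the auto dataset: the if command evaluates its expression once, and a bare variable name there is read as its value in the first observation.

```stata
sysuse auto, clear

* The first observation of make is what the if command will see
display make[1]

if make == "AMC Concord" {
    display "true, but only because observation 1 has this value"
}

if make == "Buick Regal" {
    display "not shown, although a later observation has this value"
}
```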
