I've never used Stata before and know very little about it. I've been trying to collapse a dataset of bilateral information by year, country1, and country2, taking the means of all other variables. In R, I tried running:
aggregate(dataset,by=list(dataset$year,dataset$country1,dataset$country2),FUN=mean,na.rm=TRUE)
The dataset is too large for my computer's RAM to handle the collapse in R (another issue I can't solve). When a colleague ran the code, the other variables were not returned as means: in some cases only the data from one row of a particular dyad-year was selected, and in others I'm not even sure what happened. Smaller subsets of the dataset gave correct results.
Because of the problems in R, I want to try this in Stata. However, when I previously tried
collapse (mean) <every variable I wanted a "mean" of, or otherwise wanted to remove from the dataset>, by(year country1 country2)
Stata did not know how to handle strings. I understand Stata so little that I can't work out how to resolve this. Could someone please provide the code I would need to use the collapse command on a large number of variables, many of which are strings (and for which, in the case of strings, I want NA returned)?
You can select numeric variables automatically with ds. ds is an official command. findname (Stata Journal) is a user-written successor to ds with more functionality (fact) and a friendlier syntax (author's opinion, although the same author was the last author of ds).
. sysuse auto
(1978 Automobile Data)
. ds, has(type numeric)
price rep78 trunk length displacement foreign
mpg headroom weight turn gear_ratio
. findname, type(numeric)
price rep78 trunk length displacement foreign
mpg headroom weight turn gear_ratio
In both cases, the names of the numeric variables are returned in r(varlist):
. di "`r(varlist)'"
price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign
so you can feed that list to collapse:
. collapse `r(varlist)', by(year country1 country2)
In general, there is no substitute for reading the help and manual entry for collapse.
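One detail the command above glosses over: if a by() variable such as year is itself numeric, it will also appear in r(varlist), and collapse will complain when the same variable is both a target and a by-variable. A minimal sketch of one way around that, using the auto data with foreign standing in for year, country1, and country2 (the local macro names numvars and byvars are just illustrative):

```stata
* Sketch on the auto data; foreign stands in for the real grouping variables
sysuse auto, clear

* collect the names of all numeric variables
ds, has(type numeric)
local numvars `r(varlist)'

* remove the by() variables from that list before collapsing
local byvars foreign
local numvars : list numvars - byvars

* means of every remaining numeric variable, one row per group
collapse (mean) `numvars', by(`byvars')
list
```

The string variable make is not named in the collapse, so it is simply dropped from the result.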
If the string variables you are trying to compute a mean for are numbers stored as strings, e.g. "1", "2", etc., then you can convert them to numeric type using real() or destring. String variables not in this form, e.g. "alligator", "lizard", "snake", etc., for which you want no mean, will be dropped if they are not included in the collapse.
Example:
clear all
set more off
* some example data
input ///
str4 numstr num str11 reptiles
"234" 234 "alligator"
"2135" 2135 "lizard"
"324" 324 "snake"
end
list
* create numeric variable from string
destring numstr, generate(num2)
* the collapse
collapse (mean) num num2
list
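If the goal is to keep the non-numeric string variables in the collapsed data as missing values (the "NA" the question asks for) rather than have them dropped, one possible approach is destring with the force option, which turns anything non-numeric into missing. The sketch below assumes a hypothetical string grouping variable called country, which has to be kept out of the conversion:

```stata
clear
* same example data plus a hypothetical string grouping variable
input str4 numstr num str11 reptiles str2 country
"234"  234  "alligator" "A"
"2135" 2135 "lizard"    "A"
"324"  324  "snake"     "B"
end

* list the string variables, leaving out the by() variable
ds, has(type string)
local strvars `r(varlist)'
local byvars country
local strvars : list strvars - byvars

* force: non-numeric values such as "alligator" become missing
destring `strvars', replace force

* collapse everything except the by() variable
ds `byvars', not
collapse (mean) `r(varlist)', by(`byvars')
list
```

Note that force converts silently, so a numeric-looking variable with a few stray characters would lose those values without an error; it is worth inspecting the string variables first.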