I have a dataset that looks like this:
Var1
PASSED=50; NOT PASSED=10; GPA=1;
How can I produce the dataset below?
Pass Not_pass GPA
50 10 1
I used the following code but it did not work:
generate pass = subinstr(subinstr(word(Var1, 1), "PASSED=", "", .) if regexm(Var1, "PASSED=") == 1
replace pass = pass[_n+1] if pass[_n]=="" & pass[_n+1]!=""
The following works for me:
clear
input strL Var1
"PASSED=50; NOT PASSED=10; GPA=1;"
end
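* break Var1 into x1, x2, x3 at each semicolon
split Var1, parse(";") generate(x)
* pull the first run of digits out of each piece and convert it to a number
forvalues i = 1 / 3 {
    generate v`i' = real(regexs(1)) if regexm(x`i',"([0-9]+)")
}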
drop x*
rename (v1 v2 v3) (Pass Not_Pass GPA)
list
     +----------------------------------------------------------+
     |                             Var1   Pass   Not_Pass   GPA |
     |----------------------------------------------------------|
  1. | PASSED=50; NOT PASSED=10; GPA=1;     50         10     1 |
     +----------------------------------------------------------+
You can learn how to split strings in Python from the str documentation. For example:
var1 = "PASSED=50; NOT PASSED=10; GPA=1;"
p, np, gpa, _ = var1.split(";")
This can leave leading whitespace on a piece:
print(repr(np))
>>> ' NOT PASSED=10'
which can be removed with strip():
print(repr(np.strip()))
>>> 'NOT PASSED=10'
Then you can build a dictionary that maps each label to its value (the values are still strings at this point):
d = {x.strip().split("=")[0]:x.split("=")[1] for x in [p, np, gpa]}
print(d)
>>> {'PASSED': '50', 'NOT PASSED': '10', 'GPA': '1'}
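If numeric values and the column names from the question are wanted, the same idea can be wrapped in a small function. This is only a sketch: parse_record and COLUMN_NAMES are hypothetical names introduced here, and it assumes every value in Var1 is a whole number.

```python
def parse_record(text):
    """Turn 'KEY=VALUE; KEY=VALUE;' into a dict with integer values."""
    record = {}
    for field in text.split(";"):
        field = field.strip()
        if not field:  # skip the empty piece after the trailing ';'
            continue
        key, value = field.split("=", 1)
        record[key.strip()] = int(value)  # assumes whole-number values
    return record


# Hypothetical mapping from the labels in Var1 to the requested column names
COLUMN_NAMES = {"PASSED": "Pass", "NOT PASSED": "Not_pass", "GPA": "GPA"}

var1 = "PASSED=50; NOT PASSED=10; GPA=1;"
row = {COLUMN_NAMES[key]: value for key, value in parse_record(var1).items()}
print(row)
# {'Pass': 50, 'Not_pass': 10, 'GPA': 1}
```

Skipping empty pieces means the trailing semicolon does not need the throwaway `_` variable, and the function also copes with a different number of fields.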
Using moss from SSC is another way to do it in Stata.
clear
input strL Var1
"PASSED=50; NOT PASSED=10; GPA=1;"
end
ssc install moss
moss Var1, match("([0-9]+)") regex
rename (_match?) (Pass Not_Pass GPA)
drop _*
list
     +----------------------------------------------------------+
     |                             Var1   Pass   Not_Pass   GPA |
     |----------------------------------------------------------|
  1. | PASSED=50; NOT PASSED=10; GPA=1;     50         10     1 |
     +----------------------------------------------------------+