Code new variable based on grep return in R

I have a variable actor, which is a string and contains values like "military forces of guinea-bissau (1989-1992)" along with a wide range of other, fairly complex values. I have been using grep() to find character patterns that match different types of actors. For example, I would like to code a new variable actor_type as 1 when actor contains "military forces of", does not contain "mutiny of", and the string variable country is also contained in actor.

I am at a loss as to how to conditionally create this new variable without resorting to some type of horrible for loop. Help me!

Data looks roughly like this:

| row | actor                                              | country         |
|-----|----------------------------------------------------|-----------------|
| 1   | "military forces of guinea-bissau"                 | "guinea-bissau" |
| 2   | "mutiny of military forces of guinea-bissau"       | "guinea-bissau" |
| 3   | "unidentified armed group (guinea-bissau)"         | "guinea-bissau" |
| 4   | "mfdc: movement of democratic forces of casamance" | "guinea-bissau" |
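To make the example reproducible, the table can be rebuilt as a data.frame named df. This is a sketch using only the four rows shown above; the real data presumably has many more rows and actor values. Note that the column order (actor first, country second) matters for the apply()-based answer.

```r
# Rebuild the example table (values copied from the four rows above).
# Column order is actor, then country; the apply() answer relies on this.
df <- data.frame(
  actor = c("military forces of guinea-bissau",
            "mutiny of military forces of guinea-bissau",
            "unidentified armed group (guinea-bissau)",
            "mfdc: movement of democratic forces of casamance"),
  country = "guinea-bissau",   # recycled to all 4 rows
  stringsAsFactors = FALSE
)
str(df)
```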

If your data is in a data.frame df:

```r
ifelse(!grepl('mutiny of', df$actor) & grepl('military forces of', df$actor) &
         apply(df, 1, function(x) grepl(x[2], x[1])), 1, 0)
#> [1] 1 0 0 0
```

grepl returns a logical vector; ifelse() turns the combined condition into 1/0 values, and the result can be assigned to a new column, e.g. df$actor_type.
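A minimal sketch of storing the result as a new column, assuming df holds the four example rows. It swaps ifelse(cond, 1, 0) for as.integer(cond), which yields the same 0/1 values but as an integer column rather than a double; either works for coding a dummy variable.

```r
df <- data.frame(
  actor = c("military forces of guinea-bissau",
            "mutiny of military forces of guinea-bissau",
            "unidentified armed group (guinea-bissau)",
            "mfdc: movement of democratic forces of casamance"),
  country = "guinea-bissau",
  stringsAsFactors = FALSE
)

# Same three conditions as the answer; each piece is a logical vector
is_military <- !grepl("mutiny of", df$actor) &
  grepl("military forces of", df$actor) &
  apply(df, 1, function(x) grepl(x[2], x[1]))  # x[1] = actor, x[2] = country

# as.integer() maps TRUE/FALSE to 1/0 (integer instead of ifelse's double)
df$actor_type <- as.integer(is_military)
df$actor_type
#> [1] 1 0 0 0
```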

Breaking that apart:

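!grepl('mutiny of', df$actor) and grepl('military forces of', df$actor) satisfy your first two requirements. The last piece, apply(df, 1, function(x) grepl(x[2], x[1])), goes row by row and greps for country in actor. Note that apply() first coerces the data.frame to a character matrix and indexes each row by position, so this assumes actor is the first column and country the second; country is also used as a regular expression pattern.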
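If the positional indexing in apply() is a concern, one alternative (not part of the original answer) is to pair each country with its actor using mapply(), and to pass fixed = TRUE so that names are matched as literal text rather than as regular expressions; a country name containing parentheses or a dot would otherwise be interpreted as regex syntax. A sketch on the example rows:

```r
df <- data.frame(
  actor = c("military forces of guinea-bissau",
            "mutiny of military forces of guinea-bissau",
            "unidentified armed group (guinea-bissau)",
            "mfdc: movement of democratic forces of casamance"),
  country = "guinea-bissau",
  stringsAsFactors = FALSE
)

# Row-wise check without apply(): grepl(pattern = country[i], x = actor[i]),
# matched literally thanks to fixed = TRUE; works regardless of column order
country_in_actor <- mapply(grepl, df$country, df$actor,
                           MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

df$actor_type <- as.integer(
  !grepl("mutiny of", df$actor, fixed = TRUE) &
    grepl("military forces of", df$actor, fixed = TRUE) &
    country_in_actor
)
df$actor_type
#> [1] 1 0 0 0
```

mapply() still loops internally, but it avoids both an explicit for loop and the matrix coercion that apply() performs on a data.frame.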
