Check if row contains a substring, print value in another column (R programming)

Goal:

I am looking at example Twitter data and want to check whether the text in the column "Tweet" contains the phrase "yo creo". If a tweet contains "yo creo", I would like to put a 1 in the column "SubjectExpression".

Error:

I am receiving the error:

Must subset columns with a valid subscript vector.
x Subscript has the wrong type logical.
ℹ It must be numeric or character.

Here is my code:

#Read in data
MyData <-read.csv("/Users/mydata/Desktop/MyData.csv")

#Append subject expression column to dataframe
MyData$SubjectExpression <- ""

#Count instances of subject expression using select
MyData%>%
  mutate(SubjectExpression)= 
  case_when(
    select(MyData, Tweet, contains("yo creo") == '1')
  )

You've got a few issues.

  • The mutate() syntax is data %>% mutate(column = value): the definition of the new column has to go inside mutate()'s parentheses.
  • Inside most dplyr functions, including mutate(), you can refer to column names directly and unquoted. You don't need to select() a column (select() is for keeping some columns and dropping others).
  • case_when() arguments have the form test_1 ~ value_1, test_2 ~ value_2, where each left-hand side is a logical test. The tests are checked in order, and the first one that is TRUE supplies the value.
  • contains() is a tidyselect helper made specifically for matching column names. To detect the presence of a string in the values of a column/vector, we'll use stringr::str_detect().
  • mutate() can create brand new columns. You don't need to initialize the column with MyData$SubjectExpression <- "" . You should just delete that line.
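To make the contains() versus str_detect() distinction concrete, here is a minimal sketch on a small hypothetical data frame (the tweets and the extra user column are invented for illustration; it assumes dplyr and stringr are installed). contains() picks columns whose names match, while str_detect() tests the values of a column and returns one logical per row.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for MyData.csv
tweets <- data.frame(
  user  = c("a", "b", "c"),
  Tweet = c("yo creo que si", "no se", "pues yo creo"),
  stringsAsFactors = FALSE
)

# contains() is a tidyselect helper: it matches column NAMES,
# so this keeps only the columns whose name contains "Twe"
name_match <- select(tweets, contains("Twe"))
names(name_match)

# str_detect() tests the VALUES of a column: one TRUE/FALSE per row
value_match <- str_detect(tweets$Tweet, "yo creo")
value_match
```

This is why contains("yo creo") in the original code could never look inside the tweets: it only ever sees column names.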

Making all those changes, we get this:

library(dplyr)

# mutate() returns a new data frame, so assign the result back
MyData <- MyData %>%
  mutate(SubjectExpression = 
    case_when(
      stringr::str_detect(Tweet, "yo creo") ~ 1,
      TRUE ~ 0
    )
  )
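As a quick check, the sketch below runs the same mutate()/case_when() pattern on a hypothetical three-row data frame (the tweets and the MyData_ci name are invented for illustration). Note that str_detect() is case-sensitive by default, so a tweet starting with "Yo creo" is not matched; wrapping the pattern in stringr::regex(..., ignore_case = TRUE) is one way to relax that, if that is what you want.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data standing in for MyData.csv
MyData <- data.frame(
  Tweet = c("yo creo que si", "no se", "Yo creo que no"),
  stringsAsFactors = FALSE
)

# Case-sensitive match, as in the answer above
MyData <- MyData %>%
  mutate(SubjectExpression = case_when(
    str_detect(Tweet, "yo creo") ~ 1,  # first matching test wins
    TRUE ~ 0                           # fallback for every other row
  ))
MyData$SubjectExpression
# [1] 1 0 0

# Case-insensitive variant using stringr::regex()
MyData_ci <- MyData %>%
  mutate(SubjectExpression = case_when(
    str_detect(Tweet, regex("yo creo", ignore_case = TRUE)) ~ 1,
    TRUE ~ 0
  ))
MyData_ci$SubjectExpression
# [1] 1 0 1
```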

A base R alternative using grepl

MyData$SubjectExpression <- grepl("yo creo", MyData$Tweet)*1
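Multiplying by 1 is what turns grepl()'s TRUE/FALSE into 1/0. A minimal sketch of two small variations, on hypothetical toy data (SubjectExpressionCI is an invented column name): as.integer() gives the same 1/0 as an integer vector, fixed = TRUE treats "yo creo" as a literal string rather than a regex, and ignore.case = TRUE makes the match case-insensitive. grepl() returns FALSE for a missing tweet, so NA rows come out as 0.

```r
# Hypothetical toy data standing in for MyData.csv
MyData <- data.frame(
  Tweet = c("yo creo que si", "no se", "Yo creo que no", NA),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE/FALSE per row; as.integer() turns that into 1/0
# (same values as *1, but stored as integer instead of double)
MyData$SubjectExpression <- as.integer(grepl("yo creo", MyData$Tweet, fixed = TRUE))

# Case-insensitive variant; ignore.case has no effect together with
# fixed = TRUE, so the pattern is treated as a regular expression here
MyData$SubjectExpressionCI <- as.integer(grepl("yo creo", MyData$Tweet, ignore.case = TRUE))
```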
