R: Identifying a text string within a column of a data frame

One column of my data frame contains words and phrases. I am trying to create a dummy variable that flags the fields in this column containing a specific string of text anywhere within them.

For example:

  • kite
  • cars
  • box kites
  • model cars
  • i like kites that fly
  • cars of the world

      myvector <- c("kite", "cars", "box kites", "model cars", "i like kites that fly", "cars of the world")

I want to identify all the fields that contain the string "kite".

I've tried a few things such as any(), which() and %in%, but nothing has worked so far.

Any help is greatly appreciated.

You didn't provide a reproducible example, but the function you want is grepl.

grepl("kite", df$words)

It returns a logical vector with one element per row: TRUE where the text contains the pattern anywhere, FALSE otherwise.
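Applied to the sample vector from the question, a minimal sketch of turning that logical vector into a 0/1 dummy variable might look like the following. The data frame name df and column name words follow the snippet above; the new column name has_kite is an assumption chosen for illustration.

```r
# Sample data from the question, wrapped in a data frame.
# `df`, `words` and `has_kite` are illustrative names.
myvector <- c("kite", "cars", "box kites", "model cars",
              "i like kites that fly", "cars of the world")
df <- data.frame(words = myvector, stringsAsFactors = FALSE)

# grepl() gives TRUE wherever "kite" occurs anywhere in the string,
# so "box kites" and "i like kites that fly" match as well
matches <- grepl("kite", df$words)

# Convert the logical vector to a 0/1 dummy variable
df$has_kite <- as.integer(matches)

print(df)
```

By contrast, %in% only tests whole elements for equality, which is likely why it did not find "kite" inside the longer phrases.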

If you want to match any one of several words, separate them with the regular-expression "or" operator | inside the pattern string:

grepl("kite|cars|box kites", df$words)
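As a rough sketch of the same idea, the alternation pattern can be built from a vector of keywords, and a separate dummy column can be produced per keyword. The names keywords, pattern, any_match and dummies are assumptions for illustration and do not appear in the original post.

```r
myvector <- c("kite", "cars", "box kites", "model cars",
              "i like kites that fly", "cars of the world")

# Hypothetical keyword list; joined with "|" it becomes "kite|cars"
keywords <- c("kite", "cars")
pattern <- paste(keywords, collapse = "|")

# TRUE where at least one keyword occurs in the string
any_match <- grepl(pattern, myvector)

# One 0/1 column per keyword; fixed = TRUE treats each keyword
# as a literal string rather than a regular expression
dummies <- sapply(keywords, function(k) {
  as.integer(grepl(k, myvector, fixed = TRUE))
})

print(any_match)
print(dummies)
```

Because "kite" already matches "box kites", listing "box kites" as an extra alternative does not change the result.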
