R - Multiple search and replace based on partial match within a column of a dataframe
I have a list of publishers that looks like this:
+--------------+
| Site Name |
+--------------+
| Radium One |
| Euronews |
| EUROSPORT |
| WIRED |
| RadiumOne |
| Eurosport FR |
| Wired US |
| Eurosport |
| EuroNews |
| Wired |
+--------------+
I want to create the following result:
+--------------+----------------+
| Site Name | Publisher Name |
+--------------+----------------+
| Radium One | RadiumOne |
| Euronews | Euronews |
| EUROSPORT | Eurosport |
| WIRED | Wired |
| RadiumOne | RadiumOne |
| Eurosport FR | Eurosport |
| Wired US | Wired |
| Eurosport | Eurosport |
| EuroNews | Euronews |
| Wired | Wired |
+--------------+----------------+
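I would like to understand how to replicate this code, which I used in Power Query:
Search the first 4 characters
if Text.Start([Site Name],4) = "WIRE" then "Wired" else
Search the last 3 characters
if Text.End([Site Name],3) = "One" then "RadiumOne" else
If no match is found, add "Rest"
The match does not have to be case sensitive.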
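Using properCase from the ifultools package together with gsub, we replace everything after the first word with "" (that is, we delete it) and handle the special case of Radium separately. If you have other exceptions like the Radium case, please update your post with them so we can find a neater solution :)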
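library("ifultools")  # provides properCase()
siteName = c("Radium One", "Euronews", "EUROSPORT", "WIRED", "RadiumOne", "Eurosport FR", "Wired US", "Eurosport", "EuroNews", "Wired")
# Proper-case each name, drop everything after the first word, then map a bare "Radium" to the publisher name
publisherName = gsub("^Radium$", "Radiumone", gsub("\\s+.*", "", properCase(siteName)))
publisherName
# [1] "Radiumone" "Euronews" "Eurosport" "Wired" "Radiumone" "Eurosport" "Wired"
# [8] "Eurosport" "Euronews" "Wired"
# Note: this yields "Radiumone" rather than the "RadiumOne" casing shown in the question.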
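The Power Query logic from the question (check a prefix, ignore case, fall back to "Rest") can also be sketched in base R without an extra package. This is only a minimal sketch: the `rules` vector and the `match_publisher` helper are hypothetical names introduced for illustration, and the prefixes are assumptions inferred from the sample data, so they would need adjusting for a real publisher list.

```r
site_name <- c("Radium One", "Euronews", "EUROSPORT", "WIRED", "RadiumOne",
               "Eurosport FR", "Wired US", "Eurosport", "EuroNews", "Wired")

# Hypothetical rule table: upper-case prefix -> publisher name.
# These prefixes are assumptions based on the sample data above.
rules <- c(WIRE      = "Wired",
           RADIUM    = "RadiumOne",
           EURONEWS  = "Euronews",
           EUROSPORT = "Eurosport")

match_publisher <- function(x, rules, default = "Rest") {
  key <- toupper(x)               # compare in upper case, so matching ignores case
  out <- rep(default, length(x))  # fallback value when no rule matches
  for (prefix in names(rules)) {
    # Only fill entries that have not been matched by an earlier rule
    hit <- startsWith(key, prefix) & out == default
    out[hit] <- rules[[prefix]]
  }
  out
}

publisher_name <- match_publisher(site_name, rules)

# Two-column result with the column names used in the question
result <- data.frame(`Site Name` = site_name,
                     `Publisher Name` = publisher_name,
                     check.names = FALSE)
print(result)
```

With the sample data, every site name matches one of the assumed prefixes, and a name that matches none (for example "Unknown Site") gets "Rest". A suffix rule like the Text.End check in the question could be added in the same loop with `endsWith(key, "ONE")`.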