
R - Multiple search and replace based on partial match within a column of a dataframe

I have a list of publishers that looks like this:

+--------------+
|  Site Name   |
+--------------+
| Radium One   |
| Euronews     |
| EUROSPORT    |
| WIRED        |
| RadiumOne    |
| Eurosport FR |
| Wired US     |
| Eurosport    |
| EuroNews     |
| Wired        |
+--------------+

I would like to create the following result:

+--------------+----------------+
|  Site Name   | Publisher Name |
+--------------+----------------+
| Radium One   | RadiumOne      |
| Euronews     | Euronews       |
| EUROSPORT    | Eurosport      |
| WIRED        | Wired          |
| RadiumOne    | RadiumOne      |
| Eurosport FR | Eurosport      |
| Wired US     | Wired          |
| Eurosport    | Eurosport      |
| EuroNews     | Euronews       |
| Wired        | Wired          |
+--------------+----------------+

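I would like to understand how to replicate this code, which I use in Power Query:

Search the first 4 characters

if Text.Start([Site Name],4) = "WIRE" then "Wired" else

Search the last 3 characters

if Text.End([Site Name],3) = "One" then "RadiumOne" else

If no match is found, add "Rest"

It does not have to be case sensitive.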

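Using properCase from the ifultools package together with gsub, we replace everything after the first word with "" (that is, delete it) and handle the special case of Radium separately. If you have other exceptions like the Radium case, please update your post with them so that we can find a neater solution :)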

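library("ifultools")  # provides properCase()

siteName <- c("Radium One", "Euronews", "EUROSPORT", "WIRED", "RadiumOne", "Eurosport FR", "Wired US", "Eurosport", "EuroNews", "Wired")

# Capitalise each name, drop everything after the first word,
# then map "Radium" / "Radiumone" to the requested "RadiumOne".
publisherName <- gsub("^Radium(one)?$", "RadiumOne", gsub("\\s+.*", "", properCase(siteName)))

publisherName
# [1] "RadiumOne" "Euronews"  "Eurosport" "Wired"     "RadiumOne" "Eurosport" "Wired"
# [8] "Eurosport" "Euronews"  "Wired"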
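
For a version that stays closer to the Power Query if/else chain and needs no extra package, here is a minimal base-R sketch. The function name match_publisher and the "EUROS" / "EURON" prefixes are assumptions made for illustration: the question only gives the "WIRE" and "One" rules, and the two Euro* publishers share their first four characters, so a five-character prefix is used to tell them apart.

```r
site_name <- c("Radium One", "Euronews", "EUROSPORT", "WIRED", "RadiumOne",
               "Eurosport FR", "Wired US", "Eurosport", "EuroNews", "Wired")

# Hypothetical helper: maps a site name to a publisher using
# case-insensitive prefix/suffix rules, falling back to "Rest".
match_publisher <- function(x) {
  key <- toupper(trimws(x))  # upper-case once so every comparison ignores case
  # Nested ifelse() mirrors the if ... then ... else chain: the first rule that matches wins.
  ifelse(startsWith(key, "WIRE"),  "Wired",
  ifelse(endsWith(key, "ONE"),     "RadiumOne",
  ifelse(startsWith(key, "EUROS"), "Eurosport",   # assumed rule, not in the question
  ifelse(startsWith(key, "EURON"), "Euronews",    # assumed rule, not in the question
         "Rest"))))
}

# check.names = FALSE keeps the space in the column name "Site Name".
df <- data.frame(`Site Name` = site_name, check.names = FALSE)
df$`Publisher Name` <- match_publisher(df$`Site Name`)
print(df)
```

startsWith() and endsWith() are part of base R from version 3.3.0. Adding a publisher means adding one more ifelse() branch; for a long rule list, a lookup table of patterns would probably scale better than nesting.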
