Adding a factor column based on parts of another column
I have some data that looks like this:
SS <- structure(list(rn =
c("Exp.618.1.7..ABC.TRE854.HS.2...1.Saline...1...A.",
"Exp.618.1.7..ABC.TRE854.HS.2...4.Res..Reference...1...A.", "Exp.618.1.7..ABC.TRE854.HS.2...8.ABC.TRE854.HS.2..100nM...1...A.",
"Exp.618.1.7..ABC.TRE854.HS.2...12.ABC.TRE854.HS.2..1.00uM...1...A.",
"Exp.618.1.7..ABC.TRE854.HS.2...16.ABC.TRE854.HS.2..10.0uM...1...A.",
"Exp.618.2.5..ABC.TRE854.HS.2...1.Saline...1...A.", "Exp.618.2.5..ABC.TRE854.HS.2...4.Res..Reference...1...A.",
"Exp.618.2.5..ABC.TRE854.HS.2...8.ABC.TRE854.HS.2..300nM...1...A.",
"Exp.618.2.5..ABC.TRE854.HS.2...12.ABC.TRE854.HS.2..3.0uM...1...A.",
"Exp.618.2.5..ABC.TRE854.HS.2...16.ABC.TRE854.HS.2..30uM...1...A.",
"Exp.622.1.2..ABC.TRE854.HS.2...1.Saline...1...A.", "Exp.622.1.2..ABC.TRE854.HS.2...4.Res..Reference...1...A.",
"Exp.622.1.2..ABC.TRE854.HS.2...8.ABC.TRE854.HS.2..100nM...1...A.",
"Exp.622.1.2..ABC.TRE854.HS.2...12.ABC.TRE854.HS.2..1.00uM...1...A.",
"Exp.622.1.2..ABC.TRE854.HS.2...16.ABC.TRE854.HS.2..10.0uM...1...A.",
"Exp.622.2.5..ABC.TRE854.HS.2...1.Saline...1...A.", "Exp.622.2.5..ABC.TRE854.HS.2...4.Res..Reference...1...A.",
"Exp.622.2.5..ABC.TRE854.HS.2...8.ABC.TRE854.HS.2..300nM...1...A.",
"Exp.622.2.5..ABC.TRE854.HS.2...12.ABC.TRE854.HS.2..3.0uM...1...A.",
"Exp.622.2.5..ABC.TRE854.HS.2...16.ABC.TRE854.HS.2..30uM...1...A."
), V1 = c(6.08174172247795, -273.068131175906, -38.0098754654436,
-44.1874819464636, -126.058280657819, 28.7111941404515, -326.124708404277,
-61.0348906065704, -63.7440680070101, -62.8961106505329, 18.9484530926351,
-607.977222113268, -212.18247673418, -179.193611578799, -230.372071747453,
11.6278896202125, -258.129269330527, -26.634614887808, -29.8940173506221,
-63.2992704853608), Exp = c("Exp.618.1.", "Exp.618.1.", "Exp.618.1.",
"Exp.618.1.", "Exp.618.1.", "Exp.618.2.", "Exp.618.2.", "Exp.618.2.",
"Exp.618.2.", "Exp.618.2.", "Exp.622.1.", "Exp.622.1.", "Exp.622.1.",
"Exp.622.1.", "Exp.622.1.", "Exp.622.2.", "Exp.622.2.", "Exp.622.2.",
"Exp.622.2.", "Exp.622.2."), Value_norm = c(-0.0222718839298028,
1, 0.139195574751849, 0.16181852402981, 0.461636735546466, -0.0880374697180561,
1, 0.187151997483457, 0.195459179768711, 0.192859078228946, -0.0311663865083172,
1, 0.348997411443565, 0.294737376765432, 0.3789156293499, -0.0450467692035472,
1, 0.103183242089851, 0.115810258279326, 0.245223142069596)), .Names = c("rn",
"V1", "Exp", "Value_norm"), row.names = c(NA, 20L), class = "data.frame")
From the rn column, I need to use some of the names to create a factor so that I can plot the data in ggplot2. These names are:
Saline
Reference
100nM
300nM
1uM
3uM
10uM
30uM
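I would like the final data to look like the example, but with a factor column at the end that holds one of these labels.
Sorry that I only have a picture of my data; I wanted it to be nicely formatted and could not manage that in the dialog box here!
Thanks in advance!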
Well, this is easier if you match the terms exactly as they appear in that column. If that is acceptable, you can do
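# dots are escaped so that "." matches a literal dot rather than any character
rx <- "\\b(Saline|Reference|100nM|300nM|1\\.00uM|3\\.0uM|10\\.0uM|30uM)\\b"
# regexpr() finds the first match in each string; regmatches() extracts it
SS$type <- regmatches(SS$rn, regexpr(rx, SS$rn))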
The distinct values in SS$type should then be
c("1.00uM", "10.0uM", "100nM", "3.0uM", "300nM", "30uM", "Reference", "Saline")
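One caveat with the extraction above: regmatches() silently drops elements that have no match, so the assignment would fail if any row lacked one of the labels. The following is a minimal sketch of a more defensive variant, not part of the original answer. It assumes a small hypothetical sample vector (three strings copied from the question plus one made-up row with no label) and fills unmatched rows with NA.

```r
# Hypothetical sample: three rn values from the question plus one made-up
# row that contains none of the labels
rn <- c("Exp.618.1.7..ABC.TRE854.HS.2...1.Saline...1...A.",
        "Exp.618.1.7..ABC.TRE854.HS.2...12.ABC.TRE854.HS.2..1.00uM...1...A.",
        "Exp.618.2.5..ABC.TRE854.HS.2...16.ABC.TRE854.HS.2..30uM...1...A.",
        "Exp.999.no.label.here")

# Alternatives from the answer, with dots escaped so they match literally
rx <- "\\b(Saline|Reference|100nM|300nM|1\\.00uM|3\\.0uM|10\\.0uM|30uM)\\b"

m <- regexpr(rx, rn)                    # -1 where nothing matched
type <- rep(NA_character_, length(rn))  # start with NA for every row
type[m != -1] <- regmatches(rn, m)      # fill in only the rows that matched
type
```

With the data from the question every row contains a label, so this guard only matters if new rows with other naming schemes are added later.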
If you want to rename some of the values, you can do
remap <- c("1.00uM"="1uM", "3.0uM"="3uM", "10.0uM"="10uM")
SS$type[SS$type %in% names(remap)] <- remap[SS$type[SS$type %in% names(remap)]]
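The question asks for a factor, while the steps above leave a character column. As a sketch of the last step, and assuming the level order should follow the list of names given in the question, the column can be converted with factor(). The vector `type` below is a hypothetical stand-in for SS$type after renaming, and `lv` and `type_f` are illustrative names.

```r
# Hypothetical stand-in for SS$type after the renaming step
type <- c("Saline", "Reference", "100nM", "1uM", "10uM",
          "Saline", "Reference", "300nM", "3uM", "30uM")

# Level order taken from the list of names in the question
lv <- c("Saline", "Reference", "100nM", "300nM", "1uM", "3uM", "10uM", "30uM")

# Any value not listed in lv would become NA, which makes typos easy to spot
type_f <- factor(type, levels = lv)

levels(type_f)  # levels appear in the order given by lv
table(type_f)   # counts per label
```

On the real data this would be SS$type <- factor(SS$type, levels = lv). ggplot2 orders discrete axes and legends by factor level, so setting the levels explicitly avoids the default alphabetical ordering.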