Add values from a data.frame to a new column in another data.frame that matches two criteria
I have two data.frames that look like this (not full length):
The first, AnalysisData:
Stock Symbol.SEK TPDate TP PTP
<chr> <chr> <chr> <dbl> <dbl>
1 AAK AAK.ST 2018-08-23 197 62
2 ABB … ABB.ST 2016-09-30 11 188
3 Addt… ADDT-B.ST 2017-11-06 237 233
4 Ahls… AM1S.ST 2015-10-14 148 226
5 Alfa… ALFA.ST 2018-04-23 272 188
The second, master_df_tq
(names edited):
symbol[,1] date open high low close volume adjusted Delt.1.arithmetic
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAK.ST 2015-01-02 69.7 69.9 69.0 69.2 133860 49.8 NA
2 AAK.ST 2015-01-05 69.2 69.9 68.8 69.2 93168 49.7 -0.00670
3 AAK.ST 2015-01-07 68.3 69.1 67.7 68.4 308952 49.2 -0.0128
4 AAK.ST 2015-01-08 69.5 69.9 68.4 69.7 405258 50.1 0.0168
5 AAK.ST 2015-01-09 69.5 70.5 69.2 70 214548 50.3 -0.000239
6 AAK.ST 2015-01-12 70.4 70.8 69.4 70.0 300024 50.3 0.0142
7 AAK.ST 2015-01-13 70.1 70.5 69.5 70.0 770190 50.3 -0.00450
My goal is to add the values from Delt.1.arithmetic to a new column in AnalysisData for the rows where both the date and the symbol match, i.e. date = TPDate and symbol = Symbol.SEK.
I tried the join functions from dplyr but could not get them to work. I have tried the following two approaches:
left_join(master_df_tq, AnalysisData, by = c("date" = "TPDate" , "Symbol.SEK" = "Symbol.SEK"))
master_df_tq %>% left_join(AnalysisData, by = c("date" = "TPDate" , "Symbol.SEK" = "Symbol.SEK"))
But they do not work, and I get the following message:
Warning message:
Column `symbol`/`Symbol.SEK` has different attributes on LHS and RHS of join
Does anyone have an idea? Should I use merge, or is there a data format issue I am missing?
Edit: I renamed symbol to Symbol.SEK and converted it to character. The warning no longer appears, but nothing happens to AnalysisData. All that is shown in the console is the following:
> master_df_tq %>% left_join(AnalysisData, by = c("date" = "TPDate" , "Symbol.SEK" = "Symbol.SEK"))
# A tibble: 146,685 x 26
# Groups: Symbol.SEK [131]
Symbol.SEK date open high low close volume adjusted Delt.1.arithmet… Stock TP PTP TP_Diff
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 AAK.ST 2015… 69.7 69.9 69.0 69.2 133860 49.8 NA NA NA NA NA
2 AAK.ST 2015… 69.2 69.9 68.8 69.2 93168 49.7 -0.00670 NA NA NA NA
3 AAK.ST 2015… 68.3 69.1 67.7 68.4 308952 49.2 -0.0128 NA NA NA NA
4 AAK.ST 2015… 69.5 69.9 68.4 69.7 405258 50.1 0.0168 NA NA NA NA
5 AAK.ST 2015… 69.5 70.5 69.2 70 214548 50.3 -0.000239 NA NA NA NA
Edit 2.0:
Here are the last few lines of the dput output (master_df_tq has 146684 rows in total). Note: the name is master_df_tq, not master_df_sq as I first wrote.
139149:140404, 138775:139148, 140405:141660, 141661:142916,
142917:144172, 144173:145428)), row.names = c(NA, -131L), class = c("tbl_df",
"tbl", "data.frame"), .drop = FALSE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
> dput(head(master_df_tq))
structure(list(Symbol.SEK = c("AAK.ST", "AAK.ST", "AAK.ST", "AAK.ST",
"AAK.ST", "AAK.ST"), date = c("2015-01-02", "2015-01-05", "2015-01-07",
"2015-01-08", "2015-01-09", "2015-01-12"), open = c(69.683296,
69.216698, 68.333298, 69.483299, 69.466698, 70.449997), high = c(69.866699,
69.916702, 69.083298, 69.916702, 70.483299, 70.75), low = c(69.033302,
68.849998, 67.716698, 68.433296, 69.150002, 69.433296), close = c(69.233299,
69.199997, 68.400002, 69.650002, 70, 70.033302), volume = c(133860,
93168, 308952, 405258, 214548, 300024), adjusted = c(49.755833,
49.731899, 49.156975, 50.055305, 50.306839, 50.330769), Delt.1.arithmetic = c(NA,
-0.00669598062640442, -0.0127628162788117, 0.0168292916288044,
-0.000238920722517966, 0.0141549696229983)), row.names = c(NA,
-6L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), groups = structure(list(
Symbol.SEK = "AAK.ST", .rows = list(1:6)), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame"), .drop = FALSE))
Edit 3.0:
structure(list(Stock = c("AAK", "ABB Ltd", "Addtech B", "Ahlstrom-Munksjö Oyj",
"Alfa Laval", "Arion Banki SDB"), Symbol.SEK = c("AAK.ST", "ABB.ST",
"ADDT-B.ST", "AM1S.ST", "ALFA.ST", "ARION-SDB.ST"), TPDate = c("2019-10-10",
"2019-10-10", "2019-10-10", "2019-10-10", "2019-10-10", "2019-10-10"
), TP = c(197, 11, 237, 148, 272, 291), PTP = c(62, 188, 233,
226, 188, 201), TP_Diff = c(135, -177, 4, -78, 84, 90), AREC = c("Buy",
"Hold", "Hold", "Buy", "Buy", "Buy"), APREC = c("Sell", "Sell",
"Buy", "Sell", "Buy", "Buy"), RTP = c(2.17741935483871, -0.941489361702128,
0.0171673819742489, -0.345132743362832, 0.446808510638298, 0.447761194029851
), Analyst = c("SEB", "Nordea", "DBN", "Swedbank", "Avanza",
"SEB"), Bransch = c("Konsument", "Industri", "Industri", "Råvaror",
"Industri", "Finans"), return = c(0.00825375780488757, 0.00224086274509805,
-0.0242914979757085, 0, 0.0171685153116452, 0.00486223662884933
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
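I solved the problem with the loop below. It works because there is always a match in my data, but if a row matched nothing, which() would return integer(0) and the assignment would fail: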
for (i in 1:nrow(AnalysisData)) {
index <- which(AnalysisData$Symbol.SEK[i]==master_df_tq$Symbol.SEK & AnalysisData$TPDate[i]==master_df_tq$date)
AnalysisData$return[i] <- master_df_tq$Delt.1.arithmetic[index]
}
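The integer(0) case mentioned above can be guarded against. The following is only a sketch on made-up toy data (the two small data frames and the return_matched column are assumptions for illustration, not the real tables): it pre-fills the column with NA and assigns only when a match exists, and then shows a loop-free variant built on match().

```r
# Toy stand-ins for the two tables; the values are made up for illustration
AnalysisData <- data.frame(
  Symbol.SEK = c("AAK.ST", "ABB.ST"),
  TPDate     = c("2015-01-05", "2015-01-07"),
  stringsAsFactors = FALSE
)
master_df_tq <- data.frame(
  Symbol.SEK        = c("AAK.ST", "AAK.ST"),
  date              = c("2015-01-05", "2015-01-07"),
  Delt.1.arithmetic = c(-0.0067, -0.0128),
  stringsAsFactors = FALSE
)

# Pre-fill with NA so rows without a match keep a well-defined value
AnalysisData$return <- NA_real_
for (i in seq_len(nrow(AnalysisData))) {
  index <- which(master_df_tq$Symbol.SEK == AnalysisData$Symbol.SEK[i] &
                 master_df_tq$date == AnalysisData$TPDate[i])
  # Assign only when at least one row matched; take the first if several did
  if (length(index) > 0) {
    AnalysisData$return[i] <- master_df_tq$Delt.1.arithmetic[index[1]]
  }
}

# Loop-free variant: build a combined key and look it up with match(),
# which returns NA (and therefore an NA value) when there is no match
key_a <- paste(AnalysisData$Symbol.SEK, AnalysisData$TPDate)
key_m <- paste(master_df_tq$Symbol.SEK, master_df_tq$date)
AnalysisData$return_matched <- master_df_tq$Delt.1.arithmetic[match(key_a, key_m)]
```

In this toy data the ABB.ST row has no counterpart in the price table, so both columns end up NA for it instead of raising an error.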
Does this give you the output you want?
result <- merge(AnalysisData, master_df_tq, by.x=c('TPDate', 'Symbol.SEK'), by.y=c('date', 'Symbol.SEK'), all.x=TRUE)
result
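The same idea can probably be written with dplyr by putting AnalysisData on the left side of the join, so that its rows are the ones kept. This is a sketch on made-up toy data; the helper name returns is an assumption for illustration. Only the key columns and Delt.1.arithmetic are selected from the price table so that no other price columns are pulled in, and ungroup() drops the grouping that the dput output shows on master_df_tq.

```r
library(dplyr)

# Toy stand-ins for the two tables; the values are made up for illustration
AnalysisData <- tibble(
  Stock      = c("AAK", "ABB Ltd"),
  Symbol.SEK = c("AAK.ST", "ABB.ST"),
  TPDate     = c("2015-01-05", "2015-01-07")
)
master_df_tq <- tibble(
  Symbol.SEK        = c("AAK.ST", "AAK.ST", "ABB.ST"),
  date              = c("2015-01-05", "2015-01-07", "2015-01-05"),
  Delt.1.arithmetic = c(-0.0067, -0.0128, 0.0100)
)

# Keep only the join keys and the value to add
returns <- master_df_tq %>%
  ungroup() %>%
  select(Symbol.SEK, date, Delt.1.arithmetic)

# AnalysisData is the left table, so it keeps exactly its own rows;
# the result is assigned back, otherwise it is only printed
AnalysisData <- AnalysisData %>%
  left_join(returns, by = c("Symbol.SEK" = "Symbol.SEK", "TPDate" = "date"))
```

Rows of AnalysisData without a matching symbol and date pair get NA in the new column, which is the same behaviour as all.x=TRUE in merge.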
Also, I am not sure why this is not the output you are looking for (copied from your post):
> master_df_tq %>% left_join(AnalysisData, by = c("date" = "TPDate" , "Symbol.SEK" = "Symbol.SEK"))
# A tibble: 146,685 x 26
# Groups: Symbol.SEK [131]
Symbol.SEK date open high low close volume adjusted Delt.1.arithmet… Stock TP PTP TP_Diff
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 AAK.ST 2015… 69.7 69.9 69.0 69.2 133860 49.8 NA NA NA NA NA
2 AAK.ST 2015… 69.2 69.9 68.8 69.2 93168 49.7 -0.00670 NA NA NA NA
3 AAK.ST 2015… 68.3 69.1 67.7 68.4 308952 49.2 -0.0128 NA NA NA NA
4 AAK.ST 2015… 69.5 69.9 68.4 69.7 405258 50.1 0.0168 NA NA NA NA
5 AAK.ST 2015… 69.5 70.5 69.2 70 214548 50.3 -0.000239 NA NA NA NA
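One possible reading of the output above: master_df_tq is the left table, so all of its rows are kept, and the AnalysisData columns are NA wherever no symbol and date pair matches. The result is also only printed, not assigned, so AnalysisData itself stays unchanged. A quick way to check how many rows of AnalysisData actually have a counterpart is sketched below on made-up toy data; the names keys, matched and unmatched are assumptions for illustration.

```r
library(dplyr)

# Toy stand-ins for the two tables; the values are made up for illustration
AnalysisData <- tibble(
  Symbol.SEK = c("AAK.ST", "ABB.ST"),
  TPDate     = c("2015-01-05", "2019-10-10")
)
master_df_tq <- tibble(
  Symbol.SEK        = c("AAK.ST", "AAK.ST", "ABB.ST"),
  date              = c("2015-01-02", "2015-01-05", "2015-01-05"),
  Delt.1.arithmetic = c(NA, -0.0067, 0.0100)
)

# Join keys: left-hand column names on the left, right-hand names on the right
keys <- c("Symbol.SEK" = "Symbol.SEK", "TPDate" = "date")

# Rows of AnalysisData that have a matching symbol and date in the price table
matched <- semi_join(AnalysisData, master_df_tq, by = keys)

# Rows of AnalysisData with no match; these would get NA after a left join
unmatched <- anti_join(AnalysisData, master_df_tq, by = keys)

nrow(matched)
nrow(unmatched)
```

If unmatched is not empty on the real data, it is worth checking whether the price table contains those dates at all and whether both date columns are stored in the same format.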