Adding and merging two dataframes in R
I have two data frames:
> df1
Long Short
EURUSD 47295 16057
GBPUSD 17385 6861
USDJPY 7146 9369
USDCHF 2704 5162
USDCAD 4705 11947
AUDUSD 13041 6654
NZDUSD 7184 4000
> df2
Long Short
EURUSD 318 408
GBPUSD 181 276
USDJPY 217 203
USDCHF 97 57
USDCAD 178 121
AUDUSD 142 202
NZDUSD 95 138
I need the final data frame to look like this:
> Final
Long Short
EURUSD 47613 16465
... ... ...
NZDUSD 7279 4138
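The merge/combine approaches I tried do not work. Thanks for your help.
Here are three methods, assuming the data does not have row names (my personal preference, though not always under your control).
Your data: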
df1 <- read.table(text = "Symbol Long Short
EURUSD 47295 16057
GBPUSD 17385 6861
USDJPY 7146 9369
USDCHF 2704 5162
USDCAD 4705 11947
AUDUSD 13041 6654
NZDUSD 7184 4000", header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = "Symbol Long Short
EURUSD 318 408
GBPUSD 181 276
USDJPY 217 203
USDCHF 97 57
USDCAD 178 121
AUDUSD 142 202
NZDUSD 95 138", header = TRUE, stringsAsFactors = FALSE)
A single helper function used by methods 2 and 3:
psum <- function(..., na.rm = FALSE) rowSums(sapply(list(...), c), na.rm = na.rm)
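(This is similar to pmin and family. It is needed so that an NA in one input, such as a row that is missing from one of the frames, does not turn the whole sum into NA.)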
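To illustrate why the helper is needed, here is a minimal sketch with two made-up vectors (a and b are hypothetical, not taken from your data). Plain + propagates NA, while psum with na.rm = TRUE lets a missing value contribute nothing to the sum:

```r
# Same helper as above, repeated so the example runs on its own
psum <- function(..., na.rm = FALSE) rowSums(sapply(list(...), c), na.rm = na.rm)

# Hypothetical inputs with gaps in different positions
a <- c(1, 2, NA)
b <- c(10, NA, 30)

a + b                     # 11 NA NA : any NA poisons that element
psum(a, b, na.rm = TRUE)  # 11  2 30 : NA is skipped
```

Note that the helper as written expects vectors of length greater than one, since sapply has to return a matrix for rowSums to work.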
cbind
This is from @Leo P.'s comment. It relies on the two data.frames always having exactly the same rows in exactly the same order:
cbind(df1[,1,drop=FALSE], df1[,2:3] + df2[,2:3])
# Symbol Long Short
# 1 EURUSD 47613 16465
# 2 GBPUSD 17566 7137
# 3 USDJPY 7363 9572
# 4 USDCHF 2801 5219
# 5 USDCAD 4883 12068
# 6 AUDUSD 13183 6856
# 7 NZDUSD 7279 4138
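Because this approach adds by position, it may be worth aligning the rows first if you cannot guarantee the order. A minimal sketch, assuming both frames contain exactly the same set of symbols (the frames a and b and their shuffled order are made up for illustration):

```r
# Hypothetical frames: same symbols, different row order
a <- data.frame(Symbol = c("EURUSD", "GBPUSD", "USDJPY"),
                Long   = c(47295, 17385, 7146),
                Short  = c(16057, 6861, 9369),
                stringsAsFactors = FALSE)
b <- data.frame(Symbol = c("USDJPY", "EURUSD", "GBPUSD"),
                Long   = c(217, 318, 181),
                Short  = c(203, 408, 276),
                stringsAsFactors = FALSE)

# Positions in b that correspond to each row of a
idx <- match(a$Symbol, b$Symbol)
stopifnot(!anyNA(idx))           # fail early if a symbol is missing from b

b_aligned <- b[idx, ]
rownames(b_aligned) <- NULL      # purely cosmetic after reordering

# Now position-based addition is safe
res <- cbind(a[, 1, drop = FALSE], a[, 2:3] + b_aligned[, 2:3])
res
```

If the sets of symbols can differ, the check fails and one of the join-based methods is the safer choice.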
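merge
This method does not depend on the rows being in the same order, or even on both frames having the same rows. To demonstrate that, I will remove a row from one of the data frames: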
df2 <- df2[-3,]
Rename the second frame's columns so that the merge keeps both sets of values side by side:
colnames(df2) <- c("Symbol", "Long2", "Short2")
And the actual work:
within(merge(df1, df2, by = "Symbol", all = TRUE), {
Long <- psum(Long, Long2, na.rm = TRUE)
Short <- psum(Short, Short2, na.rm = TRUE)
# cleanup, remove unneeded columns
Long2 <- Short2 <- NULL
})
# Symbol Long Short
# 1 AUDUSD 13183 6856
# 2 EURUSD 47613 16465
# 3 GBPUSD 17566 7137
# 4 NZDUSD 7279 4138
# 5 USDCAD 4883 12068
# 6 USDCHF 2801 5219
# 7 USDJPY 7146 9369
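To see why na.rm = TRUE matters in this method, it can help to look at the intermediate result of the merge. A small sketch with made-up two-row frames (a, b, and m are hypothetical names), where one symbol is absent from the second frame:

```r
# Hypothetical frames: USDJPY exists only in a
a <- data.frame(Symbol = c("EURUSD", "USDJPY"), Long = c(47295, 7146),
                stringsAsFactors = FALSE)
b <- data.frame(Symbol = "EURUSD", Long2 = 318, stringsAsFactors = FALSE)

# all = TRUE keeps unmatched rows and fills the gaps with NA
m <- merge(a, b, by = "Symbol", all = TRUE)
m

plain <- m$Long + m$Long2                               # NA for USDJPY
safe  <- rowSums(m[, c("Long", "Long2")], na.rm = TRUE) # keeps 7146 for USDJPY
```

Without all = TRUE the unmatched row would be dropped entirely, and without na.rm = TRUE its total would be NA.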
dplyr join
Starting fresh with df1 and df2 (both with their original column names), I again remove a row:
df2 <- df2[-3,]
And the work:
library(dplyr)
full_join(df1, rename(df2, Long2 = Long, Short2 = Short), by = "Symbol") %>%
mutate(
Long = psum(Long, Long2, na.rm = TRUE),
Short = psum(Short, Short2, na.rm = TRUE)
) %>%
select(-Long2, -Short2)
# Symbol Long Short
# 1 EURUSD 47613 16465
# 2 GBPUSD 17566 7137
# 3 USDJPY 7146 9369
# 4 USDCHF 2801 5219
# 5 USDCAD 4883 12068
# 6 AUDUSD 13183 6856
# 7 NZDUSD 7279 4138
The data as shown in your question is not fully representative. Based on your comments, it looks like what you actually have is:
str(df1)
# 'data.frame': 7 obs. of 2 variables:
# $ Long : Factor w/ 7 levels "2704","4705",..: 7 6 3 1 2 5 4
# $ Short: Factor w/ 7 levels "4000","5162",..: 7 4 5 2 6 3 1
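For future reference, it is much clearer if you provide data in an unambiguous, directly consumable form, for example the output of dput: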
# dput(df1) ... possibly with options(deparse.max.lines=NULL) beforehand
structure(list(
Long = structure(c(7L, 6L, 3L, 1L, 2L, 5L, 4L), .Label = c("2704", "4705", "7146", "7184", "13041", "17385", "47295"), class = "factor"),
Short = structure(c(7L, 4L, 5L, 2L, 6L, 3L, 1L), .Label = c("4000", "5162", "6654", "6861", "9369", "11947", "16057"), class = "factor")),
.Names = c("Long", "Short"),
row.names = c("EURUSD", "GBPUSD", "USDJPY", "USDCHF", "USDCAD", "AUDUSD", "NZDUSD"),
class = "data.frame")
To get from this df1 to what I read in above, do the following:
# convert from nascent factors to numbers
df1[] <- lapply(df1[], function(a) as.numeric(as.character(a)))
# bring the row names into a column
df1$Symbol <- rownames(df1)
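The column order will be different, but that is cosmetic and easy to fix if it matters enough. You can optionally drop the row names afterwards with rownames(df1) <- NULL. The same steps need to be applied to df2.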
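The as.character step in the conversion is there because as.numeric applied directly to a factor returns the internal level codes rather than the printed values. A minimal sketch with a made-up factor (f is hypothetical):

```r
# Hypothetical factor holding numbers as text
f <- factor(c("47295", "17385", "7146"))

codes  <- as.numeric(f)                # 2 1 3 : positions in levels(f)
values <- as.numeric(as.character(f))  # 47295 17385 7146 : the real numbers

levels(f)  # levels are sorted as text, which is where the codes come from
```

Skipping as.character would therefore produce plausible-looking but wrong totals without raising any error.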