Compare two data frames based on common columns
I have two CSV files.
File 1:
SN CY Year Month Day Hour Lat Lon
196101 1 1961 1 14 12 8.3 134.7
196101 1 1961 1 14 18 8.8 133.4
196101 1 1961 1 15 0 9.1 132.5
196101 1 1961 1 15 6 9.3 132.2
196101 1 1961 1 15 12 9.5 132
196101 1 1961 1 15 18 9.9 131.8
File 2:
Year Month Day RR Hour Lat Lon
1961 1 14 0 0 14.0917 121.055
1961 1 14 0 6 14.0917 121.055
1961 1 14 0 12 14.0917 121.055
1961 1 14 0 18 14.0917 121.055
1961 1 15 0 0 14.0917 121.055
1961 1 15 0 6 14.0917 121.055
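I want to add another column to file 2 that contains TRUE when a row in file 2 has the same Year, Month, Day and Hour as some row in file 1, and FALSE otherwise. The result should then be saved as a CSV file.
Desired output: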
Year Month Day RR Hour Lat Lon com
1961 1 14 0 0 14.0917 121.055 FALSE
1961 1 14 0 6 14.0917 121.055 FALSE
1961 1 14 0 12 14.0917 121.055 TRUE
1961 1 14 0 18 14.0917 121.055 TRUE
1961 1 15 0 0 14.0917 121.055 TRUE
1961 1 15 0 6 14.0917 121.055 TRUE
Here is my script:
jtwc <- read.csv("file1.csv",header=T,sep=",")
stn <- read.csv("file2.csv",header=T,sep=",")
if ((jtwc$Year == "stn$YY") & (jtwc$Month == "stn$MM") & (jtwc$Day == "stn$DD") &(jtwc$Hour == "stn$HH")){
stn$com <- "TRUE"
} else {
stn$com <- "FALSE"
}
write.csv(stn,file="test.csv",row.names=T)
This gives an error:
In if ((jtwc$Year == "stn$YY") & (jtwc$Month == "stn$MM") & (jtwc$Day == :the condition has length > 1 and only the first element will be used
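The message appears because `if` expects a single TRUE or FALSE, while each `==` here compares a whole column and returns one value per row (since R 4.2.0 this is an error rather than a warning). The script also compares against the literal strings "stn$YY" and so on, not against the columns of stn, and the columns in file 2 are named Year, Month, Day and Hour, not YY, MM, DD and HH. A minimal base R sketch of a vectorised alternative follows. It assumes small stand-in data frames copied from the question in place of the CSV files; the helper `key` and the temporary output path are illustrative names, not part of the original script.

```r
# Stand-ins for file1.csv and file2.csv (values copied from the question).
jtwc <- data.frame(
  Year = 1961, Month = 1,
  Day  = c(14, 14, 15, 15, 15, 15),
  Hour = c(12, 18, 0, 6, 12, 18)
)
stn <- data.frame(
  Year = 1961, Month = 1,
  Day  = c(14, 14, 14, 14, 15, 15),
  RR   = 0,
  Hour = c(0, 6, 12, 18, 0, 6),
  Lat  = 14.0917, Lon = 121.055
)

# Hypothetical helper: one key string per row built from the four columns.
key <- function(df) paste(df$Year, df$Month, df$Day, df$Hour, sep = "-")

# %in% is vectorised, so it yields one TRUE/FALSE per row of stn.
stn$com <- key(stn) %in% key(jtwc)

# Write to a temporary location; replace with "test.csv" as needed.
out <- file.path(tempdir(), "test.csv")
write.csv(stn, file = out, row.names = FALSE)
```

With real files, the two `data.frame()` calls would be replaced by the `read.csv()` calls from the question.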
You can also use dplyr / tidyverse (here d1 and d2 are the data frames read from file 1 and file 2):
library(tidyverse)
d2 %>%
left_join(select(d1, Year, Month, Day, Hour, Com=Lon)) %>%
mutate(Com=ifelse(is.na(Com), FALSE, TRUE))
Joining, by = c("Year", "Month", "Day", "Hour")
Year Month Day RR Hour Lat Lon Com
1 1961 1 14 0 0 14.0917 121.055 FALSE
2 1961 1 14 0 6 14.0917 121.055 FALSE
3 1961 1 14 0 12 14.0917 121.055 TRUE
4 1961 1 14 0 18 14.0917 121.055 TRUE
5 1961 1 15 0 0 14.0917 121.055 TRUE
6 1961 1 15 0 6 14.0917 121.055 TRUE
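One caveat with a left join: if file 1 contains several rows for the same Year, Month, Day and Hour, the matching rows of file 2 are duplicated in the result. The following is a sketch of a variant that guards against this by reducing file 1 to its distinct keys first. It assumes dplyr is installed and uses small stand-in data frames copied from the question; `keys` and `res` are illustrative names.

```r
library(dplyr)

# Stand-ins for the two data frames (values copied from the question).
d1 <- data.frame(
  Year = 1961, Month = 1,
  Day  = c(14, 14, 15, 15, 15, 15),
  Hour = c(12, 18, 0, 6, 12, 18)
)
d2 <- data.frame(
  Year = 1961, Month = 1,
  Day  = c(14, 14, 14, 14, 15, 15),
  RR   = 0,
  Hour = c(0, 6, 12, 18, 0, 6),
  Lat  = 14.0917, Lon = 121.055
)

# One row per distinct key in d1, flagged TRUE.
keys <- d1 %>%
  distinct(Year, Month, Day, Hour) %>%
  mutate(Com = TRUE)

# Rows of d2 without a match get NA from the join; turn those into FALSE.
res <- d2 %>%
  left_join(keys, by = c("Year", "Month", "Day", "Hour")) %>%
  mutate(Com = coalesce(Com, FALSE))
```

Naming the join columns in `by` also avoids the "Joining, by = ..." message.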
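Using data.table
A quick and dirty solution:
1. Read the files with fread.
2. Extract the wanted columns from file1 (since you are only interested in file2).
3. Merge the files with merge.
4. Where there is no match in file1, add FALSE.
Code: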
library(data.table)
result <- merge(fread("file2.csv"),
fread("file1.csv")[, .(Year, Month, Day, Hour, com = TRUE)],
all.x = TRUE)[is.na(com), com := FALSE]
result
Year Month Day Hour RR Lat Lon com
1: 1961 1 14 0 0 14.0917 121.055 FALSE
2: 1961 1 14 6 0 14.0917 121.055 FALSE
3: 1961 1 14 12 0 14.0917 121.055 TRUE
4: 1961 1 14 18 0 14.0917 121.055 TRUE
5: 1961 1 15 0 0 14.0917 121.055 TRUE
6: 1961 1 15 6 0 14.0917 121.055 TRUE
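Note that merge moves the key columns to the front, so Hour now comes before RR. If the original column order matters and the result should be saved, as the question asks, something like the following sketch could be used. It assumes data.table is installed and uses small stand-in tables copied from the question instead of the CSV files; the temporary output path is an illustrative choice.

```r
library(data.table)

# Stand-ins for fread("file1.csv") and fread("file2.csv").
file1 <- data.table(
  Year = 1961, Month = 1,
  Day  = c(14, 14, 15, 15, 15, 15),
  Hour = c(12, 18, 0, 6, 12, 18)
)
file2 <- data.table(
  Year = 1961, Month = 1,
  Day  = c(14, 14, 14, 14, 15, 15),
  RR   = 0,
  Hour = c(0, 6, 12, 18, 0, 6),
  Lat  = 14.0917, Lon = 121.055
)

# unique() keeps one row per key so that file2 rows are not duplicated.
result <- merge(file2,
                unique(file1[, .(Year, Month, Day, Hour, com = TRUE)]),
                by = c("Year", "Month", "Day", "Hour"),
                all.x = TRUE)[is.na(com), com := FALSE]

# Restore the column order of file2, with com last.
setcolorder(result, c(names(file2), "com"))

# Save as CSV; replace the path with "test.csv" as needed.
out <- file.path(tempdir(), "test.csv")
fwrite(result, out)
```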