
R: combine two CSV files based on a matching column

I have two csv files:

csv1 <- data.frame(y=c("classA", "classB", "classA", "classB", "classA", "classC"), 
                   DBID=c("d1", "d1", "d2", "d3", "d3", "d3")) 

       y DBID
1 classA   d1
2 classB   d1
3 classA   d2
4 classB   d3
5 classA   d3
6 classC   d3

csv2 <- data.frame(tm=c("t1","t1","t2"), 
                   y=c("classA","classC","classB"))

  tm      y
1 t1 classA
2 t1 classC
3 t2 classB

I want to build a table by matching column y across the two files. For example:

t1 is paired with classA and classC in csv2, so every DBID classified as classA or classC in csv1 (d1, d2 and d3 for classA, plus d3 again for classC) should appear in the result with t1 in the first column and the DBID in the second.

t2 is paired with classB in csv2, so every DBID classified as classB in csv1 (d1 and d3) should appear in the result with t2 in the first column and the DBID in the second.

The resulting data frame should look like this:

tm DBID endcol
t1 d1   1
t1 d2   1
t1 d3   1
t1 d3   1
t2 d1   1
t2 d3   1

How can I do this in R?

Maybe merge?

> merge(csv1,csv2)
       y DBID tm
1 classA   d1 t1
2 classA   d2 t1
3 classA   d3 t1
4 classB   d1 t2
5 classB   d3 t2
6 classC   d3 t1

You can add the column of all ones yourself. By default, merge joins the two data frames on every column whose name appears in both, which is why I didn't have to pass any other arguments. If your real data has other columns with matching names, you'll need to specify the by argument explicitly to get the behavior you want.
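As a minimal sketch of the remaining steps (naming the join column explicitly, adding the column of ones, and putting the columns in the requested order), something like the following should work. The names merged and result are only illustrative, and sorting by tm and DBID is an assumption about the row order wanted in the question.

```r
# Sample data from the question
csv1 <- data.frame(y = c("classA", "classB", "classA", "classB", "classA", "classC"),
                   DBID = c("d1", "d1", "d2", "d3", "d3", "d3"))
csv2 <- data.frame(tm = c("t1", "t1", "t2"),
                   y = c("classA", "classC", "classB"))

# Join on y, stated explicitly so that any other shared column names are ignored
merged <- merge(csv1, csv2, by = "y")

# Keep tm and DBID in the requested order and add a constant column of ones
result <- data.frame(tm = merged$tm, DBID = merged$DBID, endcol = 1)

# Assumed ordering: sort by tm, then DBID, and reset the row names
result <- result[order(result$tm, result$DBID), ]
rownames(result) <- NULL

print(result)
```

If the key column had different names in the two data frames, merge also accepts by.x and by.y in place of by.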

