简体   繁体   中英

R: How to create a new column in a data frame with “countif” values from another data frame?

I have one dataframe (df1) that looks as follows. It indicates years when a company was active in a specific market.

Company  Country  Year
A        Austria  2010
A        Germany  2010
A        Austria  2011
B        Italy    2010

I now have a second dataframe (df2) that looks as follows. It lists all investments of a company in a country at a given time, by investment type as dummy variabes.

Company  Country  Year  JointVenture  M&A  Greenfield
A        Austria  2010  1             0    0
A        Austria  2010  0             1    0
A        Austria  2010  1             0    0
...

My question now is as follows: I want to add a new column to df1 , including the "countif" of each investment type as indicated in df2. For instance, new df1:

Company  Country  Year  Count.JointVenture  Count.M&A  Count.Greenfield
A        Austria  2010  2                   1          0
A        Germany  2010  ...........
A        Austria  2011
B        Italy    2010

Also, how would I be able to then add new columns to df1 transforming these counts into dummy variables (1 if >0; 0 if 0)?

Thanks and sorry for this basic question, but I did not find a fitting solutions in existing threads.

Cheers, Martin

using aggregate() and ifelse() functions

# test data
df <- data.frame(Company = rep("A", 3), 
                 Country = rep("Austria", 3), 
                 Year = rep(2010, 3), 
                 JointVenture = c(1,0,1), 
                 MnA = c(0,1,0), 
                 Greenfield = rep(0,3))
# this is the new df
counts <- aggregate(cbind(JointVenture, MnA, Greenfield)~Country+Company+Year, data = df, FUN = sum)

# dummy
counts$dummyJointVenture <- ifelse(counts$JointVenture > 0, 1, 0)
counts$dummyMnA <- ifelse(counts$MnA > 0, 1, 0)
counts$dummyGreenfield <- ifelse(counts$Greenfield > 0, 1, 0)

I throw my data.table attempt into the arena:

df <- fread("Company  Country  Year
             A        Austria  2010
             A        Germany  2010
             A        Austria  2011
             B        Italy    2010")

df2 <- fread("Company  Country  Year  JointVenture  M&A  Greenfield
              A        Austria  2010  1             0    0
              A        Austria  2010  0             1    0
              A        Austria  2010  1             0    0")

setkey(df2, Company, Country, Year)
df2[,c("JointVenture", "M&A", "Greenfield") := .(sum(JointVenture), sum(`M&A`), sum(Greenfield)), by=.(Company, Country, Year)]
merge(x=df, y=unique(df2), by=c("Company", "Country", "Year"), all.x=T, all.y=F, suffixes = c("", "Count."))

Which results in

   Company Country Year JointVenture M&A Greenfield
1:       A Austria 2010            2   1          0
2:       A Austria 2011           NA  NA         NA
3:       A Germany 2010           NA  NA         NA
4:       B   Italy 2010           NA  NA         NA

Using dplyr::summarise_each and merge with Martin's data.

df <- fread("Company  Country  Year
             A        Austria  2010
             A        Germany  2010
             A        Austria  2011
             B        Italy    2010")

df2 <- fread("Company  Country  Year  JointVenture  MA  Greenfield
              A        Austria  2010  1             0    0
              A        Austria  2010  0             1    0
              A        Austria  2010  1             0    0")
library(dplyr)
df2 %>% 
  group_by(Company, Country, Year) %>%
summarise_each(funs(sum), JointVenture:Greenfield) %>%
full_join(df, by = c("Company", "Country", "Year")) -> df

edits: replaced a summarise with a summarise_each with input from @zacdav and replace the merge by full_join to stay in dplyr

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM