
Find unique length of one column by matching other columns

I have this data frame in CSV format: example

and I would like to know how to find the number of unique lecturer.id values matched by program.id and program.id.ime.

So my outcome should be a variable that gives me the number of unique lecturer.id values for lecturers who teach English (in my case I can see from the data or the picture that this is 10 lecturers), the number of unique lecturer.id values for lecturers who teach History, and so on. So I would like to write code that does the following:

If this lecturer.id matches this program.id, then paste the count for this program.id.ime (which is 10); otherwise paste a different count.

I am thinking in this direction (but it is not what I want):

length(unique(subset(df, lecturer.id==program.id)))

I was thinking of using aggregate, but I need this in a variable that will produce different lengths according to program.id and program.id.ime.

A small part of my data frame looks like this:

lecturer.id<- c(111, 111,112,126,127,132,139,143,155)
program.id<- c(35,35,35,35,44,44,44,42,42)
program.id.ime<- c('English', 'English', 'English', 'English', 
 'History', 'History', 'History', 'Sociology', 'Sociology')

df <- data.frame(lecturer.id, program.id, program.id.ime)

So I know that the lecturer with id 111 teaches in the program with id 35, and that this program's name is English. My outcome should be the number of all lecturers who teach English, the number of all lecturers who teach History, and so on.

Since I am combining R code with LaTeX (Hmisc), my output is a table (because of data confidentiality I deleted some variables): [image]

I would like to generate the number in parentheses, which is an example of the output I want. It is important to generate it automatically by matching columns.

The whole point is that I am producing PDF reports for separate lecturers, and I am matching each lecturer by lecturer.id in a for-loop. So the output is a PDF report for one lecturer, and in the table in the second picture I need the number of all lecturers on a specific course.

Using the data in the link (I changed the file name to 'Miha.csv'):

library(data.table)#v1.9.5+
df1 <- read.csv('Miha.csv', sep=';')

Or

df1 <- fread('Miha.csv') #in this case, the object will be `data.table`
setDT(df1)[, list(n= uniqueN(lecturer.id)), .(program.id, program.id.ime)
   ][, program.id.ime:=sprintf('%s (%d)', program.id.ime, n)][, n:=NULL]
#   program.id   program.id.ime
#1:         35      English (9)
#2:         44      History (4)
#3:         43    Sociology (8)
#4:         34  Politology (21)
#5:         40 Antropology (62)
#6:         41       Music (65)
#7:        116    Music II (10)
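For readers without data.table, the same grouping idea can be sketched in base R. This is only an illustration on the nine-row sample from the question, not on 'Miha.csv', so the counts differ from the output above; the names `counts`, `n` and `label` are made up for this sketch.

```r
# Sample data copied from the question (not the full 'Miha.csv' file)
lecturer.id <- c(111, 111, 112, 126, 127, 132, 139, 143, 155)
program.id <- c(35, 35, 35, 35, 44, 44, 44, 42, 42)
program.id.ime <- c('English', 'English', 'English', 'English',
                    'History', 'History', 'History', 'Sociology', 'Sociology')
df <- data.frame(lecturer.id, program.id, program.id.ime,
                 stringsAsFactors = FALSE)

# Count distinct lecturers within each (program.id, program.id.ime) group
counts <- aggregate(lecturer.id ~ program.id + program.id.ime, data = df,
                    FUN = function(x) length(unique(x)))
names(counts)[3] <- "n"  # hypothetical column name for the count

# Build the "Name (count)" label used in the report table
counts$label <- sprintf('%s (%d)', counts$program.id.ime, as.integer(counts$n))
counts
```

On this sample, English and History each have 3 distinct lecturers and Sociology has 2, because lecturer 111 appears twice under English and is counted once.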

In the dataset, each 'program.id.ime' has a single 'program.id', so

setDT(df1)[, list(program.id.ime=sprintf('%s (%d)',
      program.id.ime[1L], uniqueN(lecturer.id))) , .(program.id)]
#    program.id   program.id.ime
# 1:         35      English (9)
# 2:         44      History (4)
# 3:         43    Sociology (8)
# 4:         34  Politology (21)
# 5:         40 Antropology (62)
# 6:         41       Music (65)
# 7:        116    Music II (10)
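Since the question mentions building one report per lecturer in a for-loop, it may help to attach the count to every row so it can be looked up per lecturer. The following is a minimal base R sketch on the question's sample data, assuming the count is wanted per program.id; the column `n.lecturers` and the list `labels` are hypothetical names, and the loop only stands in for the real report-generating loop.

```r
# Sample data copied from the question
df <- data.frame(
  lecturer.id    = c(111, 111, 112, 126, 127, 132, 139, 143, 155),
  program.id     = c(35, 35, 35, 35, 44, 44, 44, 42, 42),
  program.id.ime = c('English', 'English', 'English', 'English',
                     'History', 'History', 'History', 'Sociology', 'Sociology'),
  stringsAsFactors = FALSE
)

# Attach the number of distinct lecturers in each program to every row
df$n.lecturers <- ave(df$lecturer.id, df$program.id,
                      FUN = function(x) length(unique(x)))

# Stand-in for the per-lecturer report loop: collect the label(s) for each lecturer
labels <- list()
for (id in unique(df$lecturer.id)) {
  rows <- df[df$lecturer.id == id, ]
  labels[[as.character(id)]] <- unique(
    sprintf('%s (%d)', rows$program.id.ime, as.integer(rows$n.lecturers))
  )
}

labels[['111']]  # "English (3)" on this sample
```

Inside a real report loop, the `sprintf()` result for the current lecturer could be passed straight to the table-building code instead of being stored in a list.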


 