Factor-within-factor summary statistical analysis in R

I have a data frame of stock information, which I used to classify the sentiment of each company name as positive, negative, or not determined. The head of the data is:

 head(companyReturnsNameScore)
#----------
  PERMNO     date EXCHCD SICCD TICKER     PRC   VOL       RET SHROUT companyNameSentiment        companyName
1  85814 19980831      3  5960   CTAC  6.1875 27989 -0.489691   6431       Not Determined 1 800 CONTACTS INC
2  85814 20021231      3  5960   CTAC 27.5700 97498  1.177725  11388       Not Determined 1 800 CONTACTS INC
3  85814 19990129      3  5960   CTAC 14.7500  5658 -0.180556   6275       Not Determined 1 800 CONTACTS INC
4  85814 20021031      3  5960   CTAC  9.0300 20192 -0.097000  11382       Not Determined 1 800 CONTACTS INC
5  85814 20021129      3  5960   CTAC 12.6600 15474  0.401993  12082       Not Determined 1 800 CONTACTS INC
6  85814 20070731      3  5961   CTAC 23.2400  5574 -0.009378  13619       Not Determined 1 800 CONTACTS INC
  marketCap marketCapDeclile
1  39791.81                2
2 313967.16                6
3  92556.25                4
4 102779.46                4
5 152958.12                5
6 316505.56                6

I am trying to perform statistical analysis by decile rank of market cap (the marketCapDeclile column), and within each decile rank I want a further breakdown by sentiment. That is, for each decile rank, I want to see the statistical output for each sentiment level: positive, negative, and not determined. When I enter what I think is the correct command for a list of factors,

by( companyReturnsNameScore$RET, c(companyReturnsNameScore$marketCapDeclile, 
                           companyReturnsNameScore$companyNameSentiment), summary)

I unfortunately get the following error:

Error in tapply(seq_len(1785812L), list(`c(companyReturnsNameScore$marketCapDeclile, companyReturnsNameScore$companyNameSentiment)` = c(2L, 
   : arguments must have same length

I have 10 levels for the market cap decile and three for the sentiment classification, so essentially I want 30 analyses performed. The problem is that I am having difficulty performing this factor-within-factor analysis.

What am I doing incorrectly? How can I perform a factor within factor analysis?

Your second argument concatenates two vectors, which makes it twice as long as the first argument:

  length( c( factor(1:5), factor(6:10) ) )
[1] 10
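The same length mismatch can be reproduced on a small made-up data frame. This is only a sketch: toy, decile and sentiment are hypothetical stand-ins for companyReturnsNameScore and its columns, and the values are random.

```r
# Toy stand-in for companyReturnsNameScore (hypothetical names and values)
set.seed(1)
toy <- data.frame(
  RET       = rnorm(12),
  decile    = factor(rep(1:2, each = 6)),
  sentiment = factor(rep(c("Positive", "Negative", "Not Determined"), times = 4))
)

n_rows        <- length(toy$RET)                                # 12 observations
n_concat      <- length(c(toy$decile, toy$sentiment))           # 24: the two vectors laid end to end
n_interaction <- length(interaction(toy$decile, toy$sentiment)) # 12: one combined label per row
n_list        <- lengths(list(toy$decile, toy$sentiment))       # 12 12: two row-aligned grouping vectors

print(c(rows = n_rows, concat = n_concat, interaction = n_interaction))
print(n_list)
```

Only the concatenated version disagrees with the number of rows, which is what tapply reports as "arguments must have same length".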

You have (at least) two choices: either use a list (the help page ?by says INDICES should be a factor or a list of factors), or use the interaction function, which combines the factors into a single factor with the same length as its inputs:

 # 1: a list of factors
 by( companyReturnsNameScore$RET,
     list( companyReturnsNameScore$marketCapDeclile,
           companyReturnsNameScore$companyNameSentiment ),
     summary )
 # 2: a single combined factor
 by( companyReturnsNameScore$RET,
     interaction( companyReturnsNameScore$marketCapDeclile,
                  companyReturnsNameScore$companyNameSentiment ),
     summary )
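To see how the two forms differ in the shape of their result, here is a small self-contained sketch on made-up data. The data frame toy and its columns decile and sentiment are hypothetical stand-ins for the real data, with 2 deciles instead of 10 to keep the output short.

```r
# Hypothetical toy data: 2 deciles x 3 sentiment levels, 10 rows per cell
set.seed(42)
toy <- data.frame(
  RET       = rnorm(60),
  decile    = factor(rep(1:2, each = 30)),
  sentiment = factor(rep(c("Positive", "Negative", "Not Determined"), times = 20))
)

# Choice 1: list of factors -> result is laid out as a decile-by-sentiment grid
res_list <- by(toy$RET,
               list(decile = toy$decile, sentiment = toy$sentiment),
               summary)

# Choice 2: interaction() -> result is one flat set of combined groups
res_int <- by(toy$RET,
              interaction(toy$decile, toy$sentiment),
              summary)

print(dim(res_list))    # 2 3
print(length(res_int))  # 6
print(res_list)         # one summary() per decile/sentiment cell
```

Both calls compute the same per-group summaries; the list form keeps the two grouping factors as separate dimensions, while interaction() flattens them into a single factor with combined labels. With the real data, 10 deciles and 3 sentiment levels would give the 30 groups asked for, assuming every combination occurs.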
