Format model display in texreg or stargazer R as scientific

I just ran a statistical model and I want to display its results as a table using stargazer. However, the large numbers are displayed in full.

fit2<-lm(A~B,data=C)
stargazer(fit2,type="text")

This gives the following table:

===================================================
                      Dependent variable:      
                -------------------------------
                               A               
---------------------------------------------------
B                               -0.599             
                                (1.698)            
                          32,126,391.000         
                         (24,004,268.000)        

---------------------------------------------------
 Observations                       5               
R2                               0.040             
Adjusted R2                     -0.280             
Residual Std. Error   31,217,258.000 (df = 3e+00)  
F Statistic            0.124 (df = 1e+00; 3e+00)   
===================================================
Note:               *p<1e-01; **p<5e-02; ***p<1e-02

How do I get the large numbers displayed in scientific notation, i.e. 3.12e+07? I have tried:

options("scipen"=-20,"digit"=2)
fit1<-format(lm(A~B,data=C),scientific=T)

However, this distorts the model summary and displays it as a single row. What is the best way to format the numbers while retaining the table structure?

                   CO          NO2        SM
Dec 2004 2.750000e+18 1.985136e+15 0.2187433
Jan 2005 2.980000e+18 2.144211e+15 0.1855678
Feb 2005 2.810000e+18 1.586491e+15 0.1764805
Dec 2005 3.010000e+18 1.755409e+15 0.2307153
Jan 2006 3.370000e+18 2.205888e+15 0.2046671
Feb 2006 3.140000e+18 2.084682e+15 0.1834232
Dec 2006 2.940000e+18 1.824735e+15 0.1837391
Jan 2007 3.200000e+18 2.075785e+15 0.1350665
Feb 2007 3.060000e+18 1.786481e+15 0.1179924
Dec 2007 2.750000e+18 1.645800e+15 0.2037340
Jan 2008 3.030000e+18 1.973517e+15 0.1515871
Feb 2008 3.040000e+18 1.753803e+15 0.1289968
Dec 2008 2.800000e+18 1.649315e+15 0.1968024
Jan 2009 3.090000e+18 1.856762e+15 0.1630173
Feb 2009 2.880000e+18 1.610011e+15 0.1446938
Dec 2009 2.660000e+18 1.562971e+15 0.1986012
Jan 2010 2.864333e+18 1.733843e+15 0.1559205
Feb 2010 2.881474e+18 1.469982e+15 0.1397536
Dec 2010 2.730000e+18 1.652751e+15 0.2129476
Jan 2011 3.030000e+18 1.862774e+15 0.1681295
Feb 2011 2.850000e+18 1.658988e+15 0.1531579

To do this, you can write your own function that takes the large numbers and puts them into scientific notation.

First, load the stargazer package:

library(stargazer)

Then, create data with large numbers for the example:

set.seed(1)

C <- data.frame("A" = rnorm(10000, 30000, 10000),
                "B" = rnorm(10000, 7500, 2500))

Fit the model and store the stargazer results table in an object:

fit2 <- lm(A ~ B, data = C) 

myResults <- stargazer(fit2, type = "text")

Create a function that takes a stargazer table and converts large numbers into scientific notation. (This is not very flexible, but it can be made so with simple modifications. Right now it only works for numbers from 1,000 to 99,999.)

fixNumbers <- function(stargazer.object){

  so <- stargazer.object
  # Rows containing a number with a thousands separator, e.g. 9,999.000
  rows <- grep("\\d,\\d", so, perl = TRUE)
  for(row in rows){

    # Get number and format into scientific notation.
    # The lazy ".*?" keeps both leading digits of numbers such as 30,123.456;
    # a greedy ".*" would swallow the first digit.
    number <- as.numeric(sub(".*?([0-9]{1,2}),([0-9]+\\.?[0-9]*).*", "\\1\\2", so[row], perl = TRUE))
    formatted_num <- sprintf("%.2e", number)
    so[row] <- sub("(.*?)[0-9]{1,2},[0-9]+\\.?[0-9]*(.*)", paste0("\\1", formatted_num, "\\2"), so[row], perl = TRUE)
  }

  # Print result
  for(i in seq_along(so)){
    cat(so[i], "\n")
  }
}

Pass your stargazer object to the new function ( fixNumbers ):

fixNumbers(myResults)
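
The range restriction mentioned above could be lifted by matching any comma-grouped number rather than a fixed number of leading digits. The following is only a sketch under that assumption: sciLargeNumbers is a hypothetical helper name, and the three input lines are typed in by hand to imitate stargazer text output, so no package is needed to try it. Numbers without a thousands separator (such as R2) are left alone.

```r
# Hypothetical helper (not part of stargazer): convert every comma-grouped
# number, e.g. 32,126,391.000, to scientific notation.
sciLargeNumbers <- function(lines, digits = 2) {
  # 1-3 digits, then one or more ",ddd" groups, then optional decimals
  pattern <- "[0-9]{1,3}(,[0-9]{3})+(\\.[0-9]+)?"
  m <- gregexpr(pattern, lines)
  fmt <- paste0("%.", digits, "e")
  regmatches(lines, m) <- lapply(regmatches(lines, m), function(x) {
    sprintf(fmt, as.numeric(gsub(",", "", x)))
  })
  lines
}

# Hand-written lines imitating a stargazer text table
example <- c("B                    -0.599      ",
             "Constant         32,126,391.000  ",
             "                (24,004,268.000) ")
result <- sciLargeNumbers(example)
cat(result, sep = "\n")
```

Because the replacement strings are shorter than the originals, the column alignment of the table shifts slightly, just as it does with fixNumbers.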

Following Adam K's idea, but with a somewhat more streamlined regex (and making use of vectorisation, which is a good idea in R):

fit2 <- lm(CO ~ NO2, data = df)
test <- stargazer(fit2, type = "text")

The regex part takes two lines. First, find the numbers: here, any run of at least five characters made up of digits, commas and periods.

m <- gregexpr("([0-9\\.,]{5,})", test)

Next, define a transformation function to apply to those matches (here: strip the commas, convert to numeric, and display in scientific notation with two decimal places; you can also consider formatC, which offers many more options):

f = function(x){
  sprintf("%.2e",as.numeric( gsub(",","",x)))
}

Then apply it to the matches using the regmatches function:

regmatches(test, m) <- lapply(regmatches(test, m), f)
test


 [1] ""                                                           
 [2] "========================================================"   
 [3] "                            Dependent variable:         "   
 [4] "                    ------------------------------------"   
 [5] "                                     CO                 "   
 [6] "--------------------------------------------------------"   
 [7] "NO2                              6.26e+02**              "  
 [8] "                                 (2.41e+02)              "  
 [9] "                                                        "   
[10] "Constant              1.81e+18***  "                        
[11] "                       (4.62e+17)    "                      
[12] "                                                        "   
[13] "--------------------------------------------------------"   
[14] "Observations                         10                 "   
[15] "R2                                 4.58e-01                "
[16] "Adjusted R2                        3.90e-01                "
[17] "Residual Std. Error 1.57e+17 (df = 8)"                      
[18] "F Statistic                 6.76e+00** (df = 1; 8)         "
[19] "========================================================"   
[20] "Note:                        *p<0.1; **p<0.05; ***p<0.01"   
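
For the formatC alternative mentioned above, a minimal sketch might look like the following. The name f2 is hypothetical, and the guard for empty input is an assumption added so that table lines without any match are handled explicitly.

```r
# Hypothetical variant of f() built on formatC instead of sprintf
f2 <- function(x) {
  # Lines without a match yield an empty vector; return it unchanged
  if (length(x) == 0) return(character(0))
  formatC(as.numeric(gsub(",", "", x)), format = "e", digits = 2)
}

out <- f2(c("32,126,391.000", "0.599"))
print(out)
```

formatC also accepts arguments such as width and flag, which may help if the column alignment needs to be restored.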

To obtain the same kind of output as the original:

print(as.data.frame(test),quote = F,row.names = FALSE)



                                                       test

    ========================================================
                                Dependent variable:         
                        ------------------------------------
                                         CO                 
    --------------------------------------------------------
   NO2                              6.26e+02**              
                                    (2.41e+02)              

                         Constant              1.81e+18***  
                                              (4.62e+17)    

    --------------------------------------------------------
    Observations                         10                 
 R2                                 4.58e-01                
 Adjusted R2                        3.90e-01                
                       Residual Std. Error 1.57e+17 (df = 8)
 F Statistic                 6.76e+00** (df = 1; 8)         
    ========================================================
    Note:                        *p<0.1; **p<0.05; ***p<0.01
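
As an alternative way to print the modified lines, one could use cat with a newline separator, which writes each element of the character vector on its own line without a header or padding. This is only a sketch: the vector below is a hand-typed stand-in for the test object.

```r
# Hand-typed stand-in for the character vector returned by stargazer
test_lines <- c("================================",
                "NO2              6.26e+02**     ",
                "                 (2.41e+02)     ",
                "================================")

# Print one element per line, exactly as stored
cat(test_lines, sep = "\n")
```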

The data:

df <- read.table(text  = "
CO NO2 SM
 2.750000e+18 1.985136e+15 0.2187433
 2.980000e+18 2.144211e+15 0.1855678
 2.810000e+18 1.586491e+15 0.1764805
 3.010000e+18 1.755409e+15 0.2307153
 3.370000e+18 2.205888e+15 0.2046671
 3.140000e+18 2.084682e+15 0.1834232
 2.940000e+18 1.824735e+15 0.1837391
 3.200000e+18 2.075785e+15 0.1350665
 3.060000e+18 1.786481e+15 0.1179924
 2.750000e+18 1.645800e+15 0.2037340",header = T)

The problem is not that these packages cannot display scientific notation. Rather, your independent variables are on an extremely small scale. You should rescale them before using them in your model by multiplying the values by some constant. For example, if you measured people's height in kilometers, you might want to rescale it to meters or centimeters. This makes the table much easier to read than displaying the results in scientific notation.

Consider the following example:

a <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
b <- c(0.00020, 0.00024, 0.00024, 0.00026, 0.00021, 0.00022, 0.00023, 
    0.00022, 0.00023, 0.00022)
model.1 <- lm(a ~ b)

Next, create your table with texreg :

library("texreg")
screenreg(model.1)

This yields the following table:

=========================
             Model 1     
-------------------------
(Intercept)     -2.27 *  
                (0.94)   
b            32168.58 ***
             (4147.00)   
-------------------------
R^2              0.88    
Adj. R^2         0.87    
Num. obs.       10       
=========================
*** p < 0.001, ** p < 0.01, * p < 0.05

So the coefficients are pretty large. Let's try the same thing with stargazer :

library("stargazer")
stargazer(model.1, type = "text")

The resulting table:

===============================================
                        Dependent variable:    
                    ---------------------------
                                 a             
-----------------------------------------------
b                          32,168.580***       
                            (4,146.999)        

Constant                     -2.270**          
                              (0.944)          

-----------------------------------------------
Observations                    10             
R2                             0.883           
Adjusted R2                    0.868           
Residual Std. Error       0.212 (df = 8)       
F Statistic            60.172*** (df = 1; 8)   
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

Same problem: large coefficients. Now rescale your original variable b and recompute the model:

b <- b * 10000
model.2 <- lm(a ~ b)

Try it again with texreg :

screenreg(model.2)

======================
             Model 1  
----------------------
(Intercept)  -2.27 *  
             (0.94)   
b             3.22 ***
             (0.41)   
----------------------
R^2           0.88    
Adj. R^2      0.87    
Num. obs.    10       
======================
*** p < 0.001, ** p < 0.01, * p < 0.05

And with stargazer :

stargazer(model.2, type = "text")

===============================================
                        Dependent variable:    
                    ---------------------------
                                 a             
-----------------------------------------------
b                            3.217***          
                              (0.415)          

Constant                     -2.270**          
                              (0.944)          

-----------------------------------------------
Observations                    10             
R2                             0.883           
Adjusted R2                    0.868           
Residual Std. Error       0.212 (df = 8)       
F Statistic            60.172*** (df = 1; 8)   
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

Now the coefficients look nicer and you do not need scientific notation.
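
To see that rescaling only changes the units of the coefficient and not the fit, here is a small check using the same a and b as above. The names k, model.raw and model.scaled are introduced only for this illustration; multiplying the predictor by k divides its coefficient by k and leaves R-squared unchanged.

```r
a <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
b <- c(0.00020, 0.00024, 0.00024, 0.00026, 0.00021, 0.00022, 0.00023,
       0.00022, 0.00023, 0.00022)
k <- 10000  # rescaling constant (illustrative name)

model.raw    <- lm(a ~ b)
model.scaled <- lm(a ~ I(b * k))

# Slope of the raw model equals k times the slope of the rescaled model
slope.raw    <- coef(model.raw)[[2]]
slope.scaled <- coef(model.scaled)[[2]]
print(all.equal(slope.raw, slope.scaled * k))

# The goodness of fit is the same in both models
print(all.equal(summary(model.raw)$r.squared,
                summary(model.scaled)$r.squared))
```

When reporting the rescaled model, state the new unit of b in the table or its caption so the coefficient can still be interpreted.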
