R: Fill values above and below a threshold with 0

Question

I'm currently writing my master's thesis, and when I ran a regression I found some outliers that I would like to either delete or replace with zero. I have a dataframe with company names and their daily returns from 2010 until 2021.

The dataframe is called xsr. I want to find the outliers, which are the values above 0.5 or below -0.5. I managed to create a dataframe according to this condition: xsr_short <- xsr[,c(2:214)] <0.5 . Then I tried to pick the FALSE values: outliers <- subset(xsr_short, xsr_short = FALSE) . That just gives me back the initial xsr_short .

I also tried it with the select command: xsr_short <- select(xsr, c('ABBN SW Equity':'ZWM SW Equity') < 0.5) . The output is:

Error in `select()`:
! NA/NaN argument
Backtrace:
  1. dplyr::select(xsr, c("ABBN SW Equity":"ZWM SW Equity") < 0.5)
 22. base::.handleSimpleError(`<fn>`, "NA/NaN argument", base::quote("ABBN SW Equity":"ZWM SW Equity"))
 23. rlang (local) h(simpleError(msg, call))
 24. handlers[[1L]](cnd)
Warning messages:
1: In eval_tidy(expr, context_mask) : NAs introduced by coercion
2: In eval_tidy(expr, context_mask) : NAs introduced by coercion

I need to add the second condition > -0.5 and then delete the values that fall outside this range.

Thank you very much in advance for your help and your time!

Answer 1

It seems like you are less concerned with an actual subset and more with swapping out unwanted values in your data while preserving the rest for the regression. In that case, the tidyverse package may be helpful. First, you can load this package as well as this fake dataset:

#### Load Tidyverse ####
library(tidyverse)

#### Make Data Frame ####
data <- data.frame(IV = c("Control","Treatment",
                          "Control","Treatment"),
                   DV = c(-9999,2,4,5555))
data

Which gives you this:

         IV    DV
1   Control -9999
2 Treatment     2
3   Control     4
4 Treatment  5555

From there you can simply use mutate and ifelse to replace the unwanted values with NA (missing) values, saving the data into a new version with the replacement values:

#### Swap Outliers with NA Values ####
clean.data <- data %>% 
  mutate(DV = ifelse(DV < 0,
                     NA,
                     ifelse(DV > 100,
                            NA,
                            DV)))
clean.data

Which gives you this:

         IV DV
1   Control NA
2 Treatment  2
3   Control  4
4 Treatment NA
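
The same idea can be stretched to the layout described in the question, where every return column needs the check against 0.5 and -0.5. What follows is only a minimal sketch: the data frame returns and its columns Date, A and B are invented stand-ins for xsr, and it assumes dplyr 1.0 or later so that across() is available. Because the two thresholds are symmetric, both conditions collapse into a single abs() test.

```r
library(dplyr)

# Hypothetical stand-in for xsr: a date column plus two return columns
returns <- data.frame(
  Date = as.Date(c("2010-01-04", "2010-01-05", "2010-01-06")),
  A = c(0.01, 0.75, -0.02),
  B = c(-0.60, 0.03, 0.04)
)

# Replace any return outside [-0.5, 0.5] with NA in every numeric column
clean_returns <- returns %>%
  mutate(across(where(is.numeric), ~ ifelse(abs(.x) > 0.5, NA, .x)))

# Same condition, but filling with 0 instead of NA
zero_returns <- returns %>%
  mutate(across(where(is.numeric), ~ ifelse(abs(.x) > 0.5, 0, .x)))
```

Whether NA or 0 is the better fill depends on the regression: NA drops the observation from most model fits, while 0 keeps it in as a zero return.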

As some others have noted, it's generally bad practice to delete outliers from your data unless you have a defensible reason to do so. So if you do remove them, make sure your thesis includes a justification that explains why you removed the values.
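
If you would rather stay in base R, the replacement can also be sketched with a logical matrix, which is close to what the xsr[,c(2:214)] < 0.5 attempt in the question produced. The data frame below is a made-up stand-in for xsr, and the assumption that the first column is not a return series is taken from the question's column range. Comparing a data frame against a number yields a logical matrix, and indexing with that matrix assigns to exactly the cells where it is TRUE.

```r
# Hypothetical stand-in for xsr: first column is not a return series
returns <- data.frame(
  Date = c("2010-01-04", "2010-01-05", "2010-01-06"),
  A = c(0.01, 0.75, -0.02),
  B = c(-0.60, 0.03, NA)
)

ret_cols <- 2:ncol(returns)   # the question used 2:214
vals <- returns[ret_cols]     # list-style indexing keeps a data frame

# Logical matrix: TRUE where a return lies outside [-0.5, 0.5]
is_outlier <- abs(vals) > 0.5
is_outlier[is.na(is_outlier)] <- FALSE   # leave existing NAs untouched

vals[is_outlier] <- 0         # use NA here to mark them as missing instead
returns[ret_cols] <- vals

# Number of flagged values per column, useful for reporting in the thesis
colSums(is_outlier)
```

Keeping is_outlier around gives a record of which values were changed, which helps when documenting the decision.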
