
Removing negative values and one positive value from an R data frame

I have a data frame where one column is the amount spent. That column holds the amounts spent, and also negative values for any returns. For example:

ID    Store    Spent
123    A        18.50
123    A       -18.50
123    A        18.50

I want to remove each negative value together with one of its positive counterparts. The idea is to keep only fully completed spend amounts so I can look at total spend.
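To make the example reproducible, here is a minimal sketch that builds the table above as a data frame (column names taken from the table). It also shows that, as long as every return has a matching purchase in the data, a plain sum() already equals the total of the completed purchases, because each return cancels exactly one purchase.

```r
# The example from the question as a data frame
df <- data.frame(ID = c(123, 123, 123), Store = "A", Spent = c(18.50, -18.50, 18.50))

# Total after dropping the return (row 2) and one matching purchase (row 1)
completed_total <- sum(df$Spent[-c(1, 2)])

# Net total: the return simply cancels one purchase
net_total <- sum(df$Spent)

completed_total == net_total  # TRUE for this example
```

This shortcut only holds when every return's purchase is also in the data; dropping the pairs explicitly is still needed if you want the individual completed rows.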

Right now I am thinking of something like this, with the data frame sorted by spend:

if spend < 0 {
  take the absolute value of spend
  if abs(spend) - (the next row's spend) == 0, set both to NA
}

I would like something like

df$Spent[df$Spent < 0] <- NA

where I can also set one positive counterpart of each negative value to NA. Any suggestions?
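For comparison, the behaviour described above can be sketched directly in base R with a loop. This is a minimal sketch, assuming columns named ID and Spent and assuming a counterpart must share the same ID; it compares amounts with ==, which is fine as long as they were entered as the same decimal values.

```r
df <- data.frame(ID = c(123, 123, 123), Store = "A", Spent = c(18.50, -18.50, 18.50))

for (i in which(df$Spent < 0)) {
  # First purchase of the same amount for the same ID that is not already NA
  # (which() skips NA comparisons, so used-up rows are never picked twice)
  j <- which(df$ID == df$ID[i] & df$Spent == -df$Spent[i])[1]
  df$Spent[i] <- NA                 # always drop the return itself
  if (!is.na(j)) df$Spent[j] <- NA  # and one counterpart, if one exists
}

df[!is.na(df$Spent), ]
```

A loop is easy to follow but scales poorly on large data; the vectorised answers below avoid it.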

There should be a simpler solution to this, but here is one way. I also created my own example, since the one shared did not have enough data points to test with.

#Original vector
x <- c(1, 2, -2, 1, -1, -1, 2, 3, -4, 1, 4)
#Sort the levels so both frequency tables list the values in the same order
lv <- sort(unique(abs(x)))
#Count how often each absolute value appears as a negative number (0 if never)
vals <- table(factor(abs(x[x < 0]), levels = lv))
#Count the frequency of the absolute value of every element
vals1 <- table(factor(abs(x), levels = lv))
#Each negative number cancels itself and one positive counterpart, hence * 2
new_val <- vals1 - (vals * 2)
#Recreate the new vector (as.numeric keeps decimal amounts such as 18.5)
as.numeric(rep(names(new_val), new_val))
#[1] 1 2 3
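The same counting idea can be wrapped in a function (cancel_pairs is a hypothetical name, not from the answer). This sketch builds both tables from one sorted set of levels, so they line up by position even when the first value is not the smallest, and clamps the counts at zero in case a return has no matching purchase. Like the approach above, it returns the remaining values in sorted order rather than their original positions.

```r
cancel_pairs <- function(x) {
  lv <- sort(unique(abs(x)))
  # How often each absolute value occurs overall, and as a negative number
  counts <- as.vector(table(factor(abs(x), levels = lv)))
  returns <- as.vector(table(factor(abs(x[x < 0]), levels = lv)))
  # Each return removes itself and one purchase; never go below zero
  keep <- pmax(counts - 2 * returns, 0)
  rep(lv, keep)
}

cancel_pairs(c(1, 2, -2, 1, -1, -1, 2, 3, -4, 1, 4))
cancel_pairs(c(3, -1, 1))
cancel_pairs(c(18.50, -18.50, 18.50))
```

The second call is the case where unsorted levels would misalign the two tables; here it correctly leaves only the 3.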

If you add a row-id column, you can do this with data.table anti-joins.

Here's an example that takes ID into account: a positive value only counts as a counterpart if it has the same ID as the negative value.

First, create more interesting sample data:

library(data.table)
df <- fread('
ID    Store    Spent
123    A        18.50
123    A       -18.50
123    A        18.50
123    A       -19.50
123    A        19.50
123    A       -99.50
124    A       -94.50
124    A        99.50
124    A        94.50
124    A        94.50
')

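Now remove every negative value together with one positive counterpart that has the same ID and amount. The result contains only positive rows, so a negative value with no counterpart (such as ID 123's -99.50) is dropped as well: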

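# Returns, flipped to positive, numbered 1, 2, ... within each ID/amount pair
negs <- df[Spent < 0][, Spent := -Spent][, rid := rowid(ID, Spent)]
# Purchases, numbered the same way
pos <- df[Spent > 0][, rid := rowid(ID, Spent)]
# Anti-join: drop each purchase whose (ID, Spent, rid) matches a return, then drop rid
pos[!negs, on = .(ID, Spent, rid), -'rid']
#     ID Store Spent
# 1: 123     A  18.5
# 2: 124     A  99.5
# 3: 124     A  94.5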
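Since the goal is total spend, the rows that survive the anti-join can then be summed per ID. A minimal sketch, rebuilding the same sample data with data.table() instead of fread() so it runs on its own; kept, totals and total_spent are illustrative names.

```r
library(data.table)

# Same sample data as above, built directly instead of with fread()
df <- data.table(
  ID = c(123, 123, 123, 123, 123, 123, 124, 124, 124, 124),
  Store = "A",
  Spent = c(18.5, -18.5, 18.5, -19.5, 19.5, -99.5, -94.5, 99.5, 94.5, 94.5)
)

negs <- df[Spent < 0][, Spent := -Spent][, rid := rowid(ID, Spent)]
pos <- df[Spent > 0][, rid := rowid(ID, Spent)]
kept <- pos[!negs, on = .(ID, Spent, rid)]

# Total of the completed purchases per ID
# (ID 123: 18.5; ID 124: 99.5 + 94.5 = 194)
totals <- kept[, .(total_spent = sum(Spent)), by = ID]
totals
```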

And applied to Ronak's x vector example:

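x <- c(1, 2, -2, 1, -1, -1, 2, 3, -4, 1, 4)
# Negative values flipped to positive, with an occurrence number for each value
negs <- data.table(x = -x[x<0])[, rid := rowid(x)]
# Positive values, numbered the same way
pos <- data.table(x = x[x>0])[, rid := rowid(x)]
# Anti-join on both x and rid, then drop the helper column
pos[!negs, on = names(pos), -'rid']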

#    x
# 1: 2
# 2: 3
# 3: 1
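If you would rather not depend on data.table, the same row-id anti-join can be sketched in base R. This is an illustrative sketch, not part of the original answer: ave() with seq_along numbers each repeat of an ID/amount pair (playing the role of rowid()), and %in% on pasted keys does the anti-join. It uses the same ID/Store/Spent sample data as the fread() example.

```r
df <- data.frame(
  ID = c(123, 123, 123, 123, 123, 123, 124, 124, 124, 124),
  Store = "A",
  Spent = c(18.5, -18.5, 18.5, -19.5, 19.5, -99.5, -94.5, 99.5, 94.5, 94.5)
)

pos <- df[df$Spent > 0, ]
neg <- df[df$Spent < 0, ]
neg$Spent <- -neg$Spent  # flip returns so they compare equal to purchases

# 1, 2, 3, ... for each repeat of the same ID/amount pair (like rowid(ID, Spent))
occurrence <- function(d) ave(d$Spent, d$ID, d$Spent, FUN = seq_along)

# Anti-join: keep purchases whose ID/amount/occurrence key has no matching return
pos_key <- paste(pos$ID, pos$Spent, occurrence(pos))
neg_key <- paste(neg$ID, neg$Spent, occurrence(neg))
kept <- pos[!pos_key %in% neg_key, ]
kept
```

As with the data.table version, the earliest matching purchases are the ones removed, and unmatched returns do not appear in the result.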

I used the following code:

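library(dplyr)
# Random sample data: 9 transactions across stores A, B and C
store <- rep(LETTERS[1:3], 3)
id <- c(1:4, 1:3, 1:2)
expense <- runif(9, -10, 10)  # negative values play the role of returns
# Net expenditure per store: returns cancel purchases in the sum
tibble(store, id, expense) %>%
  group_by(store) %>%
  summarise(net_expenditure = sum(expense))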

to get this output (expense is drawn with runif(), so your numbers will differ from run to run):

# A tibble: 3 x 2
  store net_expenditure
  <chr>           <dbl>
1 A               13.3 
2 B                8.17
3 C               16.6 

Alternatively, if you want the net expenditure for each store and ID pair, you could use this code:

tibble(store, id, expense) %>%
  group_by(store, id) %>%
  summarise(net_expenditure = sum(expense))

I've approached your question from a slightly different perspective: instead of removing matched pairs, this sums every purchase and return, so each return cancels its purchase in the total. I'm not sure it answers your question exactly, but it might help.
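If you do want to drop the matched pairs rather than net them out, a dplyr sketch in the same pipe style might look like this, assuming ID and Spent columns as in the question. Within each ID and absolute amount it counts the returns, numbers the purchases in order, and keeps only the purchases beyond that count, so the earliest matching purchases are the ones removed. The helper columns amount, n_returns and purchase_no are hypothetical names.

```r
library(dplyr)

df <- tibble(
  ID = c(123, 123, 123, 123, 123, 123, 124, 124, 124, 124),
  Store = "A",
  Spent = c(18.5, -18.5, 18.5, -19.5, 19.5, -99.5, -94.5, 99.5, 94.5, 94.5)
)

kept <- df %>%
  group_by(ID, amount = abs(Spent)) %>%
  mutate(
    n_returns = sum(Spent < 0),      # returns of this amount for this ID
    purchase_no = cumsum(Spent > 0)  # 1st, 2nd, ... purchase of this amount
  ) %>%
  ungroup() %>%
  filter(Spent > 0, purchase_no > n_returns) %>%
  select(ID, Store, Spent)

kept
```

Piping kept into group_by(ID) %>% summarise(total = sum(Spent)) then gives total completed spend per ID.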
