
Combine columns from one data frame into new data frame, and filter

Material   DocDate      Name              Address   Unit Price
1258486    3/17/2017    FEHLIG BROS BOX   asd       8.95
1258486    5/11/2017    FEHLIG BROS BOX   asd       9.5
1258486    12/11/2017   FEHLIG BROS_BOX   asd       10.5
1250000    12/20/2017   Krones ALPHA      afg       11.5

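I have the data frame above. I need a data frame like the one below, based on the date on which each record occurred (for example 3/17/2017). This is the output I need: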

Material         Name/address/Unit Price
1258486     FEHLIG BROS BOX/asd/8.95/9.5/10.5
1250000     Krones/ALPHA/afg/11.5

Using data.table you can try:

df <- read.table(stringsAsFactors = FALSE, header = TRUE,
                 text ="Material DocDate Name  Address Unit  Price
                 1258486   3/17/2017  FEHLIG BROS_BOX     asd     8.95
                 1258486   5/11/2017  FEHLIG BROS_BOX     asd     9.5
                 1258486   12/11/2017  FEHLIG BROS_BOX    asd     10.5
                 1250000   12/20/2017  Krones ALPHA       afg     11.5
                 ")
df$DocDate <- as.Date(df$DocDate,'%m/%d/%Y')
library(data.table)
setDT(df)[,.(newVar = paste(Name, Address, Unit, paste(.SD$Price,collapse = "/"), sep = "/") )
          ,by = Material][,.(newVar = newVar[1]), Material]

#returns
   Material                            newVar
1:  1258486 FEHLIG/BROS_BOX/asd/8.95/9.5/10.5
2:  1250000             Krones/ALPHA/afg/11.5

Here's an alternative approach using dplyr. First, the sample data:

data <- data.frame(stringsAsFactors=FALSE,
                   Material   = c(1258486L, 1258486L),
                   DocDate    = c("3/17/2017", "5/11/2017"),
                   Name       = c("FEHLIG BROS BOX", "FEHLIG BROS BOX"),
                   Address    = c("asd", "asd"),
                   Unit_Price = c(8.95, 9.5))

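Then, here is a set of steps to get your answer. (By the way, I believe all the solutions offered so far will give you multiple rows of output if several rows for a given Material share the same earliest date. You might want to add another term to the filter, e.g. Unit_Price == min(Unit_Price), if there is a tie-breaker that makes sense here.)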

library(dplyr)
output <- data %>%

  # convert DocDate to a date
  mutate(DocDate = as.Date(DocDate,'%m/%d/%Y')) %>%

  # For each Material...
  group_by(Material) %>% 

  # just keep the line(s) with the first date...
  filter(DocDate == min(DocDate)) %>% ungroup() %>% 

  # and combine fields
  mutate(`Name/address/Unit Price` = paste(Name, Address, Unit_Price, sep = "/")) %>%

  # just the requested columns
  select(Material, `Name/address/Unit Price`)

output
# A tibble: 1 x 2
  Material `Name/address/Unit Price`
     <int> <chr>                    
1  1258486 FEHLIG BROS BOX/asd/8.95 
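
The tie-breaker mentioned above could look roughly like the sketch below. It is only an illustration: the `tied` data frame is made up for this example (two rows share the earliest date), and "lowest price wins" is an assumed rule, not something stated in the question.

```r
library(dplyr)

# Hypothetical sample: two rows share the earliest date for the same Material
tied <- data.frame(stringsAsFactors = FALSE,
                   Material   = c(1258486L, 1258486L, 1258486L),
                   DocDate    = c("3/17/2017", "3/17/2017", "5/11/2017"),
                   Name       = c("FEHLIG BROS BOX", "FEHLIG BROS BOX", "FEHLIG BROS BOX"),
                   Address    = c("asd", "asd", "asd"),
                   Unit_Price = c(9.5, 8.95, 8.5))

tie_broken <- tied %>%
  mutate(DocDate = as.Date(DocDate, "%m/%d/%Y")) %>%
  group_by(Material) %>%
  # keep the row(s) with the earliest date...
  filter(DocDate == min(DocDate)) %>%
  # ...then, among those rows only, keep the lowest price (assumed tie-breaker)
  filter(Unit_Price == min(Unit_Price)) %>%
  ungroup()

tie_broken
```

The two conditions are applied in separate filter() calls on purpose. Put into a single filter(), min(Unit_Price) would be taken over the whole Material group, so a cheaper price on a later date (8.5 here) would leave no row satisfying both conditions. If two rows tie on both date and price, this still returns more than one row.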

(Edit: fixed a typo)

Complete edit based on your changes to the question:

# create example data (notice this differs slightly from your table above)
# strip.white = TRUE drops the spaces after each comma, so Name and Address
# don't carry a leading blank into the pasted string
df <- read.csv(stringsAsFactors = FALSE, header = TRUE, strip.white = TRUE,
                 text ="Material, DocDate, Name, Address, UnitPrice
                        1258486, 3/17/2017, FEHLIG BROS BOX, asd, 8.95
                        1258486, 5/11/2017, FEHLIG BROS BOX, asd, 9.50
                        1258486, 12/11/2017, FEHLIG BROS_BOX, asd, 10.5
                        1250000, 12/20/2017, Krones ALPHA, afg, 11.5")

# let's use data.table
library(data.table)
df_orig <- as.data.table(df)
df_orig[ , DocDate := as.Date(DocDate, format = "%m/%d/%Y")]
# sort in place by date so the prices are pasted in chronological order
# (a chained [order(DocDate)] only prints a sorted copy and changes nothing)
setorder(df_orig, DocDate)

# create one string per Name-Material pair
df_intermed <- df_orig[ , .(newvar = paste(Name[1], Address[1], paste(UnitPrice, collapse="/"), sep="/")), by=.(Material, Name)]

# aggregate those strings across Names, so one row per Material
df_final <- df_intermed[ , .(newvar = paste(newvar, collapse=",")), by=Material]

