Combine columns from one data frame into new data frame, and filter
Material DocDate Name Address Unit Price
1258486 3/17/2017 FEHLIG BROS BOX asd 8.95
1258486 5/11/2017 FEHLIG BROS BOX asd 9.5
1258486 12/11/2017 FEHLIG BROS_BOX asd 10.5
1250000 12/20/2017 Krones ALPHA afg 11.5
I have the data frame above. I need a new data frame like the one below, based on the date of occurrence (3/17/2017). This is the output I need:
Material Name/address/Unit Price
1258486 FEHLIG BROS BOX/asd/8.95/9.5/10.5
1250000 Krones/ALPHA/afg/11.5
Using data.table, you can try:
df <- read.table(stringsAsFactors = FALSE, header = TRUE,
text ="Material DocDate Name Address Unit Price
1258486 3/17/2017 FEHLIG BROS_BOX asd 8.95
1258486 5/11/2017 FEHLIG BROS_BOX asd 9.5
1258486 12/11/2017 FEHLIG BROS_BOX asd 10.5
1250000 12/20/2017 Krones ALPHA afg 11.5
")
df$DocDate <- as.Date(df$DocDate,'%m/%d/%Y')
library(data.table)
setDT(df)[,.(newVar = paste(Name, Address, Unit, paste(.SD$Price,collapse = "/"), sep = "/") )
,by = Material][,.(newVar = newVar[1]), Material]
#returns
Material newVar
1: 1258486 FEHLIG/BROS_BOX/asd/8.95/9.5/10.5
2: 1250000 Krones/ALPHA/afg/11.5
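The chained call above first builds one string per row and then keeps the first row of each group. As a rough sketch of the same idea in a single step (assuming, as in the sample, that Name, Address and Unit are constant within each Material), you could take the first value of those columns directly. The name res is just a placeholder for illustration.

```r
library(data.table)

# Same whitespace-delimited sample as above. Because the header is split on
# spaces, "Unit" and "Price" are parsed as two separate columns.
df <- read.table(stringsAsFactors = FALSE, header = TRUE,
text = "Material DocDate Name Address Unit Price
1258486 3/17/2017 FEHLIG BROS_BOX asd 8.95
1258486 5/11/2017 FEHLIG BROS_BOX asd 9.5
1258486 12/11/2017 FEHLIG BROS_BOX asd 10.5
1250000 12/20/2017 Krones ALPHA afg 11.5
")
setDT(df)

# One row per Material: the first Name/Address/Unit of the group,
# followed by every Price of the group in row order.
res <- df[, .(newVar = paste(Name[1], Address[1], Unit[1],
                             paste(Price, collapse = "/"), sep = "/")),
          by = Material]
res
```

If those columns can vary within a Material, this silently keeps only the first value, so check that assumption against your real data.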
Here is an alternative approach using dplyr. First, the sample data:
data <- data.frame(stringsAsFactors=FALSE,
Material = c(1258486L, 1258486L),
DocDate = c("3/17/2017", "5/11/2017"),
Name = c("FEHLIG BROS BOX", "FEHLIG BROS BOX"),
Address = c("asd", "asd"),
Unit_Price = c(8.95, 9.5))
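Then here is a set of steps to get your answer. (By the way, I believe all the solutions provided so far will give you multiple rows of output if several rows of a Material share the same "earliest time". You might want to add another term to the filter, such as Unit_Price == min(Unit_Price), if there is a tie-breaker that makes sense here.)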
library(dplyr)
output <- data %>%
# convert DocDate to a date
mutate(DocDate = as.Date(DocDate,'%m/%d/%Y')) %>%
# For each Material...
group_by(Material) %>%
# just keep the line(s) with the first date...
filter(DocDate == min(DocDate)) %>% ungroup() %>%
# and combine fields
mutate(`Name/address/Unit Price` = paste(Name, Address, Unit_Price, sep = "/")) %>%
# just the requested columns
select(Material, `Name/address/Unit Price`)
output
# A tibble: 1 x 2
Material `Name/address/Unit Price`
<int> <chr>
1 1258486 FEHLIG BROS BOX/asd/8.95
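To illustrate the tie-breaker mentioned in the note above, here is a minimal sketch. The data frame tied is made up for illustration (two rows share the earliest date), and picking the lowest Unit_Price is only one possible rule. The second filter() is a separate step so that the minimum price is taken among the earliest rows only, not across the whole group.

```r
library(dplyr)

# Hypothetical data: two rows share the earliest date for the same Material.
tied <- data.frame(stringsAsFactors = FALSE,
  Material = c(1258486L, 1258486L, 1258486L),
  DocDate = c("3/17/2017", "3/17/2017", "5/11/2017"),
  Name = c("FEHLIG BROS BOX", "FEHLIG BROS BOX", "FEHLIG BROS BOX"),
  Address = c("asd", "asd", "asd"),
  Unit_Price = c(8.95, 9.5, 10.5))

first_rows <- tied %>%
  mutate(DocDate = as.Date(DocDate, "%m/%d/%Y")) %>%
  group_by(Material) %>%
  # keep the row(s) with the earliest date...
  filter(DocDate == min(DocDate)) %>%
  # ...then break ties by keeping the lowest price among those rows
  filter(Unit_Price == min(Unit_Price)) %>%
  ungroup()
first_rows
```

If two tied rows also have the same price, this still returns both, so a further rule (or distinct()) would be needed in that case.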
(Edit: fixed a typo)
Complete edit based on your changes to the question:
# create example data (notice this differs slightly from your table above)
df <- read.csv(stringsAsFactors = FALSE, header = TRUE, strip.white = TRUE, # trim the spaces after each comma
text ="Material, DocDate, Name, Address, UnitPrice
1258486, 3/17/2017, FEHLIG BROS BOX, asd, 8.95
1258486, 5/11/2017, FEHLIG BROS BOX, asd, 9.50
1258486, 12/11/2017, FEHLIG BROS_BOX, asd, 10.5
1250000, 12/20/2017, Krones ALPHA, afg, 11.5")
# let's use data.table
library(data.table)
df_orig <- as.data.table(df)
df_orig[ , DocDate := as.Date(DocDate, format="%m/%d/%Y")]
# sort in place by date so the prices are concatenated in chronological order
setorder(df_orig, DocDate)
# create one string per Name-Material pair
df_intermed <- df_orig[ , .(newvar = paste(Name[1], Address[1], paste(UnitPrice, collapse="/"), sep="/")), by=.(Material, Name)]
# aggregate those strings across Names, so one row per Material
df_final <- df_intermed[ , .(newvar = paste(newvar, collapse=",")), by=Material]
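For comparison, the same aggregation can be sketched in dplyr. This is only an illustration under a simplifying assumption: the sample is normalised so that Name is spelled identically in every row of a Material, which lets the sketch take the first Name and Address per group instead of building one string per Name. The object name out is a placeholder.

```r
library(dplyr)

# Sample data, with Name normalised (an assumption made for this sketch).
df <- read.csv(stringsAsFactors = FALSE, header = TRUE, strip.white = TRUE,
text = "Material,DocDate,Name,Address,UnitPrice
1258486,3/17/2017,FEHLIG BROS BOX,asd,8.95
1258486,5/11/2017,FEHLIG BROS BOX,asd,9.50
1258486,12/11/2017,FEHLIG BROS BOX,asd,10.5
1250000,12/20/2017,Krones ALPHA,afg,11.5")

out <- df %>%
  mutate(DocDate = as.Date(DocDate, format = "%m/%d/%Y")) %>%
  # sort by date so prices are pasted in chronological order
  arrange(DocDate) %>%
  group_by(Material) %>%
  # one row per Material: first Name/Address, then all prices
  summarise(newvar = paste(first(Name), first(Address),
                           paste(UnitPrice, collapse = "/"), sep = "/"))
out
```

Note that summarise() returns the groups sorted by Material, so the row order may differ from the data.table result.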