Reshape Data Frame Based on Corresponding Column's Identifier (R)
I'm trying to reshape a two-column data frame by collapsing the rows that share the same value in column 2 (in this case, ticker symbols) into a single row per ticker, while turning the contents of column 1 (the data fields that belong to each ticker) into their own columns. Below is a small sample; the full data frame has 500 tickers and 4 fields:
test22 Ticker
Current SharePrice $6.57 MFM
Current NAV $7.11 MFM
Current Premium/Discount -7.59% MFM
52WkAvg SharePrice $6.55 MFM
52WkAvg NAV $7.21 MFM
52WkAvg Premium/Discount -9.19% MFM
52WkHigh SharePrice $6.88 MFM
52WkHigh NAV $7.34 MFM
52WkHigh Premium/Discount -5.88% MFM
52WkLow SharePrice $6.05 MFM
52WkLow NAV $7.03 MFM
52WkLow Premium/Discount -14.43% MFM
Current SharePrice $4.84 CXE
Current NAV $5.21 CXE
Current Premium/Discount -7.10% CXE
52WkAvg SharePrice $4.91 CXE
52WkAvg NAV $5.29 CXE
52WkAvg Premium/Discount -7.26% CXE
52WkHigh SharePrice $5.31 CXE
52WkHigh NAV $5.37 CXE
52WkHigh Premium/Discount -1.12% CXE
52WkLow SharePrice $4.58 CXE
52WkLow NAV $5.16 CXE
52WkLow Premium/Discount -11.92% CXE
Ideally, after the transformation each ticker occupies a single row, with the ticker as the row name and, in this case, 12 columns holding the contents of the "test22" column. The names of those columns aren't important at this stage. Any help is much appreciated!
I interpreted this problem as converting long data into wide format. The hardest part is separating the number from the description; once that is done, the spread() function converts the data to wide format.
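Because every entry in test22 ends with a single space-delimited value, a simpler alternative to the lookahead regex in the separate() call is to split at the last space with base R's sub(). This is a minimal sketch, assuming no value ever contains a space; it also shows one way to turn the "$" and "%" strings into numbers if the units can be dropped.

```r
# Sample entries in the same "label value" form as the test22 column
x <- c("Current SharePrice $6.57", "52WkAvg Premium/Discount -9.19%")

# Everything before the last space is the field name
key <- sub(" [^ ]+$", "", x)    # "Current SharePrice" "52WkAvg Premium/Discount"

# The final space-delimited token is the value (greedy .* runs to the last space)
value <- sub("^.* ", "", x)     # "$6.57" "-9.19%"

# Assumption: "$" and "%" can be dropped to get plain numbers
num <- as.numeric(gsub("[$%]", "", value))   # 6.57 -9.19

print(key)
print(value)
print(num)
```

Keeping the values as character (as the answer does) preserves the units; converting to numeric is only useful if you plan to compute on them.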
df<-structure(list(test22 = structure(c(24L, 20L, 22L, 6L, 2L, 4L,
12L, 8L, 10L, 18L, 14L, 16L, 23L, 19L, 21L, 5L, 1L, 3L, 11L,
7L, 9L, 17L, 13L, 15L), .Label = c("52WkAvg NAV $5.29", "52WkAvg NAV $7.21",
"52WkAvg Premium/Discount -7.26%", "52WkAvg Premium/Discount -9.19%",
"52WkAvg SharePrice $4.91", "52WkAvg SharePrice $6.55", "52WkHigh NAV $5.37",
"52WkHigh NAV $7.34", "52WkHigh Premium/Discount -1.12%", "52WkHigh Premium/Discount -5.88%",
"52WkHigh SharePrice $5.31", "52WkHigh SharePrice $6.88", "52WkLow NAV $5.16",
"52WkLow NAV $7.03", "52WkLow Premium/Discount -11.92%", "52WkLow Premium/Discount -14.43%",
"52WkLow SharePrice $4.58", "52WkLow SharePrice $6.05", "Current NAV $5.21",
"Current NAV $7.11", "Current Premium/Discount -7.10%", "Current Premium/Discount -7.59%",
"Current SharePrice $4.84", "Current SharePrice $6.57"), class = "factor"),
Ticker = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("CXE", "MFM"), class = "factor")), class = "data.frame", row.names = c(NA,
-24L))
library(tidyr)
# separate the number from the text
df2<-separate(df, test22, into=c("key", "value"), sep=" (?=[$]*[-\\.0-9]+%*)", extra="merge")
#spread from long to wide
spread(df2, key=key, value=value)
#columns are abridged for clarity
#Ticker 52WkAvg NAV 52WkAvg Premium/Discount 52WkAvg SharePrice 52WkHigh NAV 52WkHigh Premium/Discount 52WkHigh ...
#CXE $5.29 -7.26% $4.91 $5.37 -1.12%
#MFM $7.21 -9.19% $6.55 $7.34 -5.88%
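Current tidyr documentation marks spread() as superseded by pivot_wider(), which takes names_from and values_from arguments. If you would rather avoid the tidyr dependency and also put the ticker into the row names, as the question asked, a base R sketch with reshape() is shown below. It uses a small hand-built subset of the long data (an assumption for illustration, not the full 500-ticker data), and reshape() prefixes each new column with "value.".

```r
# Long data in the shape produced by the separate() step
# (assumption: a small hand-built subset for illustration)
df2 <- data.frame(
  Ticker = c("MFM", "MFM", "CXE", "CXE"),
  key    = c("Current SharePrice", "Current NAV", "Current SharePrice", "Current NAV"),
  value  = c("$6.57", "$7.11", "$4.84", "$5.21"),
  stringsAsFactors = FALSE
)

# One row per Ticker, one column per key; columns are named "value.<key>"
wide <- reshape(df2, idvar = "Ticker", timevar = "key", direction = "wide")

# Move the ticker into the row names and drop the now-redundant column
rownames(wide) <- wide$Ticker
wide$Ticker <- NULL

print(wide)
```

With the full sample above, each ticker row would get the 12 columns described in the question.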