I am having difficulty figuring out how to convert some wide data into long format. I have three columns of string data (A1_R00_FillerNP, A1_R01_ADV, and A1_R02_1stEmbV) which I would like to melt into one column (WordCountRegion) in such a way that, for each Subject and item, the correct word is mapped from one of these three columns to the new WordCountRegion column.
Using a simple melt approach, as in the code below, gets me part of the way there:
(Note: the strange characters in the df are inconsequential; please ignore them here.)
df <- structure(list(Subject = c(101L, 101L, 101L, 101L, 101L, 101L,
101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L, 101L,
101L), condition = structure(c(2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L,
3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L), .Label = c("P", "R",
"S"), class = "factor"), item = c(101L, 102L, 103L, 101L, 102L,
103L, 101L, 102L, 103L, 101L, 102L, 103L, 101L, 102L, 103L, 101L,
102L, 103L), A1_R00_FillerNP = structure(c(3L, 2L, 1L, 3L, 2L,
1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L), .Label = c("SÌÇna d_r allvarliga konsekvenser",
"SÌÇna d_r fina _ppeltr_d", "SÌÇna d_r gamla skottk_rror"
), class = "factor"), A1_R01_ADV = structure(c(1L, 1L, 2L, 1L,
1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L), .Label = c("alltid",
"f_rresten"), class = "factor"), A1_R02_1stEmbV = structure(c(3L,
2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 3L, 2L,
1L), .Label = c("diskuterade", "stod", "tv_ttade"), class = "factor"),
RT = c(0L, 149L, 247L, 272L, 171L, 245L, 317L, 0L, 233L,
0L, 981L, 750L, 272L, 171L, 334L, 317L, 0L, 233L), Region = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L), .Label = c("R00", "R01", "R02"), class = "factor"),
RegionType = structure(c(3L, 3L, 3L, 2L, 2L, 2L, 1L, 1L,
1L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("1stEmbV",
"ADV", "FillerNP"), class = "factor"), DV = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), .Label = c("FIRST_FIXATION_DURATION", "GAZE_DURATION"
), class = "factor")), .Names = c("Subject", "condition",
"item", "A1_R00_FillerNP", "A1_R01_ADV", "A1_R02_1stEmbV", "RT",
"Region", "RegionType", "DV"), class = "data.frame", row.names = c(NA,
-18L))
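library(reshape2)  # assumption: the post does not say which package melt() comes from
# `var` is matched partially to melt()'s variable-name argument (variable.name in reshape2)
df1 <- melt(df, measure.vars = c("A1_R00_FillerNP", "A1_R01_ADV", "A1_R02_1stEmbV"), var = "WordCountRegion")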
The problem is that this code incorrectly breaks the words across regions. I end up with output like the following, where the words do not break as specified by Region and instead extend across values of Region, as can be seen in the WordCountRegion and value columns. Clearly, if I am going to use this, I need some additional specification so that melt() can break the data correctly. I'm just not sure how to do this (or whether it can be done within melt()).
Subject condition item RT Region RegionType DV WordCountRegion value
1 101 R 101 0 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
2 101 P 102 149 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
3 101 S 103 247 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
4 101 R 101 272 R01 ADV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
5 101 P 102 171 R01 ADV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
6 101 S 103 245 R01 ADV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
7 101 R 101 317 R02 1stEmbV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
8 101 P 102 0 R02 1stEmbV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
9 101 S 103 233 R02 1stEmbV FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
10 101 R 101 0 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
11 101 P 102 981 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
12 101 S 103 750 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
13 101 R 101 272 R01 ADV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
14 101 P 102 171 R01 ADV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
15 101 S 103 334 R01 ADV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
16 101 R 101 317 R02 1stEmbV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
17 101 P 102 0 R02 1stEmbV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
18 101 S 103 233 R02 1stEmbV GAZE_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
19 101 R 101 0 R00 FillerNP FIRST_FIXATION_DURATION A1_R01_ADV alltid
20 101 P 102 149 R00 FillerNP FIRST_FIXATION_DURATION A1_R01_ADV alltid
21 101 S 103 247 R00 FillerNP FIRST_FIXATION_DURATION A1_R01_ADV f_rresten
Is there a way that I could modify the melt() call to get these to line up/match by Region, as in the sample below:
Subject condition item RT Region RegionType DV WordCountRegion value
1 101 R 101 0 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
2 101 P 102 149 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
3 101 S 103 247 R00 FillerNP FIRST_FIXATION_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
4 101 R 101 272 R01 ADV FIRST_FIXATION_DURATION A1_R01_ADV alltid
5 101 P 102 171 R01 ADV FIRST_FIXATION_DURATION A1_R01_ADV alltid
6 101 S 103 245 R01 ADV FIRST_FIXATION_DURATION A1_R01_ADV f_rresten
7 101 R 101 317 R02 1stEmbV FIRST_FIXATION_DURATION A1_R02_1stEmbV tv_ttade
8 101 P 102 0 R02 1stEmbV FIRST_FIXATION_DURATION A1_R02_1stEmbV stod
9 101 S 103 233 R02 1stEmbV FIRST_FIXATION_DURATION A1_R02_1stEmbV diskuterade
10 101 R 101 0 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r gamla skottk_rror
11 101 P 102 981 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r fina _ppeltr_d
12 101 S 103 750 R00 FillerNP GAZE_DURATION A1_R00_FillerNP SÌÇna d_r allvarliga konsekvenser
Or, if I am using the wrong function altogether, could someone please point me towards a better solution? Perhaps I need something that does actual lookups?
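You could create a small lookup table that maps each Region to the column holding its word, merge it into the melted data frame, and then keep only the rows where the melted column name matches the lookup. I believe this gives you the result you're looking for.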
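# Lookup table: which original column holds the word for each Region
region_df <- data.frame(var = c("A1_R00_FillerNP", "A1_R01_ADV", "A1_R02_1stEmbV"),
                        Region = c("R00", "R01", "R02"))
# Join on the shared Region column; this adds the expected column name as `var`
df2 <- merge(df1, region_df, by = "Region")
# Keep only rows whose melted column is the one expected for that Region
df3 <- subset(df2, as.character(var) == as.character(WordCountRegion))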
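If you would rather skip the melt step, the same lookup idea can probably be applied directly to the wide data with matrix indexing in base R. The following is only a minimal sketch on a made-up toy data frame: the names toy and word_cols and the values np1, adv1, v1, etc. are illustrative and not from the question. It assumes every Region value maps to exactly one of the three word columns.

```r
# Toy data mirroring the question's layout (values are made up for illustration).
toy <- data.frame(
  item            = c(101L, 102L, 101L, 102L, 101L, 102L),
  Region          = c("R00", "R00", "R01", "R01", "R02", "R02"),
  A1_R00_FillerNP = c("np1", "np2", "np1", "np2", "np1", "np2"),
  A1_R01_ADV      = c("adv1", "adv2", "adv1", "adv2", "adv1", "adv2"),
  A1_R02_1stEmbV  = c("v1", "v2", "v1", "v2", "v1", "v2"),
  stringsAsFactors = FALSE
)

# Lookup (hypothetical helper): which wide column holds the word for each Region.
word_cols <- c(R00 = "A1_R00_FillerNP", R01 = "A1_R01_ADV", R02 = "A1_R02_1stEmbV")

# For every row, the name of the column to read the word from.
toy$WordCountRegion <- unname(word_cols[as.character(toy$Region)])

# Matrix indexing: pick one cell per row, i.e. (row i, column chosen for row i).
# as.matrix() also turns factor columns into character, as in the question's df.
words <- as.matrix(toy[, word_cols])
toy$value <- words[cbind(seq_len(nrow(toy)),
                         match(toy$WordCountRegion, colnames(words)))]

# Drop the wide columns now that the matching word is in `value`.
toy <- toy[, setdiff(names(toy), word_cols)]
print(toy)
```

Because nothing is stacked, the row count stays the same as in the original data, so there is no filtering step afterwards.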