
Transforming Dataset with only 0 and 1 values

I'm unsure of what to call this, so I'll try to describe the issue in layman's terms. I have a data frame that consists only of 0s and 1s. So for each individual, instead of having one column with a factor value (e.g. low price, 4 rooms), I have:

      V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21
1     0  0  0  1  0  0  0  1  0   1   0   0   0   0   1   1   0   0   0   1   0
2     1  0  0  0  0  0  0  1  1   0   0   0   0   0   1   0   0   1   0   0   1
3     0  0  0  1  1  0  0  0  0   0   1   0   0   0   1   1   0   0   1   0   0
4     0  0  0  1  0  1  0  0  0   0   1   0   1   0   0   0   1   0   1   0   0

How can I transform the dataset in R so that I create new columns (e.g. number of rooms) and turn the position of the 1 into a value (e.g. a 1 in the 4th column becomes "vhigh")? I need to do this for multiple explanatory variables: the 21 columns represent 6 variables for 1000+ observations. The result should look something like this:

     PurchasePrice   NumberOfRooms   ...
1    vhigh           4
2    low             4
3    vhigh           1
4    vhigh           2

I only did it for the first two explanatory variables here, but the pattern repeats for every explanatory variable, each of which has 3 or 4 possible factor values.

V1:V4 = purchase price, V5:V8 = number of rooms, V9:V11 = floors, and so on.

In my head, something like this could work:

  1. Create an if statement that gives each 1 a value depending on its column position, e.g. if the value in V4 is 1, then name it "vhigh". Do this for each Vx.
  2. Then combine each group of columns (V1:V4, V5:V8, V9:V11, depending on whether the variable has 3 or 4 possible factor/integer values), ignoring the 0 values.
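A minimal sketch of those two steps in base R, using the first four rows and the first two variables from the question. The helper name decode_block is hypothetical, and so are the labels "med" and "high" for purchase-price positions 2 and 3 (the question only shows "low" for position 1 and "vhigh" for position 4). Instead of one if statement per column, it writes each column's label wherever that column holds a 1 and then keeps the one non-missing label per row:

```r
# First four rows of the question's data, purchase price (V1:V4) and rooms (V5:V8) only
df <- data.frame(
  V1 = c(0, 1, 0, 0), V2 = c(0, 0, 0, 0), V3 = c(0, 0, 0, 0), V4 = c(1, 0, 1, 1),
  V5 = c(0, 0, 1, 0), V6 = c(0, 0, 0, 1), V7 = c(0, 0, 0, 0), V8 = c(1, 1, 0, 0)
)

# Hypothetical helper: turn one block of 0/1 columns into a single factor
decode_block <- function(block, labels) {
  m <- as.matrix(block)
  # Step 1: put each column's label wherever that column holds a 1, NA elsewhere
  lab <- matrix(labels[as.vector(col(m))], nrow = nrow(m))
  lab[m != 1] <- NA
  # Step 2: collapse each row to its first non-NA label, ignoring the zeros
  # (a row with no 1 at all ends up as NA)
  values <- apply(lab, 1, function(r) r[!is.na(r)][1])
  factor(values, levels = labels)
}

# Column block and labels per variable; "med" and "high" are assumed labels
groups <- list(
  PurchasePrice = list(cols = c("V1", "V2", "V3", "V4"),
                       labels = c("low", "med", "high", "vhigh")),
  NumberOfRooms = list(cols = c("V5", "V6", "V7", "V8"),
                       labels = c("1", "2", "3", "4"))
)

result <- as.data.frame(lapply(groups, function(g) decode_block(df[g$cols], g$labels)))
print(result)
```

The remaining variables (V9:V11 and so on) would be added as further entries in groups, each with its own column names and labels.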

Would this work, or is there a simpler approach? How would one code this in R?

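You can use the function which(), similar to this:

# apply(..., 1, ...) passes each row to the function as x;
# run it on one variable's block of columns, e.g. df[, 1:4] for purchase price
apply(df[, 1:4], 1, function(x) {
    idx = which(x == 1)[1]   # index of the first 1 in this row
    return(idx)
    })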

The interesting part is calling which(x == 1) on each row. This gives you a vector of all indices that contain a 1. In your case you can take the first of those (assuming there is only one 1 per row within each variable's block); otherwise, you need to decide how to aggregate multiple 1s. The resulting column can then be turned into a factor by giving sensible names to the various indices.
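As a sketch of that last step, assuming the purchase-price block is V1:V4 and that positions 1 and 4 mean "low" and "vhigh" as in the question's desired output. The labels "med" and "high" for positions 2 and 3 are hypothetical placeholders:

```r
# Purchase-price block (V1:V4) from the first four rows of the question's data
df <- data.frame(V1 = c(0, 1, 0, 0), V2 = c(0, 0, 0, 0),
                 V3 = c(0, 0, 0, 0), V4 = c(1, 0, 1, 1))

# Index of the first 1 in each row (NA if a row has no 1)
idx <- apply(df[, c("V1", "V2", "V3", "V4")], 1, function(x) which(x == 1)[1])

# Give each index a name; "med" and "high" for positions 2 and 3 are assumed labels
price <- factor(idx, levels = 1:4, labels = c("low", "med", "high", "vhigh"))
print(price)
```

Passing levels = 1:4 explicitly keeps all four categories in the factor even if some position never occurs in the data.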

If the dataset contains a single 1 per row (within each variable's block of columns), this is a pretty simple problem.
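Before relying on that, the assumption can be checked with rowSums(). A minimal sketch, assuming the column blocks from the question (V1:V4 for purchase price, V5:V8 for rooms); the names in blocks are illustrative:

```r
# First four rows of the question's data, first two variables only
df <- data.frame(
  V1 = c(0, 1, 0, 0), V2 = c(0, 0, 0, 0), V3 = c(0, 0, 0, 0), V4 = c(1, 0, 1, 1),
  V5 = c(0, 0, 1, 0), V6 = c(0, 0, 0, 1), V7 = c(0, 0, 0, 0), V8 = c(1, 1, 0, 0)
)
blocks <- list(PurchasePrice = c("V1", "V2", "V3", "V4"),
               NumberOfRooms = c("V5", "V6", "V7", "V8"))

# TRUE for a variable when every row has exactly one 1 in its block of columns
one_per_row <- sapply(blocks, function(cols) all(rowSums(df[cols]) == 1))
print(one_per_row)

# Rows that break the assumption for one block (empty for this data)
bad_rows <- which(rowSums(df[blocks$PurchasePrice]) != 1)
print(bad_rows)
```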

Here is your data, based on your picture (please edit your question to include code instead of a picture):

df = data.frame(r1 = 0, r2 = 1, r3 = 0)
rownames(df)<- 1

Then you simply sum the columns, using the room number as each column's weight:

df$room = df$r1*1 + df$r2 * 2 + df$r3 *3
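For a block with more columns, the same weighted sum can be written as a matrix-vector product. A sketch using the question's rooms block V5:V8, assuming those columns stand for 1 to 4 rooms, as the desired output suggests:

```r
# Rooms block (V5:V8) from the first four rows of the question's data
df <- data.frame(V5 = c(0, 0, 1, 0), V6 = c(0, 0, 0, 1),
                 V7 = c(0, 0, 0, 0), V8 = c(1, 1, 0, 0))

# Weight each column by its room count (1 to 4) and sum across the row,
# i.e. V5*1 + V6*2 + V7*3 + V8*4 for every row at once
rooms <- as.vector(as.matrix(df[, c("V5", "V6", "V7", "V8")]) %*% 1:4)
print(rooms)
```

Like the explicit sum, this only gives a meaningful count when each row has exactly one 1 in the block; a row of all zeros would come out as 0.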
