
Is there a way to create a matrix from the values of columns in a data frame in R (example from Python code)?

This might be a simple thing, but I'm new to R and confused. To create a matrix in Python from the values of columns in a dataset, I would just do:

collist = df.columns.tolist()
cols_input = collist[0:178]

X_train = df_train[cols_input].values
x_valid = df_valid[cols_input].values
y_train = df_train['target'].values
y_valid = df_valid['target'].values

Then, when I print their shapes:

print('Training Shape:', X_train.shape, y_train.shape)

I get back (8050, 178) (8050,)

When I try it in R I do this:

x_train <- as.matrix(df_train[, 1:178])
x_val <- as.matrix(df_val[, 1:178])
y_train <- as.matrix(df_train[, 179])
y_val <- as.matrix(df_val[, 179])

dim(x_train)
dim(y_train)

I get (8049, 8049) and (178, 1).

Or I try this:

x_train <- df_train[, 1:178]
x_val <- df_val[, 1:178]
y_train <- df_train[, 179]
y_val <- df_val[, 179]
dim(x_train)
dim(y_train)

I get back 8049 and 178.

What am I doing wrong? Or what should I be doing instead?

Thank you

Python is 0-based and its slices exclude the end position, so collist[0:178] in Python corresponds to columns 1:178 in R. I think you got that part right.
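To make the index mapping concrete, here is a minimal sketch. The column names are made up for illustration (a toy stand-in for df.columns.tolist()); only the positions matter.

```python
# Toy stand-in for df.columns.tolist(): 178 feature names plus 'target'.
# The names x0..x177 are hypothetical, chosen only for this illustration.
collist = [f"x{i}" for i in range(178)] + ["target"]

# Python slice: starts at position 0 and stops BEFORE position 178.
cols_input = collist[0:178]

# R's 1:178 is 1-based and inclusive; rewritten as 0-based Python positions.
r_positions = [i - 1 for i in range(1, 178 + 1)]
cols_r_style = [collist[i] for i in r_positions]

print(len(cols_input))             # 178
print(cols_input == cols_r_style)  # True
print(collist[178])                # target (this is column 179 in R)
```

So both selections pick the same 178 feature columns, and the target sits at Python position 178, which is column 179 in R.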

I don't see how you would get 8049 and 178 from that code, though. Most likely the data was read into R incorrectly.

So let's use an example:

import numpy as np
import pandas as pd

# 9000 rows, 178 feature columns, plus a 'target' column
df = pd.DataFrame(np.random.normal(0, 1, (9000, 178)))
df['target'] = np.random.uniform(size=9000)  # one random value per row
df_train = df.iloc[:8050]   # first 8050 rows
df_valid = df.iloc[8050:]   # remaining 950 rows

collist = df.columns.tolist()
cols_input = collist[0:178]

X_train = df_train[cols_input].values
x_valid = df_valid[cols_input].values
y_train = df_train['target'].values
y_valid = df_valid['target'].values

print('Training Shape:', X_train.shape, y_train.shape)
Training Shape: (8050, 178) (8050,)

# index=False keeps the row index out of the file, so R sees exactly 179 columns
df.to_csv("df.csv", index=False)
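One thing to watch when handing a CSV from pandas to R: by default to_csv also writes the row index as an unnamed first column, which shifts every column position by one on the R side. A small sketch with a toy frame (the frame and the helper name header_fields are made up for illustration):

```python
import io

import pandas as pd

# Toy frame, used only to inspect the CSV header that pandas writes.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})


def header_fields(frame, **kwargs):
    """Write the frame to an in-memory CSV and return its header fields."""
    buf = io.StringIO()
    frame.to_csv(buf, **kwargs)
    return buf.getvalue().splitlines()[0].split(",")


with_index = header_fields(toy)                   # default: index is written
without_index = header_fields(toy, index=False)   # index left out

print(with_index)     # ['', 'a', 'b']
print(without_index)  # ['a', 'b']
```

If the index column is present in the file, read.csv in R will load it as an extra first column, so positional selections such as 1:178 and 179 would each be off by one.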

Then, using df.csv in R:

df <- read.csv("df.csv")
df_train <- df[1:8050, ]
df_valid <- df[8051:nrow(df), ]

# columns 1-178 are the features and column 179 is target
# (this assumes df.csv holds no extra row-index column)
x_train <- df_train[, 1:178]
x_val <- df_valid[, 1:178]
y_train <- df_train[, 179, drop = FALSE]
y_val <- df_valid[, 179, drop = FALSE]

print(dim(x_train))
[1] 8050  178
print(dim(y_train))
[1] 8050    1

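I used drop=FALSE so that the single column stays two-dimensional (a one-column data frame; wrap it in as.matrix() if you need an actual matrix). Without it, the result is a plain vector of length 8050, and dim() returns NULL for a vector.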
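For comparison, pandas makes the same one-dimensional versus two-dimensional distinction. This is a rough sketch on a toy frame (the column names and values are made up), not the asker's data:

```python
import pandas as pd

# Toy frame with one feature column and a target column.
toy = pd.DataFrame({"x1": [0.1, 0.2, 0.3], "target": [1, 0, 1]})

# A single label returns a Series, so .values is 1-D
# (comparable to R without drop=FALSE).
y_vector = toy["target"].values

# A list of labels returns a DataFrame, so .values is 2-D with one column
# (comparable to R with drop=FALSE).
y_column = toy[["target"]].values

print(y_vector.shape)  # (3,)
print(y_column.shape)  # (3, 1)
```

This is why the Python output in the question shows (8050,) for y_train: it is the 1-D form, which corresponds to a plain vector in R.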
