This might be a simple thing, but I'm new to R and confused. To create a matrix in Python from the values of some columns in a dataset, I would just do:
collist = df.columns.tolist()
cols_input = collist[0:178]
x_train = df_train[cols_input].values
x_valid = df_valid[cols_input].values
y_train = df_train['target'].values
y_valid = df_valid['target'].values
Then when I print the shapes with:
print('Training Shape:', x_train.shape, y_train.shape)
I get back (8050, 178) (8050, )
When I try it in R I do this:
x_train <- as.matrix(df_train[, 1:178])
x_val <- as.matrix(df_val[, 1:178])
y_train <- as.matrix(df_train[, 179])
y_val <- as.matrix(df_val[, 179])
dim(x_train)
dim(y_train)
I get this (8049, 8049) and (178, 1)
or I try this:
x_train <- df_train[, 1:178]
x_val <- df_val[, 1:178]
y_train <- df_train[, 179]
y_val <- df_val[, 179]
dim(x_train)
dim(y_train)
I get back 8049 and 178
What am I doing wrong, or what should I be doing instead?
Thank you
Python is 0-based and its slices exclude the end, so collist[0:178] in Python corresponds to 1:178 in R. I think you got that part right.
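To make the index correspondence concrete, here is a minimal sketch. The column names (col0, col1, ...) are made up for illustration and are not from the original dataset; r_positions only mimics what 1:178 would select in R.

```python
# Hypothetical column names standing in for the real ones: 178 features + 1 target.
collist = [f"col{i}" for i in range(179)]

# Python slice: starts at index 0, stops before index 178.
cols_input = collist[0:178]
print(len(cols_input))                # 178
print(cols_input[0], cols_input[-1])  # col0 col177

# R's 1:178 is 1-based and includes both ends, so it also covers 178 positions.
r_positions = list(range(1, 178 + 1))
print(r_positions[0], r_positions[-1], len(r_positions))  # 1 178 178
```

Both forms select the first 178 columns, which leaves the target at 0-based index 178 in Python and at position 179 in R.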
I don't see how you would get 8049 and 178; most likely the data was read in incorrectly in R.
So let's use an example:
import numpy as np
import pandas as pd

# 9000 rows: 178 feature columns plus one target column
df = pd.DataFrame(np.random.normal(0, 1, (9000, 178)))
df['target'] = np.random.uniform(size=9000)
# .loc slices by label and includes both endpoints, so this keeps rows 0..8049
df_train = df.loc[:8049]
df_valid = df.loc[8050:]
collist = df.columns.tolist()
cols_input = collist[0:178]
X_train = df_train[cols_input].values
x_valid = df_valid[cols_input].values
y_train = df_train['target'].values
y_valid = df_valid['target'].values
print('Training Shape:', X_train.shape, y_train.shape)
Training Shape: (8050, 178) (8050,)
# index=False keeps the row index out of the file, so column 179 in R is the target
df.to_csv("df.csv", index=False)
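One detail worth checking when passing data from pandas to R through a CSV: to_csv writes the row index as an extra first column unless index=False is passed, which shifts every column position by one on the R side. The sketch below uses a made-up 5-row frame and reads the text back with pandas rather than R, purely to show the column counts.

```python
import io

import numpy as np
import pandas as pd

# Small stand-in frame: 3 feature columns plus a target column.
df = pd.DataFrame(np.zeros((5, 3)))
df["target"] = np.arange(5)

# to_csv() with no path returns the CSV text instead of writing a file.
with_index = pd.read_csv(io.StringIO(df.to_csv()))
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(with_index.shape)     # (5, 5): the index became an extra column
print(without_index.shape)  # (5, 4)
```

If the extra column is present, selecting columns 1:178 in R would pick up the index plus the first 177 features, and column 179 would be a feature instead of the target.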
Using the df.csv, in R:
df = read.csv("df.csv")
df_train = df[1:8050,]
df_valid = df[8051:nrow(df),]
x_train <- df_train[, 1:178]
x_val <- df_valid[, 1:178]
y_train <- df_train[, 179,drop=FALSE]
y_val <- df_valid[, 179,drop=FALSE]
print(dim(x_train))
[1] 8050 178
print(dim(y_train))
[1] 8050 1
I used drop=FALSE to keep the result two-dimensional (a one-column data frame). Without it, R returns a plain vector of length 8050, and dim() of a vector is NULL.
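For readers coming from pandas, a rough analogue of drop=FALSE is selecting with a list of column names instead of a single name. This is only a sketch on a made-up all-zero frame with the same shape as the example above.

```python
import numpy as np
import pandas as pd

# Stand-in training frame: 8050 rows, 178 features plus a target column.
df_train = pd.DataFrame(np.zeros((8050, 178)))
df_train["target"] = 0.0

# Single label: one-dimensional result, like df_train[, 179] in R.
y_vec = df_train["target"].values
# List of labels: stays two-dimensional, like df_train[, 179, drop=FALSE] in R.
y_col = df_train[["target"]].values

print(y_vec.shape)  # (8050,)
print(y_col.shape)  # (8050, 1)
```

Which form is appropriate depends on what the downstream model expects; many fitting functions accept either, while some require a specific shape.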