Reading data into a numpy array

Question

I have the following file:

label,feature
0,70 80 90 50 33 58 ...
2,53 56 84 56 25 12 ...
1,32 56 84 89 65 87 ...
...
2,56 48 57 56 99 22 ...
4,25 65 84 54 54 15 ...

I want the data to end up as:

Ytrain = [0,2,1,...2,4]  (int, ndarray)
Xtrain = [[70 80 90 50 33 58...],
          [53 56 84 56 25 12...],
          ...
          [25 65 84 54 54 15...]] (int, ndarray)

Here is my code:

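import numpy as np
import pandas as pd

data = pd.read_csv('train.csv')
Ytrain = np.array(data.iloc[:, 0]).astype(int)
# each row of the feature column is one space-separated string
train = np.array(data.iloc[:, 1:]).astype(str)

Xtrain = []
for i in range(len(train)):
    # split the string and convert every token to int
    tmp = [int(x) for x in train[i][0].split()]
    Xtrain.append(tmp)
Xtrain = np.array(Xtrain)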

Is there a better way to do this?

Answer 1

Pass read_csv a regular expression that matches both separators (the comma and the spaces), together with header=None and skiprows=1 so the CSV header line is not read as data:

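import pandas as pd

# r"[,\s]+" matches any run of commas and/or whitespace; regex separators are handled by the python engine
data = pd.read_csv('train.csv', sep=r"[,\s]+", header=None, skiprows=1, engine='python')
print (data)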
   0   1   2   3   4   5   6
0  0  70  80  90  50  33  58
1  2  53  56  84  56  25  12
2  1  32  56  84  89  65  87
3  2  56  48  57  56  99  22
4  4  25  65  84  54  54  15

Finally, select the label and feature columns with iloc:

Ytrain = data.iloc[:,0].values
Xtrain = data.iloc[:,1:].values
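The regex-separator approach can be tried without a file on disk. This is a minimal sketch that assumes an io.StringIO holding the first three sample rows from the question in place of train.csv, and uses the separator r"[,\s]+" so that any run of commas or whitespace counts as one delimiter.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv: the first three rows from the question
sample = io.StringIO(
    "label,feature\n"
    "0,70 80 90 50 33 58\n"
    "2,53 56 84 56 25 12\n"
    "1,32 56 84 89 65 87\n"
)

# header=None plus skiprows=1 drops the "label,feature" line;
# the python engine applies the regex separator to each line
data = pd.read_csv(sample, sep=r"[,\s]+", header=None, skiprows=1, engine="python")

Ytrain = data.iloc[:, 0].to_numpy()   # first column: labels
Xtrain = data.iloc[:, 1:].to_numpy()  # remaining columns: features

print(Ytrain)        # [0 2 1]
print(Xtrain.shape)  # (3, 6)
```

to_numpy() is used here as the current spelling of .values; both return the underlying integer array.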

Alternatively, read the file with the default comma separator and use str.split with expand=True to spread the feature column into a DataFrame:

import pandas as pd

data = pd.read_csv('train.csv')
Ytrain = data.iloc[:,0].values.astype(int)
# expand=True returns one column per space-separated value
Xtrain = data.iloc[:,1].str.split(expand=True).values.astype(int)

print (Ytrain)
[0 2 1 2 4]

print (Xtrain)
[[70 80 90 50 33 58]
 [53 56 84 56 25 12]
 [32 56 84 89 65 87]
 [56 48 57 56 99 22]
 [25 65 84 54 54 15]]
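The str.split route can be checked the same way. This sketch again assumes an io.StringIO with two sample rows instead of train.csv, and selects the columns by their header names ("label" and "feature") rather than by position, which keeps working if the column order ever changes.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv
sample = io.StringIO(
    "label,feature\n"
    "0,70 80 90 50 33 58\n"
    "2,53 56 84 56 25 12\n"
)

# Default comma separator: two columns, "label" (ints) and "feature" (one string per row)
data = pd.read_csv(sample)

Ytrain = data["label"].to_numpy()
# With no pattern, str.split splits on whitespace; expand=True gives one column per token,
# and astype(int) converts the resulting strings to integers
Xtrain = data["feature"].str.split(expand=True).to_numpy().astype(int)
```

This assumes every row has the same number of features; otherwise expand=True pads short rows with missing values and the int conversion fails.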

Answer 2

You can use numpy for this. Since the file uses two different delimiters, a little extra preprocessing is required.

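import numpy as np

# Replace the comma with a space so every field is whitespace-separated
with open('train.csv') as f:
    lines = [line.replace(',', ' ') for line in f]

# genfromtxt treats a plain string as a file name, so pass the list of lines;
# skip_header=1 drops the "label,feature" line
arr = np.genfromtxt(lines, dtype=int, skip_header=1)

Ytrain = arr[:, 0]   # first column: labels
Xtrain = arr[:, 1:]  # remaining columns: features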
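The numpy-only idea (turn the comma into a space, then parse) can also be checked without a file. This sketch assumes a hypothetical list of sample lines in place of train.csv and swaps in np.loadtxt, which, like np.genfromtxt, accepts a list of strings as input; for clean integer data without missing values either should work.

```python
import numpy as np

# Hypothetical sample lines standing in for the contents of train.csv
lines = [
    "label,feature",
    "0,70 80 90 50 33 58",
    "2,53 56 84 56 25 12",
    "4,25 65 84 54 54 15",
]

# Turn the single comma into a space so every field is whitespace-delimited
cleaned = [line.replace(",", " ") for line in lines]

# loadtxt reads each string as one row; skiprows=1 drops the header line
arr = np.loadtxt(cleaned, dtype=int, skiprows=1)

Ytrain = arr[:, 0]   # labels
Xtrain = arr[:, 1:]  # features
```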

2 answers

Answer 1: 1 vote, accepted, 2018-02-11 11:58:13

Answer 2: 0 votes, 2018-02-11 12:00:32
