Read data into a NumPy array
I have a file like the one below:
label,feature
0,70 80 90 50 33 58 ...
2,53 56 84 56 25 12 ...
1,32 56 84 89 65 87 ...
...
2,56 48 57 56 99 22 ...
4,25 65 84 54 54 15 ...
I want the data to end up as:
Ytrain = [0,2,1,...2,4] (int, ndarray)
Xtrain = [[70 80 90 50 33 58...],
[53 56 84 56 25 12...],
...
[25 65 84 54 54 15...]] (int, ndarray)
Here is my code:
import numpy as np
import pandas as pd

data = pd.read_csv('train.csv')
Ytrain = np.array(data.iloc[:, 0]).astype(int)
train = np.array(data.iloc[:, 1:]).astype(str)
Xtrain = []
for i in range(len(train)):
    # Each row holds a single string of space-separated integers
    tmp = [int(x) for x in train[i][0].split()]
    Xtrain.append(tmp)
Xtrain = np.array(Xtrain)
Is there a better way to do this?
Pass a regular-expression separator to read_csv so that both commas and whitespace split the fields, together with header=None and skiprows=1 so that the CSV header row is not read:
# [,\s]+ matches one or more commas or whitespace characters; the raw string keeps \s intact
data = pd.read_csv('train.csv', sep=r"[,\s]+", header=None, skiprows=1, engine='python')
print(data)
   0   1   2   3   4   5   6
0  0  70  80  90  50  33  58
1  2  53  56  84  56  25  12
2  1  32  56  84  89  65  87
3  2  56  48  57  56  99  22
4  4  25  65  84  54  54  15
Finally, select the label column and the feature columns with iloc:
Ytrain = data.iloc[:,0].values
Xtrain = data.iloc[:,1:].values
Or read the file normally and use str.split with expand=True to turn the feature string into a DataFrame:
data = pd.read_csv('train.csv')
Ytrain = data.iloc[:,0].values.astype(int)
Xtrain = data.iloc[:,1].str.split(expand=True).values.astype(int)
print (Ytrain)
[0 2 1 2 4]
print (Xtrain)
[[70 80 90 50 33 58]
 [53 56 84 56 25 12]
 [32 56 84 89 65 87]
 [56 48 57 56 99 22]
 [25 65 84 54 54 15]]
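To check that the two pandas routes agree, here is a small self-contained sketch. It assumes a hypothetical in-memory string, CSV_TEXT, standing in for train.csv (three rows copied from the question), and it uses the separator pattern r"[,\s]+", which is my assumption for "a comma or a run of whitespace". The variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for train.csv (rows taken from the question)
CSV_TEXT = """label,feature
0,70 80 90 50 33 58
2,53 56 84 56 25 12
1,32 56 84 89 65 87
"""

# Approach 1: treat commas and runs of whitespace as a single separator
wide = pd.read_csv(io.StringIO(CSV_TEXT), sep=r"[,\s]+", header=None,
                   skiprows=1, engine="python")
y_regex = wide.iloc[:, 0].to_numpy()
x_regex = wide.iloc[:, 1:].to_numpy()

# Approach 2: read two columns, then split the feature string on whitespace
two_col = pd.read_csv(io.StringIO(CSV_TEXT))
y_split = two_col.iloc[:, 0].to_numpy().astype(int)
x_split = two_col.iloc[:, 1].str.split(expand=True).to_numpy().astype(int)

print(x_regex.shape, x_split.shape)
```

The second route needs the explicit astype(int) because str.split returns strings, whereas the first route lets read_csv infer integer columns directly.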
You can also use numpy for this. Since the file has two different delimiters, a little more work is required.
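import io
import numpy as np
# Replace commas with spaces so that whitespace is the only delimiter
with open('train.csv', 'r') as f:
    s = f.read().replace(',', ' ')
# genfromtxt treats a plain string as a file name, so wrap the text in StringIO
arr = np.genfromtxt(io.StringIO(s), skip_header=1, dtype=int)
Ytrain = arr[:, 0]
Xtrain = arr[:, 1:]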
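The NumPy route can be tried without a file on disk. The following is a minimal sketch, assuming a hypothetical in-memory string, CSV_TEXT, in place of train.csv. It relies on np.genfromtxt accepting a file-like object, skips the header line with skip_header=1, and requests dtype=int so the result is not a float array. The names labels and features are illustrative.

```python
import io

import numpy as np

# Hypothetical in-memory stand-in for train.csv (rows taken from the question)
CSV_TEXT = """label,feature
0,70 80 90 50 33 58
2,53 56 84 56 25 12
"""

# Turn commas into spaces so that whitespace is the only delimiter
text = CSV_TEXT.replace(",", " ")

# skip_header=1 drops the "label feature" line; dtype=int avoids floats
arr = np.genfromtxt(io.StringIO(text), skip_header=1, dtype=int)

labels = arr[:, 0]     # first column holds the labels
features = arr[:, 1:]  # remaining columns hold the features

print(labels)
print(features)
```

Note that with only one data row genfromtxt would return a 1-D array, so the column slicing above assumes at least two rows.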