How to convert a DataFrame with arrays in rows to a NumPy matrix?
I have a CSV file in which each row contains an array. I would like to turn the row contents into columns, i.e. end up with a matrix (since there are multiple rows). I can do it with a for loop and csv.reader, but that is quite slow. So I thought pandas would be faster and that I could do the conversion without a loop. When I read the file I get a DataFrame of shape (200, 1), where each row contains 700 comma-separated floats, e.g. [0.4, 0.5, 0.3, ....]
If I call .values on the output, I just get an array of object dtype, which is still not usable...
I just can't figure out how to convert this data into a matrix...
Am I looking in the wrong direction here?
ranges = pd.read_csv(name,usecols=['ranges'])
What does work is this:
import ast
import csv

X = open(name)
csv_X = csv.reader(X)
ranges = []
next(csv_X)  # skip the header row of the CSV
for row in csv_X:
    # column 14 holds the array as text; parse it into a Python list
    ranges.append(ast.literal_eval(row[14]))
X.close()
But that is really slow, so my idea was to use pandas to speed this up.
With a dataset that looks like this:
range
0 [5, 5, 7, 5, 7, 2, 0, 4, 1, 6]
1 [1, 0, 6, 1, 1, 5, 7, 8, 6, 7]
2 [2, 0, 4, 6, 6, 6, 5, 1, 6, 5]
3 [5, 5, 2, 7, 1, 8, 7, 2, 8, 4]
4 [1, 5, 6, 6, 8, 2, 6, 6, 3, 1]
You can try:
pd.DataFrame(np.vstack(df.range.values))
which yields:
0 1 2 3 4 5 6 7 8 9
0 5 5 7 5 7 2 0 4 1 6
1 1 0 6 1 1 5 7 8 6 7
2 2 0 4 6 6 6 5 1 6 5
3 5 5 2 7 1 8 7 2 8 4
4 1 5 6 6 8 2 6 6 3 1
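Note that np.vstack only helps when each cell already holds a real list or array. Cells read straight from a CSV are strings such as "[5, 5, 7]" and have to be parsed first. The following is a minimal sketch of that case, not a definitive implementation: the in-memory CSV text is a made-up stand-in for the real file, the column name `range` follows the example above, and passing ast.literal_eval as a converter is one possible way to do the parsing.

```python
import ast
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory CSV standing in for the real file; each cell is a
# quoted string that looks like a Python list.
csv_text = 'range\n"[5, 5, 7, 5]"\n"[1, 0, 6, 1]"\n"[2, 0, 4, 6]"\n'

# Parse each cell into a real list while the file is being read.
df = pd.read_csv(io.StringIO(csv_text), converters={"range": ast.literal_eval})

# Stack the per-row lists into a 2-D array of shape (rows, values per row).
matrix = np.vstack(df["range"].tolist())
print(matrix.shape)
```

The parsing still runs once per row inside read_csv, so this mainly tidies the code; whether it is noticeably faster than the csv.reader loop would need to be measured on the real data.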
Edit
If your rows are strings such as:
ranges
0 8,9,7,6,3,2,4,1,8,3
1 7,9,9,2,1,6,4,1,8,2
2 9,3,0,9,7,7,0,9,9,6
3 0,7,1,0,5,5,1,2,4,2
4 3,3,8,0,8,7,3,6,6,2
5 9,3,7,6,5,7,8,3,8,7
6 1,6,7,8,5,6,7,0,7,8
7 5,5,0,9,2,1,5,4,3,4
8 3,8,9,8,6,3,8,5,9,8
9 8,5,1,7,1,4,8,1,6,4
Try:
pd.DataFrame(df.ranges.str.split(',').tolist())
which yields:
0 1 2 3 4 5 6 7 8 9
0 8 9 7 6 3 2 4 1 8 3
1 7 9 9 2 1 6 4 1 8 2
2 9 3 0 9 7 7 0 9 9 6
3 0 7 1 0 5 5 1 2 4 2
4 3 3 8 0 8 7 3 6 6 2
5 9 3 7 6 5 7 8 3 8 7
6 1 6 7 8 5 6 7 0 7 8
7 5 5 0 9 2 1 5 4 3 4
8 3 8 9 8 6 3 8 5 9 8
9 8 5 1 7 1 4 8 1 6 4
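The values produced by str.split are still strings, so one more step is needed to get a numeric matrix. A minimal sketch, assuming a column named `ranges` holding comma-separated numbers without brackets (the three sample rows are shortened, made-up data in the same format as above):

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the same format as the string rows shown above.
df = pd.DataFrame({"ranges": ["8,9,7,6", "7,9,9,2", "9,3,0,9"]})

# expand=True spreads the split pieces across columns; they are still strings.
parts = df["ranges"].str.split(",", expand=True)

# Cast every column to float and pull out the underlying NumPy array.
matrix = parts.astype(float).to_numpy()
print(matrix.shape, matrix.dtype)
```

This assumes every row has the same number of values. Rows of unequal length would leave missing entries after the split, which would need handling before the cast.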