
How to convert a DataFrame with arrays in rows to a NumPy matrix?

I have a CSV file in which each row contains an array. I would like to convert the row contents to columns, i.e. end up with a matrix (since I have multiple rows). I can do it using a for loop and csv.reader, but it's quite slow. So I had the idea that Pandas would be faster and that I could do the conversion without a loop. I read the file and get a DataFrame of shape (200, 1), where each row contains 700 comma-separated floats, e.g. [0.4, 0.5, 0.3, ....]

If I call .values on the output, I just get an array of object type, which is still not usable...

I just can't figure out how to convert this data into a matrix...

Am I looking in the wrong direction here?

ranges = pd.read_csv(name,usecols=['ranges'])

What does work is this:

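import ast
import csv

ranges = []
with open(name, newline="") as X:
    csv_X = csv.reader(X)
    next(csv_X)  # jump over the first row (the header) in the csv
    for row in csv_X:
        # column 14 holds the array as text; parse it into a Python list
        ranges.append(ast.literal_eval(row[14]))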

But that is just really slow. So my idea in using Pandas is to speed this up.

With a dataset looking like this:

                            range
0  [5, 5, 7, 5, 7, 2, 0, 4, 1, 6]
1  [1, 0, 6, 1, 1, 5, 7, 8, 6, 7]
2  [2, 0, 4, 6, 6, 6, 5, 1, 6, 5]
3  [5, 5, 2, 7, 1, 8, 7, 2, 8, 4]
4  [1, 5, 6, 6, 8, 2, 6, 6, 3, 1]

You can try:

pd.DataFrame(np.vstack(df.range.values))

which yields:

   0  1  2  3  4  5  6  7  8  9
0  5  5  7  5  7  2  0  4  1  6
1  1  0  6  1  1  5  7  8  6  7
2  2  0  4  6  6  6  5  1  6  5
3  5  5  2  7  1  8  7  2  8  4
4  1  5  6  6  8  2  6  6  3  1
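
Note that np.vstack needs each cell to hold an actual list or array. Cells read straight from a CSV are strings such as "[0.4, 0.5, 0.3]", so they have to be parsed first. A minimal sketch, assuming the column is named `ranges` as in the question and using a made-up in-memory sample (`csv_text`) in place of the real file:

```python
import ast
import io

import numpy as np
import pandas as pd

# Hypothetical sample standing in for the real file; each cell is a quoted list.
csv_text = 'id,ranges\n0,"[0.4, 0.5, 0.3]"\n1,"[0.1, 0.2, 0.9]"\n'

# converters runs ast.literal_eval on every cell of the column while reading,
# so the column ends up holding Python lists instead of strings.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["ranges"],
    converters={"ranges": ast.literal_eval},
)

# Stack the per-row lists into one 2-D array (rows x values).
matrix = np.vstack(df["ranges"].tolist())
print(matrix.shape, matrix.dtype)
```

The parsing still happens once per row in Python, so whether this is noticeably faster than the csv.reader loop would need to be measured on the real data.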

Edit

If your rows are strings such as:

                ranges
0  8,9,7,6,3,2,4,1,8,3
1  7,9,9,2,1,6,4,1,8,2
2  9,3,0,9,7,7,0,9,9,6
3  0,7,1,0,5,5,1,2,4,2
4  3,3,8,0,8,7,3,6,6,2
5  9,3,7,6,5,7,8,3,8,7
6  1,6,7,8,5,6,7,0,7,8
7  5,5,0,9,2,1,5,4,3,4
8  3,8,9,8,6,3,8,5,9,8
9  8,5,1,7,1,4,8,1,6,4

Try:

pd.DataFrame(df.ranges.str.split(',').tolist())

which yields:

   0  1  2  3  4  5  6  7  8  9
0  8  9  7  6  3  2  4  1  8  3
1  7  9  9  2  1  6  4  1  8  2
2  9  3  0  9  7  7  0  9  9  6
3  0  7  1  0  5  5  1  2  4  2
4  3  3  8  0  8  7  3  6  6  2
5  9  3  7  6  5  7  8  3  8  7
6  1  6  7  8  5  6  7  0  7  8
7  5  5  0  9  2  1  5  4  3  4
8  3  8  9  8  6  3  8  5  9  8
9  8  5  1  7  1  4  8  1  6  4
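
The cells produced by str.split are still strings, even though the printed frame looks numeric. A minimal sketch of getting an actual numeric NumPy matrix, assuming every row has the same number of values; the two-row `df` here is just a shortened copy of the sample above:

```python
import numpy as np
import pandas as pd

# Shortened copy of the sample data: one comma-separated string per row.
df = pd.DataFrame({"ranges": ["8,9,7,6,3,2,4,1,8,3", "7,9,9,2,1,6,4,1,8,2"]})

# expand=True spreads the split pieces over columns; astype converts the
# string pieces to integers (use float instead for decimal data).
matrix = df["ranges"].str.split(",", expand=True).astype(int).to_numpy()
print(matrix.shape, matrix.dtype)
```

If the strings are wrapped in brackets, as in the question, something like `df["ranges"].str.strip("[]")` before the split should remove them first.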
