import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

csr = csr_matrix(np.array(
    [[0, 0, 4],
     [1, 0, 0],
     [2, 0, 0]]))
# Return a Coordinate (coo) representation of the csr matrix.
coo = csr.tocoo(copy=False)
# Access `row`, `col` and `data` properties of coo matrix.
df = pd.DataFrame({'index': coo.row, 'col': coo.col, 'data': coo.data})[['index', 'col', 'data']]
>>> df.head()
index col data
0 0 2 4
1 1 0 1
2 2 0 2
I tried to convert a scipy csr_matrix to a dataframe whose columns hold the row index, column index, and data of each matrix entry.
The only issue is that the code above does not produce rows for the entries whose value is 0, because the coo representation stores only the nonzero entries. Here is what I'd like the output to look like:
>>> df.head()
index col data
0 0 0 0
1 0 1 0
2 0 2 4
3 1 0 1
4 1 1 0
5 1 2 0
6 2 0 2
7 2 1 0
8 2 2 0
The code snippet above is taken from an answer in another thread.
My question: is there a way to convert the matrix to a dataframe that also includes the elements whose value is 0?
One approach is to create a filling DataFrame and combine it (using combine_first) with the one you already have:
df = pd.DataFrame({'index': coo.row, 'col': coo.col, 'data': coo.data}).set_index(["index", "col"])
n_rows, n_cols = coo.shape
rows, cols = map(np.ndarray.flatten, np.mgrid[:n_rows, :n_cols])
filling = pd.DataFrame({"index": rows, "col": cols, "data": np.repeat(0, n_rows * n_cols)}) \
.set_index(["index", "col"])
res = df.combine_first(filling).reset_index()
print(res)
Output
index col data
0 0 0 0.0
1 0 1 0.0
2 0 2 4.0
3 1 0 1.0
4 1 1 0.0
5 1 2 0.0
6 2 0 2.0
7 2 1 0.0
8 2 2 0.0
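Note that the data column above comes back as floats, most likely because combine_first aligns the two frames and upcasts along the way. If integer output matters, here is a minimal sketch of a variant that reindexes onto the full (row, col) grid with a fill value instead. The names sparse_df, full_index and res_int are made up for this illustration, and the example matrix is the one from the question:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

csr = csr_matrix(np.array([[0, 0, 4],
                           [1, 0, 0],
                           [2, 0, 0]]))
coo = csr.tocoo(copy=False)

# Nonzero entries only, indexed by (row, col).
sparse_df = (pd.DataFrame({"index": coo.row, "col": coo.col, "data": coo.data})
             .set_index(["index", "col"]))

# Every (row, col) pair of the matrix, in row-major order.
n_rows, n_cols = coo.shape
full_index = pd.MultiIndex.from_product(
    [range(n_rows), range(n_cols)], names=["index", "col"])

# Missing pairs are filled with 0, so no NaN is ever introduced.
res_int = sparse_df.reindex(full_index, fill_value=0).reset_index()
print(res_int)
```

This assumes each (row, col) pair appears at most once in the coo data, which holds for a matrix converted from a csr_matrix built from a dense array as in the question.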
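Use melt to reshape the dataframe from 'wide' to 'long' format:
df = your_sparse_matrix_data.todense()
# reset_index() must come before melt() so the row labels are kept as a column
(pd.DataFrame(df)
 .reset_index()
 .melt(id_vars='index')
 .rename(columns={'index': 'row', 'variable': 'column'}))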
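If the matrix is small enough to densify anyway, another option is to build the row and column labels directly with np.indices and flatten everything in row-major order, which gives the ordering asked for in the question. This is only a sketch under that assumption; dense and df_dense are names chosen for this illustration:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

csr = csr_matrix(np.array([[0, 0, 4],
                           [1, 0, 0],
                           [2, 0, 0]]))

# Densify the matrix; this allocates n_rows * n_cols values.
dense = csr.toarray()

# np.indices returns one grid of row labels and one of column labels.
rows, cols = np.indices(dense.shape)

# ravel() flattens all three arrays in the same row-major order.
df_dense = pd.DataFrame({"index": rows.ravel(),
                         "col": cols.ravel(),
                         "data": dense.ravel()})
print(df_dense)
```

For a large, very sparse matrix this defeats the purpose of the sparse format, since every zero is materialised as a row.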