I have a 3-column DataFrame. Let's say my columns are "doc", "word", and "count", and each row holds the number of occurrences of a word in a document.
+-----+------+-------+
| doc | word | count |
+-----+------+-------+
|   0 |    0 |    10 |
|   0 |    7 |     2 |
|   0 |    4 |     5 |
|   1 |    2 |     5 |
+-----+------+-------+
I want to convert this DataFrame to a matrix with documents as rows and words as columns, so I do the following:
matrix = pd.pivot_table(my_df, index="doc", columns="word", values="count", fill_value=0)
What I get is a matrix with columns [0, 2, 4, 7]. However, I want the columns to cover a different range, e.g. range(10): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. Some columns will then contain only zeros, and that is what I want.
How can I achieve this?
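To reproduce the problem, here is a minimal sketch that rebuilds the example table as my_df (values copied from the table above) and shows which columns pivot_table creates. Only word ids that actually occur in the data become columns.

```python
import pandas as pd

# The example data from the table above
my_df = pd.DataFrame({
    "doc":   [0, 0, 0, 1],
    "word":  [0, 7, 4, 2],
    "count": [10, 2, 5, 5],
})

matrix = pd.pivot_table(my_df, index="doc", columns="word",
                        values="count", fill_value=0)

# pivot_table only creates columns for word ids present in the data
print(matrix.columns.tolist())  # [0, 2, 4, 7]
```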
You are asking for reindex:
matrix = (pd.pivot_table(df, index="doc",
                         columns="word",
                         values="count", fill_value=0)
          .reindex(range(10), axis=1, fill_value=0)
          )
Output:
word 0 1 2 3 4 5 6 7 8 9
doc
0 10 0 0 0 5 0 0 2 0 0
1 0 0 5 0 0 0 0 0 0 0
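The same idea extends to rows: a document id with no words at all is also missing from the pivot. A minimal sketch, assuming the number of documents (n_docs) and the vocabulary size (n_words) are known in advance; both names and their values are hypothetical:

```python
import pandas as pd

my_df = pd.DataFrame({
    "doc":   [0, 0, 0, 1],
    "word":  [0, 7, 4, 2],
    "count": [10, 2, 5, 5],
})

# Hypothetical sizes, assumed known up front (e.g. from a vocabulary)
n_docs, n_words = 3, 10

matrix = (
    pd.pivot_table(my_df, index="doc", columns="word",
                   values="count", fill_value=0)
    # Reindex both axes: missing docs and missing words become all-zero
    .reindex(index=range(n_docs), columns=range(n_words), fill_value=0)
)

print(matrix.shape)  # (3, 10)
```

Here doc 2 never appears in the data, so its row is all zeros.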
IIUC, you want to create a sparse document-by-word matrix. You could do:
import pandas as pd
from scipy.sparse import csr_matrix
rows, cols, data = zip(*df.to_numpy())
mat = csr_matrix((data, (rows, cols)), shape=(max(rows) + 1, max(cols) + 1))
res = pd.DataFrame(data=mat.toarray())
print(res)
Output
0 1 2 3 4 5 6 7
0 10 0 0 0 5 0 0 2
1 0 0 5 0 0 0 0 0
With this approach the shape is determined automatically from the largest doc and word ids, so the columns stop at the highest word id present (7 here).
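One difference between this and pivot_table is how repeated (doc, word) pairs are handled: a csr_matrix built from (data, (rows, cols)) sums duplicate entries, while pivot_table averages them by default (aggfunc="mean"). A small sketch with a hypothetical input in which the pair (doc=0, word=4) appears twice:

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical input: (doc=0, word=4) occurs in two rows
dup_df = pd.DataFrame({
    "doc":   [0, 0, 1],
    "word":  [4, 4, 2],
    "count": [5, 3, 5],
})

rows, cols, data = zip(*dup_df.to_numpy())
# Duplicate (row, col) entries are added together
mat = csr_matrix((data, (rows, cols)), shape=(max(rows) + 1, max(cols) + 1))

# Default aggregation is the mean of 5 and 3
mean_val = pd.pivot_table(dup_df, index="doc", columns="word",
                          values="count", fill_value=0).loc[0, 4]
# aggfunc="sum" matches the sparse-matrix result (5 + 3)
sum_val = pd.pivot_table(dup_df, index="doc", columns="word",
                         values="count", aggfunc="sum", fill_value=0).loc[0, 4]

print(mat[0, 4], mean_val, sum_val)
```

If each (doc, word) pair occurs only once, as in the question, both approaches give the same counts.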
UPDATE
If you want exactly 10 columns, pass the number of columns explicitly in shape:
rows, cols, data = zip(*df.to_numpy())
mat = csr_matrix((data, (rows, cols)), shape=(max(rows) + 1, 10))
res = pd.DataFrame(data=mat.toarray())
print(res)
Output
0 1 2 3 4 5 6 7 8 9
0 10 0 0 0 5 0 0 2 0 0
1 0 0 5 0 0 0 0 0 0 0
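Note that toarray() builds a dense array, which gives up the memory savings of the sparse matrix. If the vocabulary is large, a sketch that keeps the result sparse wraps it with pd.DataFrame.sparse.from_spmatrix; n_words = 10 is an assumed vocabulary size:

```python
import pandas as pd
from scipy.sparse import csr_matrix

df = pd.DataFrame({
    "doc":   [0, 0, 0, 1],
    "word":  [0, 7, 4, 2],
    "count": [10, 2, 5, 5],
})

n_words = 10  # assumed vocabulary size
rows, cols, data = zip(*df.to_numpy())
mat = csr_matrix((data, (rows, cols)), shape=(max(rows) + 1, n_words))

# Wrap the sparse matrix without densifying it;
# columns default to 0..n_words-1
res = pd.DataFrame.sparse.from_spmatrix(mat)
print(res.shape)           # (2, 10)
print(res.sparse.density)  # fraction of explicitly stored entries
```

Each column uses a sparse dtype with fill value 0, so only the nonzero counts are stored; res.sparse.to_dense() recovers the dense table when needed.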
Simply add the columns that do not exist and fill them with 0:
df = pd.pivot_table(my_df, index="doc", columns="word", values="count", fill_value=0)
for c in range(10):
    if c not in df.columns:
        df[c] = 0
matrix = df[list(range(10))].values