简体   繁体   中英

How to efficiently create a matrix of index and column names combinations in python/pandas

I want to create a dataframe/matrix where each entry is a concatenation of the corresponding index and column names. I can do it with a nested for loop [see below] but was wondering whether there is a better/faster way to do this?

My problem partly comes from just not knowing how to phrase correctly what I am trying to do. I think it is like matrix multiplication, but with strings and concatenation instead of multiplication. [So any comments on how this problem could be better phrased are welcome too]

I found that itertools has a product() function that does somewhat what I want but creates a generator instead of returning a matrix/dataframe.

[Motivation; In biology we have plates with 96 wells for sample storage etc. They have columns 1-12 and rows AH. Some machines return the well code (A3, G12 etc.). I want to generate the codes to map between plates and experiment outputs.]

import pandas as pd

rows = [x for x in "ABCDEFGH"]
columns = [str(x) for x in range(1, 13)]
wells = pd.DataFrame(columns=columns, index=rows)
for i in rows:
    for j in columns:
        wells.loc[i, j] = "".join([i, j])

>wells
Out[37]: 
   1   2   3   4   5   6   7   8   9    10   11   12
A  A1  A2  A3  A4  A5  A6  A7  A8  A9  A10  A11  A12
B  B1  B2  B3  B4  B5  B6  B7  B8  B9  B10  B11  B12
C  C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  C11  C12
D  D1  D2  D3  D4  D5  D6  D7  D8  D9  D10  D11  D12
E  E1  E2  E3  E4  E5  E6  E7  E8  E9  E10  E11  E12
F  F1  F2  F3  F4  F5  F6  F7  F8  F9  F10  F11  F12
G  G1  G2  G3  G4  G5  G6  G7  G8  G9  G10  G11  G12
H  H1  H2  H3  H4  H5  H6  H7  H8  H9  H10  H11  H12

You can use np.char.add() and broadcasting for this:

rows = [x for x in "ABCDEFGH"]
columns = [str(x) for x in range(1, 13)]
data = np.char.add(np.array(rows)[:, None], np.array(columns)[None, :])
wells = pd.DataFrame(data=data, columns=columns, index=rows)
#     1   2   3   4   5   6   7   8   9   10   11   12
# A  A1  A2  A3  A4  A5  A6  A7  A8  A9  A10  A11  A12
# B  B1  B2  B3  B4  B5  B6  B7  B8  B9  B10  B11  B12
# C  C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  C11  C12
# D  D1  D2  D3  D4  D5  D6  D7  D8  D9  D10  D11  D12
# E  E1  E2  E3  E4  E5  E6  E7  E8  E9  E10  E11  E12
# F  F1  F2  F3  F4  F5  F6  F7  F8  F9  F10  F11  F12
# G  G1  G2  G3  G4  G5  G6  G7  G8  G9  G10  G11  G12
# H  H1  H2  H3  H4  H5  H6  H7  H8  H9  H10  H11  H12

Why not use pure python

wells = pd.DataFrame([[r+c for c in columns] for r in rows], columns=columns, index=rows)

you can use itertools product, in conjunction with numpy:

from itertools import product

frame = map("".join, product(rows, columns))
frame = np.fromiter(frame, dtype='U4')
frame = frame.reshape(len(rows), -1)
frame = pd.DataFrame(frame, index = rows, columns = columns)
frame
   1   2   3   4   5   6   7   8   9    10   11   12
A  A1  A2  A3  A4  A5  A6  A7  A8  A9  A10  A11  A12
B  B1  B2  B3  B4  B5  B6  B7  B8  B9  B10  B11  B12
C  C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  C11  C12
D  D1  D2  D3  D4  D5  D6  D7  D8  D9  D10  D11  D12
E  E1  E2  E3  E4  E5  E6  E7  E8  E9  E10  E11  E12
F  F1  F2  F3  F4  F5  F6  F7  F8  F9  F10  F11  F12
G  G1  G2  G3  G4  G5  G6  G7  G8  G9  G10  G11  G12
H  H1  H2  H3  H4  H5  H6  H7  H8  H9  H10  H11  H12

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM