
Iterate over pairwise combinations of column names and row indices in pandas

If I have the following pandas DataFrame:

>>> df
   x  y  z
x  1  3  0
y  0  5  0
z  0  3  4

I want to iterate over the pairwise combinations of column names and row indices and perform some operation on each cell. For example, for the pair x and y, replace the 3 with 'xy'. The desired output would look like:

>>> df
    x   y   z
x  xx  xy  xz
y  xy  yy  yz
z  xz  yz  zz

A naïve attempt that I tried, and that doesn't work, is:

for i, j in range(0,2):
    df.loc[df.index[i], df.columns[j]] = df.index[i] + df.columns[j]

How about a simple one-liner, using Pandas DataFrame elements:

df.apply(lambda x: x.index+x.name)

Output:

    x   y   z
x  xx  xy  xz
y  yx  yy  yz
z  zx  zy  zz

Update: Using numpy.ufunc.outer method.

pd.DataFrame(np.add.outer(df.index, df.columns), index=df.index, columns=df.columns)

Output:

    x   y   z
x  xx  xy  xz
y  yx  yy  yz
z  zx  zy  zz
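
Note that the desired output in the question is symmetric (xy appears at both (x, y) and (y, x)), whereas the results above always put the row label first. If the symmetric form is what is wanted, one possible sketch is to sort each pair of labels before joining them. The sample frame and the name pairs are assumptions made for illustration:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame([[1, 3, 0], [0, 5, 0], [0, 3, 4]],
                  index=list("xyz"), columns=list("xyz"))

# Sort each (row, column) pair so that (x, y) and (y, x) both become "xy"
pairs = pd.DataFrame(
    [["".join(sorted((row, col))) for col in df.columns] for row in df.index],
    index=df.index,
    columns=df.columns,
)
print(pairs)
```

This treats each cell as an unordered pair; if the order matters, the row-first results above are the ones to use.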

Setting cells one at a time is much faster with a scalar accessor. The original version of this answer used df.set_value(), which was deprecated in pandas 0.21 and removed in 1.0; df.at is its replacement. Background: Set value for particular cell in pandas DataFrame

import pandas as pd

data = [{'x': 1, 'y': 2, 'z': 3}, {'x': 4, 'y': 5, 'z': 6}, {'x': 7, 'y': 8, 'z': 9}]
df = pd.DataFrame.from_dict(data, orient='columns')
df = df.astype(str)
df

#       x   y   z
#    0  1   2   3
#    1  4   5   6
#    2  7   8   9

for idx in df.index:
    for column in df.columns:
        # Label each cell with its row index followed by its column name
        df.at[idx, column] = str(idx) + str(column)

df

Output:

    x   y   z
0   0x  0y  0z
1   1x  1y  1z
2   2x  2y  2z

Note: set_value won't work if the column names are not unique (see https://github.com/cm3/lafayettedb_thumbnail_getter/issues/3 ). You will have to fix the non-unique column names separately.

If you don't care about the exact column names, you can prefix each one with its column number:

df.columns = [str(idx) + '_' + name for idx, name in enumerate(df.columns)]
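
A small sketch of that renaming on a frame with a duplicated column name. The frame here is made up for illustration and is not part of the original answer:

```python
import pandas as pd

# Hypothetical frame in which the name 'a' appears twice
df = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "a"])

# Prefix every name with its position so that all names become unique
df.columns = [str(idx) + '_' + name for idx, name in enumerate(df.columns)]
print(list(df.columns))
```

After the renaming, each column can be addressed unambiguously by label.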

This should be really fast:

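import numpy as np

# meshgrid returns the broadcast column labels first, then the row labels
cols, rows = np.meshgrid(df.columns.values.astype(str),
                         df.index.values.astype(str))
# np.char.add is the public name for np.core.defchararray.add;
# the row label goes first, to match the other answers
result = np.char.add(rows, cols)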

You can then assign result to either the same dataframe or another one.
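
As a minimal sketch of that last step, the array can be wrapped in a new DataFrame that reuses the original labels. The sample frame and the name labels are assumptions for illustration, and the row label is placed first here:

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame([[1, 3, 0], [0, 5, 0], [0, 3, 4]],
                  index=list("xyz"), columns=list("xyz"))

# Both grids have shape (number of rows, number of columns)
cols, rows = np.meshgrid(df.columns.to_numpy().astype(str),
                         df.index.to_numpy().astype(str))

# Element-wise string concatenation: row label + column label
result = np.char.add(rows, cols)

# Wrap the array in a new frame that keeps the original labels
labels = pd.DataFrame(result, index=df.index, columns=df.columns)
print(labels)
```

Building a new frame avoids writing strings into the integer columns of the original one.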

Is this what you are looking for?

>>> df
   x  y  z
x  1  3  0
y  0  5  0
z  0  3  4

>>> df = df.astype(object)  # so the integer columns can hold strings
>>> for i in range(3):
...     for j in range(3):
...         df.loc[df.index[i], df.columns[j]] = df.index[i] + df.columns[j]
...
>>> df
    x   y   z
x  xx  xy  xz
y  yx  yy  yz
z  zx  zy  zz

for row in df.index:
    for col in df.columns:
        print(row + col)

df = pd.DataFrame([[row + col for col in df.columns] for row in df.index],
                  index=df.index, columns=df.columns)

This way you can iterate over all the columns and paired rows dynamically without needing to know how many columns there are.
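
For completeness, the loop in the question fails because range(0, 2) yields single integers, which cannot be unpacked into i, j. One possible way to visit every (row, column) pair in a single loop is itertools.product; this is only a sketch, and the sample frame and the name out are assumptions for illustration:

```python
import itertools

import pandas as pd

# Sample frame from the question
df = pd.DataFrame([[1, 3, 0], [0, 5, 0], [0, 3, 4]],
                  index=list("xyz"), columns=list("xyz"))

# Work on an object-dtype copy so that cells can hold strings
out = df.astype(object)

# product() yields every (row label, column label) combination
for row, col in itertools.product(df.index, df.columns):
    out.at[row, col] = row + col

print(out)
```

This works for any number of rows and columns, at the cost of a Python-level loop over every cell.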
