If I have the following pandas DataFrame:
>>> df
x y z
x 1 3 0
y 0 5 0
z 0 3 4
I want to iterate over the pairwise combinations of column names and row indices and perform some operation on each pair. For example, for the pair x and y, replace the 3 with 'xy'. The desired output looks like:
>>> df
x y z
x xx xy xz
y xy yy yz
z xz yz zz
A naive attempt that I tried, and that doesn't work, is:
for i, j in range(0,2):
    df.loc[df.index[i], df.columns[j]] = df.index[i] + df.columns[j]
How about a simple one-liner, using Pandas DataFrame elements:
df.apply(lambda x: x.index+x.name)
Output:
x y z
x xx xy xz
y yx yy yz
z zx zy zz
pd.DataFrame(np.add.outer(df.index, df.columns), index=df.index, columns=df.columns)
Output:
x y z
x xx xy xz
y yx yy yz
z zx zy zz
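Both one-liners above produce row label + column label, so the cell at row y, column x is 'yx', whereas the table in the question shows 'xy' there. If that symmetric table is intentional (an assumption, it may simply be a typo in the question), a minimal sketch is to sort each pair of labels before joining them. The name `symmetric` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 3, 0], [0, 5, 0], [0, 3, 4]],
    index=list("xyz"),
    columns=list("xyz"),
)

# Sort each (row, column) pair before joining, so that (y, x) and (x, y)
# both map to "xy". Assumes all labels are strings.
symmetric = pd.DataFrame(
    [["".join(sorted((r, c))) for c in df.columns] for r in df.index],
    index=df.index,
    columns=df.columns,
)
print(symmetric)
```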
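df.set_value() is way faster; for why, see "Set value for particular cell in pandas DataFrame". Note that set_value was deprecated in pandas 0.21 and removed in 1.0; on current versions, df.at[row, col] = value is the equivalent fast scalar setter.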
import pandas as pd
data = [{'x': 1, 'y': 2, 'z': 3}, {'x': 4, 'y': 5, 'z': 6}, {'x': 7, 'y': 8, 'z': 9}]
df = pd.DataFrame.from_dict(data, orient='columns')
df = df.astype(str)
df
# x y z
# 0 1 2 3
# 1 4 5 6
# 2 7 8 9
for idx, row in df.iterrows():
    for column in df.columns:
        val = str(idx) + str(column)
        # df.at replaces set_value, which was removed in pandas 1.0
        df.at[idx, column] = val
df
output:
x y z
0 0x 0y 0z
1 1x 1y 1z
2 2x 2y 2z
Note: set_value won't work if column names are not unique (see https://github.com/cm3/lafayettedb_thumbnail_getter/issues/3). You will have to fix the non-unique column names separately.
If you don't care about the column names, you can prefix each one with its column number:
df.columns = [str(idx) + '_' + name for idx, name in enumerate(df.columns)]
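To see what that renaming does, here is a small sketch on a made-up frame with a repeated column name. The frame `dup` and its values are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical frame in which the column name "a" appears twice.
dup = pd.DataFrame([[1, 2, 3]], columns=["a", "a", "b"])

# Prefix every name with its position so that all names become unique.
dup.columns = [str(pos) + "_" + name for pos, name in enumerate(dup.columns)]
print(list(dup.columns))  # ['0_a', '1_a', '2_b']
```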
This should be really fast:
import numpy as np
grid = np.meshgrid(df.columns.values.astype(str),
                   df.index.values.astype(str))
# np.char.add is the public alias of np.core.defchararray.add
result = np.char.add(*grid)
You can then assign result to either the same DataFrame or another one.
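A minimal sketch of that assignment step, using the frame from the question. One detail worth knowing: with columns passed first, the meshgrid call above yields column label + row label in each cell. The variant below passes the rows first with indexing="ij" so that each cell is row label + column label, matching the other answers. The names `labels` and `out` are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 3, 0], [0, 5, 0], [0, 3, 4]],
    index=list("xyz"),
    columns=list("xyz"),
)

rows = df.index.to_numpy().astype(str)
cols = df.columns.to_numpy().astype(str)

# With indexing="ij", grid[0][i, j] is rows[i] and grid[1][i, j] is cols[j].
grid = np.meshgrid(rows, cols, indexing="ij")
labels = np.char.add(*grid)  # elementwise string concatenation

# Wrap the label array in a new frame that reuses the original labels.
out = pd.DataFrame(labels, index=df.index, columns=df.columns)
print(out)
```

Building a new frame rather than writing into df avoids mixing strings into the original integer columns.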
Is this what you are looking for?
>>> df
x y z
x 1 3 0
y 0 5 0
z 0 3 4
>>> for i in range(3):
...     for j in range(3):
...         df.loc[df.index[i], df.columns[j]] = df.index[i] + df.columns[j]
...
>>> df
x y z
x xx xy xz
y yx yy yz
z zx zy zz
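The attempt in the question, `for i, j in range(0,2)`, fails because range yields single integers, which cannot be unpacked into two names. A sketch of the same nested loop written with itertools.product, which yields the (row, column) pairs directly and does not hard-code the size of the frame. Casting to object first is a precaution I am adding, not part of the answer above: it lets string labels overwrite the integer cells cleanly.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame(
    [[1, 3, 0], [0, 5, 0], [0, 3, 4]],
    index=list("xyz"),
    columns=list("xyz"),
)

# Precaution: make the columns object dtype before storing strings in them.
df = df.astype(object)

# product() yields every (row label, column label) pair.
for row, col in product(df.index, df.columns):
    df.at[row, col] = row + col

print(df)
```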
for row in df.index:
    for col in df.columns:
        print(str(row) + str(col))
df = pd.DataFrame([[str(row) + str(col) for col in df.columns] for row in df.index],
                  index=df.index, columns=df.columns)
This way you can iterate over all the columns and paired rows dynamically without needing to know how many columns there are.