
Pandas DataFrame change a value based on column, index values comparison

Suppose that you have a pandas DataFrame which has some kind of data in the body and numbers as the column and index labels.

>>> import numpy as np
>>> import pandas as pd
>>> data = np.array([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']])
>>> columns = [2, 4, 8]
>>> index = [10, 4, 2]
>>> df = pd.DataFrame(data, columns=columns, index=index)
>>> df
    2  4  8
10  a  b  c
4   d  e  f
2   g  h  i

Now suppose we want to manipulate our data frame in some way based on comparing the index and column labels. Consider the following.

Where the index is greater than the column, replace the letter with 'k':

    2  4  8
10  k  k  k
4   k  e  f
2   g  h  i

Where the index is equal to the column, replace the letter with 'U':

    2  4  8
10  k  k  k
4   k  U  f
2   U  h  i

Where the column is greater than the index, replace the letter with 'Y':

    2  4  8
10  k  k  k
4   k  U  Y
2   U  Y  Y

To keep the question useful to all:

  • What is a fast way to do this replacement?

  • What is the simplest way to do this replacement?

Speed results from the minimal example

  • jezrael: 556 µs ± 66.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

  • user3471881: 329 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

  • thunderwood: 4.65 ms ± 252 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Is this a duplicate? I searched Google for "pandas replace compare index column" and the top results are:

Pandas - Compare two dataframes and replace values matching condition

Python pandas: replace values based on location not index value

Pandas DataFrame: replace all values in a column, based on condition

However, I don't feel any of these touch on a) whether this is possible or b) how to compare in such a way.

I think you need numpy.select with broadcasting:

m1 = df.index.values[:, None] > df.columns.values
m2 = df.index.values[:, None] == df.columns.values


df = pd.DataFrame(np.select([m1, m2], ['k','U'], 'Y'), columns=df.columns, index=df.index)
print (df)
    2  4  8
10  k  k  k
4   k  U  Y
2   U  Y  Y
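
To make the broadcasting step explicit, here is a minimal sketch on the question's 3x3 example. The names `idx`, `cols` and `result` are only illustrative, and `to_numpy()` is used in place of `.values`; otherwise it follows the same approach as above.

```python
import numpy as np
import pandas as pd

data = np.array([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']])
df = pd.DataFrame(data, columns=[2, 4, 8], index=[10, 4, 2])

idx = df.index.to_numpy()[:, None]   # shape (3, 1): one index label per row
cols = df.columns.to_numpy()         # shape (3,): one column label per column

# Comparing a (3, 1) array with a (3,) array broadcasts to a (3, 3) mask,
# so every cell gets its own "index vs column" comparison.
m1 = idx > cols
m2 = idx == cols

# np.select takes the first matching condition; cells that match
# neither mask (column greater than index) fall through to 'Y'.
result = pd.DataFrame(np.select([m1, m2], ['k', 'U'], 'Y'),
                      index=df.index, columns=df.columns)
print(result)
```

The original letters are never read here: the result depends only on the labels, which is why a brand new frame can be built from the masks.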

Performance:

np.random.seed(1000)

N = 1000
a = np.random.randint(100, size=N)
b = np.random.randint(100, size=N)

df = pd.DataFrame(np.random.choice(list('abcdefgh'), size=(N, N)), columns=a, index=b)
#print (df)

def us(df):
    values = np.array(np.array([df.index]).transpose() - np.array([df.columns]), dtype='object')
    greater = values > 0
    less = values < 0
    same = values == 0

    values[greater] = 'k'
    values[less] = 'Y'
    values[same] = 'U'


    return pd.DataFrame(values, columns=df.columns, index=df.index)

def jez(df):

    m1 = df.index.values[:, None] > df.columns.values
    m2 = df.index.values[:, None] == df.columns.values
    return pd.DataFrame(np.select([m1, m2], ['k','U'], 'Y'), columns=df.columns, index=df.index)

In [236]: %timeit us(df)
107 ms ± 358 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [237]: %timeit jez(df)
64 ms ± 299 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
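
Outside IPython, a comparison like this can be sketched with the standard `timeit` module. The function names `by_select` and `by_masking`, the smaller `N = 200`, and the repeat count are assumptions chosen for illustration, so the numbers will not match the ones above. The sketch also checks that both approaches agree before timing them.

```python
import timeit

import numpy as np
import pandas as pd

def by_select(df):
    # Broadcasting masks fed to np.select.
    idx = df.index.to_numpy()[:, None]
    cols = df.columns.to_numpy()
    out = np.select([idx > cols, idx == cols], ['k', 'U'], 'Y')
    return pd.DataFrame(out, index=df.index, columns=df.columns)

def by_masking(df):
    # Label differences, then manual boolean-mask assignment.
    diff = df.index.to_numpy()[:, None] - df.columns.to_numpy()
    values = diff.astype(object)
    values[diff > 0] = 'k'
    values[diff < 0] = 'Y'
    values[diff == 0] = 'U'
    return pd.DataFrame(values, index=df.index, columns=df.columns)

rng = np.random.RandomState(1000)
N = 200  # smaller than the N = 1000 used above, to keep the run short
df_big = pd.DataFrame(rng.choice(list('abcdefgh'), size=(N, N)),
                      columns=rng.randint(100, size=N),
                      index=rng.randint(100, size=N))

# Confirm both functions produce the same cells before comparing speed.
identical = by_select(df_big).to_numpy().tolist() == by_masking(df_big).to_numpy().tolist()
print('identical output:', identical)

for fn in (by_select, by_masking):
    seconds = timeit.timeit(lambda: fn(df_big), number=5)
    print(fn.__name__, seconds / 5, 'seconds per call')
```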

Not sure about the fastest way to accomplish this, but an incredibly simple way would be to just iterate over the dataframe like so:

for i in df.index:
    for j in df.columns:
        if i > j:
            df.loc[i, j] = 'k'   # index greater than column
        elif j > i:
            df.loc[i, j] = 'Y'   # column greater than index
        else:
            df.loc[i, j] = 'U'   # index equal to column
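
As a small variant of the loop, the sketch below wraps it in a function that works on a copy and uses `.at` for the scalar assignments. The function name `relabel_by_loop` is hypothetical, and the sketch assumes the index and column labels are unique, as in the question's example.

```python
import numpy as np
import pandas as pd

def relabel_by_loop(df):
    """Return a relabelled copy; assumes unique index and column labels."""
    out = df.copy()
    for i in out.index:
        for j in out.columns:
            if i > j:
                out.at[i, j] = 'k'   # index greater than column
            elif i == j:
                out.at[i, j] = 'U'   # index equal to column
            else:
                out.at[i, j] = 'Y'   # column greater than index
    return out

data = np.array([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']])
df_small = pd.DataFrame(data, columns=[2, 4, 8], index=[10, 4, 2])

looped = relabel_by_loop(df_small)
print(looped)
```

Working on a copy leaves the input frame untouched, which makes it easier to compare the loop against the vectorised answers on the same data.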

1. Using np.arrays + np.select:

values = np.array(np.array([df.index]).transpose() - np.array([df.columns]))

greater = values > 0
same = values == 0

df = pd.DataFrame(np.select([greater, same], ['k', 'U'], 'Y'), columns=df.columns, index=df.index)

2. Using np.arrays and manual masking:

values = np.array(np.array([df.index]).transpose() - np.array([df.columns]), dtype='object')

greater = values > 0
less = values < 0
same = values == 0

values[greater] = 'k'
values[less] = 'Y'
values[same] = 'U'


df = pd.DataFrame(values, columns=df.columns, index=df.index)
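
Both variants start from the difference between index and column labels. A related sketch, which is not one of the approaches above, maps the sign of that difference straight to a letter through a lookup array. It assumes signed numeric labels (unsigned integers could wrap around on subtraction), and the names `diff` and `lookup` are illustrative.

```python
import numpy as np
import pandas as pd

data = np.array([['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']])
df = pd.DataFrame(data, columns=[2, 4, 8], index=[10, 4, 2])

# (3, 1) minus (3,) broadcasts to a (3, 3) array of label differences.
diff = df.index.to_numpy()[:, None] - df.columns.to_numpy()

# np.sign gives -1, 0 or 1; adding 1 turns that into positions 0, 1, 2.
lookup = np.array(['Y', 'U', 'k'])   # column greater, equal, index greater
letters = lookup[np.sign(diff) + 1]

result = pd.DataFrame(letters, index=df.index, columns=df.columns)
print(result)
```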
