
pandas.DataFrame: how to applymap() with external arguments

SEE UPDATE AT THE END FOR A MUCH CLEARER DESCRIPTION.

According to http://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.DataFrame.apply.html you can pass external arguments to an apply function, but the same is not true of applymap: http://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.DataFrame.applymap.html#pandas.DataFrame.applymap

I want to apply an elementwise function f(a, i), where a is the element and i is a manually entered argument. I need this because I will call df.applymap(f) in a loop: for i in some_list.

To give an example of what I want, say I have a DataFrame df , where each element is a numpy.ndarray . I want to extract the i -th element of each ndarray and form a new DataFrame from them. So I define my f :

def f(a, i):
    return a[i]

I could then write a loop that returns the i-th element of each np.ndarray contained in df:

for i in some_series:
    b[i] = df.applymap(f, i=i)

so that in each iteration, it would pass my value of i into the function f .

I realise it would all have been easier if I had used MultiIndexing for df but for now, this is what I'm working with. Is there a way to do what I want within pandas? I would ideally like to avoid for-looping through all the columns in df , and I don't see why applymap doesn't take keyword arguments, while apply does.

Also, the way I currently understand it (I may be wrong), when I use df.apply it would give me the i -th element of each row/column, instead of the i -th element of each ndarray contained in df .


UPDATE:

I just realised I could split df into Series and then use pd.Series.apply, which does what I want. Let me generate some data to show what I mean:

import numpy as np
import pandas as pd

def f(a, i):
    return a[i]

b = pd.Series(index=range(10), dtype=object)
for i in b.index:
    b[i] = np.random.rand(5)

b.apply(f,args=(1,))

This does exactly what I expect and want. However, trying the same thing with a DataFrame:

b = pd.DataFrame(index=range(4), columns=range(4), dtype=object)
for i in b.index:
    for col in b.columns:
        b.loc[i,col] = np.random.rand(10)

b.apply(f,args=(1,))

This gives me ValueError: Shape of passed values is (4, 10), indices imply (4, 4).

You can wrap the function in a lambda that supplies the extra argument:

def matchValue(value, dictionary):
    return dictionary[value]

a = {'first':  1, 'second':  2}
b = {'first': 10, 'second': 20}
df['column'] = df['column'].map(lambda x: matchValue(x, a))
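
The same lambda trick can be carried over to the DataFrame from the question. The following is only a sketch under assumptions: the sample data, the column names "x" and "y", and the helper names elementwise and picked are made up for illustration. It also assumes a pandas release where DataFrame.map may have replaced applymap, and falls back to applymap otherwise.

```python
import numpy as np
import pandas as pd

def f(a, i):
    return a[i]

# Hypothetical data: a 3x2 frame whose cells are length-4 ndarrays.
cells = np.arange(24).reshape(3, 2, 4)
df = pd.DataFrame({col: list(cells[:, j]) for j, col in enumerate(["x", "y"])})

# Newer pandas offers DataFrame.map in place of applymap; use whichever exists.
elementwise = getattr(df, "map", None) or df.applymap

# One DataFrame per index i; the lambda passes the current i through to f.
picked = {i: elementwise(lambda a: f(a, i)) for i in range(4)}
print(picked[1])
```

Each entry picked[i] has the same index and columns as df, with every cell replaced by the i-th element of its array.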

In the pandas version linked in the question (0.18.1), applymap does not accept extra arguments; its signature is DataFrame.applymap(func). If you want to keep i as state, you can store it in a global variable that func reads or modifies, or use a decorator.

However, I would recommend trying the apply method instead.
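
One way to read that suggestion, sketched here with made-up data: DataFrame.apply passes f a whole column (a Series) rather than a single cell, which is likely why the call in the question's update fails. Nesting a Series.apply inside it brings back the elementwise behaviour, and Series.apply does forward args. The names rng and second are illustrative only.

```python
import numpy as np
import pandas as pd

def f(a, i):
    return a[i]

# Hypothetical data: a 4x4 frame whose cells are length-10 ndarrays.
rng = np.random.default_rng(0)
df = pd.DataFrame({c: [rng.random(10) for _ in range(4)] for c in range(4)})

# Outer apply walks the columns; inner Series.apply calls f(cell, 1) per cell.
second = df.apply(lambda col: col.apply(f, args=(1,)))
print(second)
```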

Here is a solution where the argument is stored in a closure (a nested function):

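def f(cell, argument):
    """Do something with cell value and argument"""
    output = ...  # placeholder: compute the result from cell and argument
    return output

def outer(argument):
    # inner remembers argument, so applymap can call it with the cell alone
    def inner(cell):
        return f(cell, argument)

    return inner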

argument = ...
df.applymap(func = outer(argument))
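
The standard library's functools.partial builds the same kind of one-argument wrapper without writing outer by hand. A minimal sketch, assuming f indexes into the cell as in the question; the sample frame and the names elementwise and last are hypothetical, and the DataFrame.map fallback is only there to cover newer pandas releases.

```python
from functools import partial

import numpy as np
import pandas as pd

def f(cell, argument):
    return cell[argument]

# Hypothetical data: one column whose cells are small ndarrays.
df = pd.DataFrame({"x": [np.array([1, 2, 3]), np.array([4, 5, 6])]})

# Newer pandas offers DataFrame.map in place of applymap; use whichever exists.
elementwise = getattr(df, "map", None) or df.applymap

# partial(f, argument=2) behaves like outer(2): it only needs the cell.
last = elementwise(partial(f, argument=2))
print(last)
```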
