pandas row specific apply

Similar to this R question, I'd like to apply a function to each item in a Series (or each row in a DataFrame) using pandas, but I want to pass the index or id of that row to the function as an argument. As a trivial example, suppose one wants to create a list of tuples of the form [(index_i, value_i), ..., (index_n, value_n)]. Using a simple Python for loop, I can do:

In [1]: from pandas import Series
In [2]: L = []
In [3]: s = Series(['six', 'seven', 'six', 'seven', 'six'],
                   index=['a', 'b', 'c', 'd', 'e'])
In [4]: for i, item in enumerate(s):
            L.append((i, item))
In [5]: L
Out[5]: [(0, 'six'), (1, 'seven'), (2, 'six'), (3, 'seven'), (4, 'six')]

But surely there is a more efficient way to do this? Perhaps something more pandas-like, such as Series.apply? In this case I'm less concerned with returning anything meaningful than with getting the efficiency of something like apply. Any ideas?

If you call the apply method with a function, that function is applied to every item in the Series individually. E.g.:

>>> s.apply(enumerate)
a    <enumerate object at 0x13cf910>
b    <enumerate object at 0x13cf870>
c    <enumerate object at 0x13cf820>
d    <enumerate object at 0x13cf7d0>
e    <enumerate object at 0x13ecdc0>

What you want to do is simply to enumerate the series itself.

>>> list(enumerate(s))
[(0, 'six'), (1, 'seven'), (2, 'six'), (3, 'seven'), (4, 'six')]
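
Note that enumerate yields positions (0, 1, 2, ...), not the index labels ('a', 'b', ...). If the labels are what is wanted, a minimal sketch, assuming a reasonably recent pandas where Series.items() is available, could look like this:

```python
import pandas as pd

s = pd.Series(['six', 'seven', 'six', 'seven', 'six'],
              index=['a', 'b', 'c', 'd', 'e'])

# items() yields (index label, value) pairs, one per element.
pairs = list(s.items())

# Zipping the index with the values builds the same pairs explicitly.
pairs_via_zip = list(zip(s.index, s))
```

Both forms are still plain Python iteration, so they are a convenience rather than a speed-up over the for loop.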

What if, for example, you wanted to concatenate the strings of all the entries?

>>> ",".join(s)
'six,seven,six,seven,six'

A more complex use of apply would be this one, which doubles every character of every string:

>>> # On Python 3, map returns a lazy iterator, so build each list explicitly.
>>> s.apply(lambda word: [ch * 2 for ch in word])
a                ['ss', 'ii', 'xx']
b    ['ss', 'ee', 'vv', 'ee', 'nn']
c                ['ss', 'ii', 'xx']
d    ['ss', 'ee', 'vv', 'ee', 'nn']
e                ['ss', 'ii', 'xx']
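
To make the element-wise behaviour of apply concrete, here is a small sketch. The helper add_suffix is hypothetical and only there for illustration; the point is that the function receives each value in turn and never sees the index label.

```python
import pandas as pd

s = pd.Series(['six', 'seven', 'six', 'seven', 'six'],
              index=['a', 'b', 'c', 'd', 'e'])

# apply calls the function once per value; the index labels are carried
# over to the result but are never passed to the function.
lengths = s.apply(len)

# Extra keyword arguments are forwarded to the function on every call.
def add_suffix(word, suffix):  # hypothetical helper for illustration
    return word + suffix

suffixed = s.apply(add_suffix, suffix='!')
```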

[Edit]

Following the OP's request for clarification: don't confuse Series (1D) with DataFrames (2D) (http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe); I don't really see how you can talk about rows in a Series. However, you can use the index in your function by building a new Series (apply won't give you any information about the current index):

>>> Series([s[x]+" my index is:  "+x for x in s.keys()], index=s.keys())
a      six my index is:  a
b    seven my index is:  b
c      six my index is:  c
d    seven my index is:  d
e      six my index is:  e
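
For the DataFrame case mentioned in the question, a hedged sketch: with axis=1, DataFrame.apply passes each row as a Series whose name attribute is that row's index label. The one-column frame df and the '@' separator below are assumptions made up for this example.

```python
import pandas as pd

s = pd.Series(['six', 'seven', 'six', 'seven', 'six'],
              index=['a', 'b', 'c', 'd', 'e'])

# Series: iterate over (label, value) pairs and rebuild on the same index.
labelled = pd.Series([f"{value}@{label}" for label, value in s.items()],
                     index=s.index)

# DataFrame: with axis=1 each row arrives as a Series, and row.name
# holds the index label of that row.
df = pd.DataFrame({'word': s})  # hypothetical one-column frame
tagged = df.apply(lambda row: f"{row['word']}@{row.name}", axis=1)
```

Row-wise apply still runs a Python function per row, so it is convenient rather than fast.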

Anyhow I would suggest that you switch to other data types to avoid huge memory leaks.

Here's a neat way, using itertools' count together with zip:

import pandas as pd
from itertools import count

s = pd.Series(['six', 'seven', 'six', 'seven', 'six'],
              index=['a', 'b', 'c', 'd', 'e'])

# On Python 3, zip is lazy, so wrap it in list() to see the pairs.
In [4]: list(zip(count(), s))
Out[4]: [(0, 'six'), (1, 'seven'), (2, 'six'), (3, 'seven'), (4, 'six')]

Unfortunately, this is only as efficient as enumerate(list(s))!


 