pandas row specific apply

Question

Similar to this R question, I'd like to apply a function to each item in a Series (or each row in a DataFrame) using pandas, but I want to pass the index or id of that row to the function as an argument. As a trivial example, suppose one wants to create a list of tuples of the form [(index_1, value_1), ..., (index_n, value_n)]. Using a simple Python for loop, I can do:

In [1]: from pandas import Series
In [2]: L = []
In [3]: s = Series(['six', 'seven', 'six', 'seven', 'six'],
   ...:            index=['a', 'b', 'c', 'd', 'e'])
In [4]: for i, item in enumerate(s):
   ...:     L.append((i, item))
In [5]: L
Out[5]: [(0, 'six'), (1, 'seven'), (2, 'six'), (3, 'seven'), (4, 'six')]

But surely there is a more efficient way to do this? Perhaps something more pandas-like, such as Series.apply? In this case I don't care about returning anything meaningful; I'm after the efficiency of something like apply. Any ideas?

Answer 1

If you call apply with a function, that function is applied to every item in the Series individually. E.g.:

>>> s.apply(enumerate)
a    <enumerate object at 0x13cf910>
b    <enumerate object at 0x13cf870>
c    <enumerate object at 0x13cf820>
d    <enumerate object at 0x13cf7d0>
e    <enumerate object at 0x13ecdc0>

What you want to do is simply to enumerate the series itself.

>>> list(enumerate(s))
[(0, 'six'), (1, 'seven'), (2, 'six'), (3, 'seven'), (4, 'six')]
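Note that enumerate yields positions (0, 1, ...) rather than the index labels ('a', 'b', ...). If the labels are what you are after, a minimal sketch, assuming a reasonably recent pandas in which Series.items() is available, could look like this:

```python
import pandas as pd

s = pd.Series(['six', 'seven', 'six', 'seven', 'six'],
              index=['a', 'b', 'c', 'd', 'e'])

# Series.items() lazily yields (index label, value) pairs
label_pairs = list(s.items())

# The same pairing built explicitly from the index and the values
zipped_pairs = list(zip(s.index, s))

print(label_pairs)
# [('a', 'six'), ('b', 'seven'), ('c', 'six'), ('d', 'seven'), ('e', 'six')]
```

Both variants still loop in Python, so they are a convenience rather than a speed-up.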

What if, for example, you wanted to join the strings of all the entries?

>>> ",".join(s)
'six,seven,six,seven,six'

A more complex usage of apply would be this one (a list comprehension is used because map returns a lazy iterator in Python 3, which would not display as the lists below):

>>> s.apply(lambda word: [ch * 2 for ch in word])
a                ['ss', 'ii', 'xx']
b    ['ss', 'ee', 'vv', 'ee', 'nn']
c                ['ss', 'ii', 'xx']
d    ['ss', 'ee', 'vv', 'ee', 'nn']
e                ['ss', 'ii', 'xx']

[Edit]

Following the OP's request for clarification: don't confuse Series (1D) with DataFrames (2D), see http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe, as I don't really see how you can talk about rows in a Series. However, you can use the indices in your computation by building a new Series (Series.apply won't give your function any information about the current index):

>>> Series([s[x]+" my index is:  "+x for x in s.keys()], index=s.keys())
a      six my index is:  a
b    seven my index is:  b
c      six my index is:  c
d    seven my index is:  d
e      six my index is:  e
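For the DataFrame case mentioned in the question, here is a minimal sketch: with axis=1, DataFrame.apply passes each row to the function as a Series whose name attribute is that row's index label. The column name 'word' and the helper describe are made up for this illustration.

```python
import pandas as pd

s = pd.Series(['six', 'seven', 'six', 'seven', 'six'],
              index=['a', 'b', 'c', 'd', 'e'])

# 'word' is a hypothetical column name chosen for this example
df = s.to_frame(name='word')

def describe(row):
    # With axis=1 each row arrives as a Series; row.name is its index label
    return row['word'] + " my index is:  " + row.name

labelled = df.apply(describe, axis=1)
print(labelled['b'])  # seven my index is:  b
```

Row-wise apply is still a Python-level loop, so treat it as a convenience rather than an optimisation.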

Anyhow, if a list of pairs like this is all you need, I would suggest switching to plain Python data types to avoid unnecessary memory overhead.

Answer 2

Here's a neat way, using itertools.count and the built-in zip:

import pandas as pd
from itertools import count

s = pd.Series(['six', 'seven', 'six', 'seven', 'six'],
                  index=['a', 'b', 'c', 'd', 'e'])

In [4]: list(zip(count(), s))  # zip is lazy in Python 3, so wrap it in list()
Out[4]: [(0, 'six'), (1, 'seven'), (2, 'six'), (3, 'seven'), (4, 'six')]

Unfortunately, this is only as efficient as list(enumerate(s))!
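To check that on your own machine, a rough sketch using the standard timeit module is below. The function names and the repetition count are arbitrary choices for this illustration, and the timings will vary with hardware and pandas version.

```python
import timeit
from itertools import count

import pandas as pd

s = pd.Series(['six', 'seven', 'six', 'seven', 'six'],
              index=['a', 'b', 'c', 'd', 'e'])

def with_enumerate():
    return list(enumerate(s))

def with_count_zip():
    return list(zip(count(), s))

# Both approaches build the same list of (position, value) tuples
same_result = with_enumerate() == with_count_zip()

# number=1000 is an arbitrary repetition count picked for this sketch
t_enumerate = timeit.timeit(with_enumerate, number=1000)
t_count_zip = timeit.timeit(with_count_zip, number=1000)
print(same_result, t_enumerate, t_count_zip)
```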

solution1
7 ACCEPTED 2012-06-23 16:00:18

solution2
3 2012-12-11 20:47:51
