Apply a function to each column of a dataframe

Question

I have a dataframe of numbers from 1 to 13 (each number is a location). The index is a timeline of 2-minute timesteps over 24 h (720 rows), and each column represents a single person. So each column holds one person's locations over 24 h in 2-minute timesteps.

I am trying to convert these numbers to binary (if it's a 13, I want a 1, and otherwise a 0). But when I try to apply the function I get an error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Here's the code:

import pandas as pd
from datetime import timedelta
df = pd.read_csv("dataset_belgium/all_patterns_2MINS.csv", encoding="utf-8")
df = df.transpose()

df.reset_index(drop=True, inplace=True)


timeline = []
for timestep in range(len(df.index)):
    time = timedelta(seconds=timestep*2*60)
    time = str(time)
    timeline.append(time)


tl = pd.DataFrame(timeline)
tl.columns = ['timeline']

df=df.join(tl, how='left')

df = df.set_index('timeline')
#df.drop(['0:00:00'])

def to_binary(element):
    if element == 13:
        element = 1
    else:
        element = 0
    return element

binary_df = df.apply(to_binary)

Also, I would like to remove the first row, the one with index '0:00:00', since it doesn't contain numbers from 1 to 13. Thanks in advance!

Answer 1

As you say in the title, you apply the function to each column of the data frame. So what you call element within the function is actually a whole column (a pandas Series). The comparison element == 13 itself works, but it returns a Series of booleans rather than a single True or False, and the if statement cannot decide which branch to take for a whole Series. That is what the error message is telling you. A first idea would be to loop over the elements of the column:

def to_binary(column):
    for element in column:
        if element == 13:
            element = 1
        else:
            element = 0
    return column

However, this would still not solve the more basic issue that the function doesn't actually change anything with lasting effect: assigning to element only rebinds a local name, so the column is returned unchanged.
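To make the point above concrete, here is a minimal sketch on a small made-up frame (the column names p1 and p2 and all values are assumptions, not the asker's data). It checks that DataFrame.apply hands each column to the function as a Series, and shows one way to get the 0/1 result without a loop, by comparing the whole frame to 13 at once.

```python
import pandas as pd

# Toy stand-in for the real data: two people, four timesteps (values are made up).
df = pd.DataFrame({"p1": [13, 2, 13, 7], "p2": [1, 13, 5, 13]})

# DataFrame.apply calls the function once per column and passes a Series each time.
arg_types = df.apply(lambda column: type(column).__name__)

# Comparing the whole frame to 13 gives booleans; astype(int) turns them into 1/0.
binary_df = (df == 13).astype(int)

# The same per column, if apply is preferred: the function returns a new Series.
binary_via_apply = df.apply(lambda column: (column == 13).astype(int))
```

Note that this assumes the values are stored as numbers. If the CSV was read as text, the comparison would have to be against "13" instead, or the columns converted first.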

An easy alternative approach is to use the pandas replace method, which allows you to explicitly replace arbitrary values with other ones:

df.replace([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], 
           [0, 0, 0, 0, 0, 0, 0, 0, 0,  0,  0,  0,  1], 
           inplace=True)

To delete the first row, you can use df = df[1:].
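As a small sketch of the row removal, the example below uses a made-up frame indexed by timestep labels like the asker's timeline (the column names and values are assumptions). It shows dropping the first row by position and by index label. Note that drop returns a new frame rather than modifying df, so the result has to be assigned.

```python
import pandas as pd

# Toy frame: the first row holds non-numeric labels, as described in the question.
df = pd.DataFrame(
    {"p1": ["id_a", 13, 2], "p2": ["id_b", 1, 13]},
    index=pd.Index(["0:00:00", "0:02:00", "0:04:00"], name="timeline"),
)

# Drop by position: keep everything from the second row onward.
by_position = df.iloc[1:]

# Drop by index label: returns a new frame, df itself is left unchanged.
by_label = df.drop(index="0:00:00")
```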

solution1
0 ACCPTED 2020-05-11 12:49:00