
Pandas AssertionError when applying function which returns tuple containing list

I am applying a function to a Pandas DataFrame and returning a tuple, which I unpack into multiple DataFrame columns using zip(*...).

The returned tuple contains a list, which in turn contains one or more tuples.

When at least one of the nested lists contains a different number of tuples from the others, everything works fine.

In the rare cases where every nested list contains the same number of tuples, an AssertionError: Shape of new values must be compatible with manager shape is raised.

I suspect Pandas sees the consistent nested list lengths and tries to unpack the lists of tuples into separate columns.

How can I force Pandas to always store the returned list as is, regardless of the conditions above?


(Python 3.7.4, Pandas 1.0.3)

Code that works:

import pandas as pd
import numpy as np

def simple_function(type_count):
    calculated_value1 = np.random.randint(5)
    calculated_value2 = np.random.randint(5)
    types_list = [tuple((x, calculated_value2)) for x in range(0, type_count)]
    return calculated_value1, types_list
    
df = pd.DataFrame([{'name': 'Joe', 'types': 1},
                   {'name': 'Beth', 'types': 1},
                   {'name': 'John', 'types': 1},
                   {'name': 'Jill', 'types': 2},
                   ], columns=['name', 'types'])

df['calculated_result'], df['types_list'] = zip(*df['types'].apply(simple_function))

Code that raises AssertionError: Shape of new values must be compatible with manager shape:

import pandas as pd
import numpy as np

def simple_function(type_count):
    calculated_value1 = np.random.randint(5)
    calculated_value2 = np.random.randint(5)
    types_list = [tuple((x, calculated_value2)) for x in range(0, type_count)]
    return calculated_value1, types_list
    
df = pd.DataFrame([{'name': 'Joe', 'types': 1},
                   {'name': 'Beth', 'types': 1},
                   {'name': 'John', 'types': 1},
                   {'name': 'Jill', 'types': 1},
                   ], columns=['name', 'types'])

df['calculated_result'], df['types_list'] = zip(*df['types'].apply(simple_function))

You can avoid the error by building a DataFrame from the list of returned tuples and assigning it to both columns at once:

df[['calculated_result','types_list']] = pd.DataFrame(df['types'].apply(simple_function).tolist())
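For reference, here is a self-contained sketch of that approach, run on the case where every row has the same type count. The non-default index labels are a hypothetical addition, not part of the question. They are there only to show that passing index=df.index keeps the rows aligned, since a DataFrame built from a plain list otherwise gets a fresh 0..n-1 index.

```python
import numpy as np
import pandas as pd


def simple_function(type_count):
    # Mirrors the question's function: returns (int, list of tuples).
    calculated_value1 = np.random.randint(5)
    calculated_value2 = np.random.randint(5)
    types_list = [(x, calculated_value2) for x in range(type_count)]
    return calculated_value1, types_list


# Every row has the same 'types' value, i.e. the case that failed with zip(*...).
df = pd.DataFrame(
    {"name": ["Joe", "Beth", "John", "Jill"], "types": [1, 1, 1, 1]},
    index=["a", "b", "c", "d"],  # hypothetical non-default index
)

results = df["types"].apply(simple_function)

# Each returned tuple becomes one row; reusing df.index keeps the rows aligned.
expanded = pd.DataFrame(results.tolist(), index=df.index)

df[["calculated_result", "types_list"]] = expanded
```

Each cell of types_list should then hold the original Python list, whether or not the lists happen to share a length.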

You can get a similar result with a NumPy array:

df['calculated_result'], df['types_list'] = np.array(df['types'].apply(simple_function).tolist()).T
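Another option, offered as a sketch rather than as the answerer's method, is to keep the zip(*...) unpacking but wrap each unpacked sequence in a pd.Series before assigning it. A Series built from a list of lists holds one object per row, so the equal-length lists are not interpreted as extra dimensions. The shape check on np.array is only an illustration and fits the suspicion in the question: NumPy turns equal-length nested lists into a 3-D array.

```python
import numpy as np
import pandas as pd


def simple_function(type_count):
    # Same return shape as in the question: (int, list of tuples).
    calculated_value1 = np.random.randint(5)
    calculated_value2 = np.random.randint(5)
    types_list = [(x, calculated_value2) for x in range(type_count)]
    return calculated_value1, types_list


df = pd.DataFrame(
    {"name": ["Joe", "Beth", "John", "Jill"], "types": [1, 1, 1, 1]}
)

# Unzip the per-row tuples into two parallel tuples.
values, lists = zip(*df["types"].apply(simple_function))

# Illustration only: equal-length nested lists form a regular 3-D array.
nested_shape = np.array(lists).shape  # (4, 1, 2) for this input

# A Series of lists is a 1-D object column, so each list is stored as is.
df["calculated_result"] = pd.Series(list(values), index=df.index)
df["types_list"] = pd.Series(list(lists), index=df.index)
```

This keeps the original one-line unpacking idea and does not depend on how the lengths of the nested lists compare.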
