Pandas AssertionError when applying a function that returns a tuple containing a list
I am applying a function to a Pandas DataFrame and returning a tuple, which I cast into multiple DataFrame columns using zip(*).
The returned tuple contains a list, which in turn contains one or more tuples.
When at least one of the nested lists contains a different number of tuples from the rest of the lists, everything works fine.
In the rare cases where the function returns nested lists that all contain the same number of tuples, an AssertionError: Shape of new values must be compatible with manager shape is raised.
I suspect Pandas sees the consistent nested list lengths and tries to unpack the list of tuples into separate columns.
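A quick way to check this suspicion is to look at what NumPy makes of the second element produced by zip(*...). The sketch below assumes (not confirmed here) that, on assignment, Pandas converts the tuple with something equivalent to np.asarray; the sample values are hypothetical stand-ins for what simple_function returns.

```python
import numpy as np

# Hypothetical second elements from zip(*...) in the failing case:
# four lists, each holding exactly one (x, value) tuple.
equal_lengths = ([(0, 3)], [(0, 1)], [(0, 4)], [(0, 2)])

# Second elements in the working case: the last list holds two tuples.
mixed_lengths = ([(0, 3)], [(0, 1)], [(0, 4)], [(0, 2), (1, 2)])

# Equal lengths: NumPy builds a regular 3-D array, which cannot be
# stored in a single DataFrame column.
print(np.asarray(equal_lengths).shape)  # (4, 1, 2)

# Unequal lengths: no regular shape exists, so with dtype=object NumPy
# falls back to a 1-D array whose elements are the original lists.
print(np.asarray(mixed_lengths, dtype=object).shape)  # (4,)
```

If that assumption about the conversion holds, it would explain why only the equal-count case fails.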
How can I force Pandas to always store the returned list as is, regardless of the conditions above?
(Python 3.7.4, Pandas 1.0.3)
Code that works:
import pandas as pd
import numpy as np

def simple_function(type_count):
    calculated_value1 = np.random.randint(5)
    calculated_value2 = np.random.randint(5)
    types_list = [tuple((x, calculated_value2)) for x in range(0, type_count)]
    return calculated_value1, types_list

df = pd.DataFrame([{'name': 'Joe', 'types': 1},
                   {'name': 'Beth', 'types': 1},
                   {'name': 'John', 'types': 1},
                   {'name': 'Jill', 'types': 2},
                   ], columns=['name', 'types'])

df['calculated_result'], df['types_list'] = zip(*df['types'].apply(simple_function))
Code that raises AssertionError: Shape of new values must be compatible with manager shape:
import pandas as pd
import numpy as np

def simple_function(type_count):
    calculated_value1 = np.random.randint(5)
    calculated_value2 = np.random.randint(5)
    types_list = [tuple((x, calculated_value2)) for x in range(0, type_count)]
    return calculated_value1, types_list

df = pd.DataFrame([{'name': 'Joe', 'types': 1},
                   {'name': 'Beth', 'types': 1},
                   {'name': 'John', 'types': 1},
                   {'name': 'Jill', 'types': 1},
                   ], columns=['name', 'types'])

df['calculated_result'], df['types_list'] = zip(*df['types'].apply(simple_function))
By creating a DataFrame from the list of results:
df[['calculated_result','types_list']] = pd.DataFrame(df['types'].apply(simple_function).tolist())
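Here is a self-contained sketch of this approach, run on the failing case where every row has types == 1. To make the result checkable, it replaces the random values with a hypothetical deterministic simple_function; that substitution is for illustration only.

```python
import pandas as pd

# Hypothetical deterministic stand-in for the question's simple_function:
# fixed values replace np.random.randint so the output is reproducible.
def simple_function(type_count):
    calculated_value1 = type_count * 10
    calculated_value2 = type_count + 1
    types_list = [(x, calculated_value2) for x in range(type_count)]
    return calculated_value1, types_list

# Every row has types == 1, i.e. the case that fails with zip(*...).
df = pd.DataFrame({'name': ['Joe', 'Beth', 'John', 'Jill'],
                   'types': [1, 1, 1, 1]})

# Series of (calculated_value1, types_list) tuples -> plain list of tuples.
results = df['types'].apply(simple_function).tolist()

# pd.DataFrame turns each tuple into a row with two columns (0 and 1);
# those columns are then assigned, in order, to the two new names.
df[['calculated_result', 'types_list']] = pd.DataFrame(results, index=df.index)

print(df)
```

Because the DataFrame is built row by row from the tuples, each returned list ends up as a single cell value, so whether the lists have equal lengths no longer matters.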
You can get a similar result with a NumPy array:
df['calculated_result'], df['types_list'] = np.array(df['types'].apply(simple_function).tolist(), dtype=object).T
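A self-contained sketch of the array variant follows. It passes dtype=object explicitly, on the assumption that you are on a recent NumPy (1.24 or later), where ragged nested input such as (int, list) rows raises a ValueError without it. simple_function is again a hypothetical deterministic stand-in.

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic stand-in for simple_function (no randomness).
def simple_function(type_count):
    calculated_value2 = type_count + 1
    return type_count * 10, [(x, calculated_value2) for x in range(type_count)]

# The failing case: every row has types == 1.
df = pd.DataFrame({'name': ['Joe', 'Beth', 'John', 'Jill'],
                   'types': [1, 1, 1, 1]})

results = df['types'].apply(simple_function).tolist()

# With dtype=object NumPy stops at shape (n_rows, 2), because each row
# mixes a scalar with a list; every list stays a single element.
arr = np.array(results, dtype=object)

# Transposing gives shape (2, n_rows): one row per new column,
# which unpacks into the two assignment targets.
df['calculated_result'], df['types_list'] = arr.T

print(df)
```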