How to apply a function to each row in a DataFrame and get a Series of dicts?
I want to apply a function to each row in a DataFrame and get a Series back. This works when the function returns a number, but not when it returns a dict.
In [31]: d
Out[31]:
                a         b
bar one -0.185677 -0.554356
    two -0.457943 -1.094836
baz one -0.731338 -0.027821
    two -1.061098  0.258291
foo one -1.392160  2.287989
    two  2.010208 -1.350581
qux one -0.792229 -0.323397
    two -1.063265  0.048641
In [32]: d.apply(lambda x: x.a+x.b, axis=1)
Out[32]:
bar  one   -0.740034
     two   -1.552779
baz  one   -0.759159
     two   -0.802806
foo  one    0.895829
     two    0.659626
qux  one   -1.115627
     two   -1.014624
dtype: float64
In [33]: d.apply(lambda x: {"boo": x.a}, axis=1)  # I want a Series of dicts
Out[33]:
          a   b
bar one NaN NaN
    two NaN NaN
baz one NaN NaN
    two NaN NaN
foo one NaN NaN
    two NaN NaN
qux one NaN NaN
    two NaN NaN
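Whether the reduce argument of apply is None, True, or False does not seem to matter. pandas seems to be too clever here: it looks up the keys "a" and "b" in the returned dict {"boo": x.a}.
The answer is not to try to do the calculation and the data type conversion in one step. Calculate a Series of values first, then restructure it into dicts.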
import pandas
import random
mi1 = ['bar','baz']
mi2 = ['one', 'two']
data = [[s,t, random.random(), random.random()] for s in mi1 for t in mi2]
df = pandas.DataFrame(data, columns=['i1', 'i2', 'a', 'b'])
df.set_index(['i1', 'i2'], inplace=True)
print(df)
Out:
                a         b
i1  i2
bar one  0.438596  0.734058
    two  0.183522  0.272922
baz one  0.581694  0.522173
    two  0.776081  0.941120
In:
# calculate the series
# the data type is still floats
sr = df.apply(lambda row: row.a + row.b, axis=1)
# construct the series of objects
result = sr.apply(lambda x: {"boo":x})
print(result)
Out:
i1   i2
bar  one    {'boo': 1.17265404527}
     two    {'boo': 0.456443829892}
baz  one    {'boo': 1.10386719117}
     two    {'boo': 1.71720149706}
dtype: object
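For a reproducible variant of the same two-step idea, here is a minimal sketch that uses fixed, made-up data instead of random numbers. The values in df and the names sums and result are assumptions for illustration only. Building the Series from a list comprehension is a swapped-in alternative to the second apply, not the method shown in the answer above.

```python
import pandas as pd

# Hypothetical fixed data (not from the original post) so the result is reproducible.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["i1", "i2"]
)
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0], "b": [0.5, 0.25, 0.125, 0.0]},
    index=index,
)

# Step 1: compute the numeric values; the dtype is still float64 here.
sums = df["a"] + df["b"]

# Step 2: wrap each value in a dict. Passing index=df.index keeps the row labels,
# and a list of dicts gives a Series with dtype object.
result = pd.Series([{"boo": value} for value in sums], index=df.index)

print(result)
```

Keeping the arithmetic in step 1 means the dicts are only created once the numbers are final, so pandas never has to guess how to combine dict results into a frame.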