pandas pivot table with values from two non-overlapping columns
I have the following dataframe:
import pandas as pd

df = pd.DataFrame({'asset_number': [100, 100, 100, 1001, 1001, 1001, 1015, 1015, 1015],
                   'feature_name': ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
                   'value_string': [None, 'xxxx', None, None, 'yyyy', None, None, 'zzzz', None],
                   'value_float': [42.0, None, 2.25, 42.0, None, 2.25, 37.0, None, 2.75]})
+--------------+--------------+-----------------+-------------+
| asset_number | feature_name | value_string | value_float |
+--------------+--------------+-----------------+-------------+
| 100 | a | | 42 |
+--------------+--------------+-----------------+-------------+
| 100 | b | xxxx | |
+--------------+--------------+-----------------+-------------+
| 100 | c | | 2.25 |
+--------------+--------------+-----------------+-------------+
| 1001 | a | | 42 |
+--------------+--------------+-----------------+-------------+
| 1001 | b | yyyy | |
+--------------+--------------+-----------------+-------------+
| 1001 | c | | 2.25 |
+--------------+--------------+-----------------+-------------+
| 1015 | a | | 37 |
+--------------+--------------+-----------------+-------------+
| 1015 | b | zzzz | |
+--------------+--------------+-----------------+-------------+
| 1015 | c | | 2.75 |
+--------------+--------------+-----------------+-------------+
How can I reshape it into the following?
+--------------+----+------+------+
| asset_number | a | b | c |
+--------------+----+------+------+
| 100 | 42 | xxxx | 2.25 |
+--------------+----+------+------+
| 1001 | 42 | yyyy | 2.25 |
+--------------+----+------+------+
| 1015 | 37 | zzzz | 2.75 |
+--------------+----+------+------+
Note that value_string and value_float never overlap, so my idea was to combine the two columns into a single value column and then do:
df.pivot(index='asset_number', columns='feature_name', values='value')
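Because this idea depends on value_string and value_float never both being set in the same row, it can be worth checking that assumption (and the uniqueness that pivot needs) before combining the columns. A minimal sketch; the variable name `overlap` is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'asset_number': [100, 100, 100, 1001, 1001, 1001, 1015, 1015, 1015],
                   'feature_name': ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
                   'value_string': [None, 'xxxx', None, None, 'yyyy', None, None, 'zzzz', None],
                   'value_float': [42.0, None, 2.25, 42.0, None, 2.25, 37.0, None, 2.75]})

# A row with values in BOTH columns would make the combined column ambiguous.
overlap = df['value_string'].notna() & df['value_float'].notna()
print(overlap.any())  # False for this data

# pivot also needs each (asset_number, feature_name) pair to be unique,
# otherwise it raises ValueError about duplicate entries.
print(df.duplicated(['asset_number', 'feature_name']).any())  # False
```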
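Use Series.fillna to fill the missing values of one column with the other column before the pivot. This also works when both columns are None in the same row (that cell is simply missing in the result). Finally, use DataFrame.infer_objects to coerce the dtypes correctly, because the combined column is object dtype and the purely numeric columns would otherwise stay object after the pivot: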
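# pivot arguments must be passed by keyword in pandas >= 2.0
df1 = (df.assign(value_string=df['value_string'].fillna(df['value_float']))  # fill string gaps with the floats
         .pivot(index='asset_number', columns='feature_name', values='value_string')
         .infer_objects()              # object columns holding only floats -> float64
         .rename_axis(None, axis=1))   # drop the 'feature_name' label on the columns
print(df1)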
                 a     b     c
asset_number
100           42.0  xxxx  2.25
1001          42.0  yyyy  2.25
1015          37.0  zzzz  2.75
print(df1.dtypes)
a    float64
b     object
c    float64
dtype: object
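To see both the "missing in both columns" case and why infer_objects is needed, here is a small sketch on hypothetical data (asset 2000 and its values are made up for illustration): the pivot on its own leaves every column as object, and infer_objects converts only the purely numeric ones.

```python
import pandas as pd

# Hypothetical data: asset 2000 has feature 'b' missing in BOTH value columns.
df = pd.DataFrame({'asset_number': [100, 100, 100, 2000, 2000, 2000],
                   'feature_name': ['a', 'b', 'c', 'a', 'b', 'c'],
                   'value_string': [None, 'xxxx', None, None, None, None],
                   'value_float': [42.0, None, 2.25, 1.5, None, 3.0]})

# Same combine-then-pivot steps as above, stopping before infer_objects.
raw = (df.assign(value_string=df['value_string'].fillna(df['value_float']))
         .pivot(index='asset_number', columns='feature_name', values='value_string'))
print(raw.dtypes)            # a, b and c are all object here

out = raw.infer_objects()
print(out.dtypes)            # a and c become float64, b stays object
print(out.loc[2000, 'b'])    # missing value: both source columns were empty
```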