
pandas pivot table with values from two non-overlapping columns

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame({'asset_number': [100, 100, 100, 1001, 1001, 1001, 1015, 1015, 1015],
                   'feature_name': ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
                   'value_string': [None, 'xxxx', None, None, 'yyyy', None, None, 'zzzz', None],
                   'value_float': [42.0, None, 2.25, 42.0, None, 2.25, 37.0, None, 2.75]}
)

+--------------+--------------+-----------------+-------------+
| asset_number | feature_name | value_string    | value_float |
+--------------+--------------+-----------------+-------------+
| 100          | a            |                 | 42          |
+--------------+--------------+-----------------+-------------+
| 100          | b            | xxxx            |             |
+--------------+--------------+-----------------+-------------+
| 100          | c            |                 | 2.25        |
+--------------+--------------+-----------------+-------------+
| 1001         | a            |                 | 42          |
+--------------+--------------+-----------------+-------------+
| 1001         | b            | yyyy            |             |
+--------------+--------------+-----------------+-------------+
| 1001         | c            |                 | 2.25        |
+--------------+--------------+-----------------+-------------+
| 1015         | a            |                 | 37          |
+--------------+--------------+-----------------+-------------+
| 1015         | b            | zzzz            |             |
+--------------+--------------+-----------------+-------------+
| 1015         | c            |                 | 2.75        |
+--------------+--------------+-----------------+-------------+

How would I achieve this?

+--------------+----+------+------+
| asset_number | a  | b    | c    |
+--------------+----+------+------+
| 100          | 42 | xxxx | 2.25 |
+--------------+----+------+------+
| 1001         | 42 | yyyy | 2.25 |
+--------------+----+------+------+
| 1015         | 37 | zzzz | 2.75 |
+--------------+----+------+------+

Note that value_string and value_float never overlap. My idea was to combine the two columns into a single value column and then do:

df.pivot(index='asset_number', columns='feature_name', values='value')

Use Series.fillna to replace the missing values in value_string with the values from value_float before calling pivot. The solution also works correctly when a row has None in both columns (the cell simply stays missing). Finally, DataFrame.infer_objects is used to coerce the resulting columns to the correct dtypes:

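# Fill the gaps in value_string from value_float, pivot on the merged column,
# then let infer_objects restore float64 where a column holds only numbers.
df1 = (df.assign(value_string=df['value_string'].fillna(df['value_float']))
         .pivot(index='asset_number', columns='feature_name', values='value_string')
         .infer_objects()
         .rename_axis(None, axis=1))
print(df1)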
                 a     b     c
asset_number                  
100           42.0  xxxx  2.25
1001          42.0  yyyy  2.25
1015          37.0  zzzz  2.75

print(df1.dtypes)
a    float64
b     object
c    float64
dtype: object
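
For comparison, here is a minimal sketch of the variant described in the question, where the two columns are first merged into a separate value column and that column is pivoted. The choice of Series.combine_first and the explicit astype(object) cast are assumptions made for this illustration and are not part of the answer above. The cast is only there so the mixed string/float values end up in a generic object column however pandas inferred the dtype of value_string.

```python
import pandas as pd

df = pd.DataFrame({
    'asset_number': [100, 100, 100, 1001, 1001, 1001, 1015, 1015, 1015],
    'feature_name': ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c'],
    'value_string': [None, 'xxxx', None, None, 'yyyy', None, None, 'zzzz', None],
    'value_float': [42.0, None, 2.25, 42.0, None, 2.25, 37.0, None, 2.75],
})

# Sanity check for the "never overlap" assumption: no row may have
# both value_string and value_float filled in.
overlap = df['value_string'].notna() & df['value_float'].notna()
assert not overlap.any()

# Take value_string where present, otherwise fall back to value_float.
value = df['value_string'].astype(object).combine_first(df['value_float'])

wide = (df.assign(value=value)
          .pivot(index='asset_number', columns='feature_name', values='value')
          .infer_objects()            # purely numeric columns become float64
          .rename_axis(None, axis=1))  # drop the 'feature_name' columns label
print(wide)
```

Keeping the merged data in its own value column leaves value_string untouched, which may be preferable if the original columns are needed later.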

