简体   繁体   中英

Python Pandas: I need to transform a pandas dataframe

the data frame looks like this. I have tried with pivot, stack, unstack. Is there any method to achieve the output

   key  attribute   text_value  numeric_value   date_value
0   1   order       NaN         NaN             10/02/19
1   1   size        NaN         43.0            NaN
2   1   weight      NaN         22.0            NaN
3   1   price       NaN         33.0            NaN
4   1   segment     product     NaN             NaN
5   2   order       NaN         NaN             11/02/19
6   2   size        NaN         34.0            NaN
7   2   weight      NaN         32.0            NaN
8   2   price       NaN         89.0            NaN
9   2   segment     customer    NaN             NaN


I need the following output

key  order      size    weight    price    segment
1    10/2/2019  43.0    22.0      33.0     product
2    11/2/2019  34.0    32.0      89.0     customer

Thanks in advance

I believe you dont want to change dtypes in output data, so possible solution is processing each column separately by DataFrame.dropna and DataFrame.pivot and then join together by concat :

df['date_value'] = pd.to_datetime(df['date_value'])

df1 = df.dropna(subset=['text_value']).pivot('key','attribute','text_value')
df2 = df.dropna(subset=['numeric_value']).pivot('key','attribute','numeric_value')
df3 = df.dropna(subset=['date_value']).pivot('key','attribute','date_value')

df = pd.concat([df1, df2, df3], axis=1).reindex(df['attribute'].unique(), axis=1)
print (df)
attribute      order  size  weight  price   segment
key                                                
1         2019-10-02  43.0    22.0   33.0   product
2         2019-11-02  34.0    32.0   89.0  customer

print (df.dtypes)
order      datetime64[ns]
size              float64
weight            float64
price             float64
segment            object
dtype: object

Old answer - all values are casted to strings:

df['date_value'] = pd.to_datetime(df['date_value'])

df['text_value'] = df['text_value'].fillna(df['numeric_value']).fillna(df['date_value'])

df = df.pivot('key','attribute','text_value')
print (df)
attribute                order price   segment size weight
key                                                       
1          1569974400000000000    33   product   43     22
2          1572652800000000000    89  customer   34     32

print (df.dtypes)
order      object
price      object
segment    object
size       object
weight     object
dtype: object

This is the solution, I figured out

attr_dict = {'order':'date_value', 'size':'numeric_value', 'weight':'numeric_value', 'price':'numeric_value', 'segment':'text_value'}

output_table = pd.DataFrame()
for attr in attr_dict.keys():
    temp = input_table[input_table['attribute'] == attr][['key', attr_dict[attr]]]
    temp.rename(columns={attr_dict[attr]:attr}, inplace=True)
    output_table[attr] = list(temp.values[:,1])
output_table

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM