Melting a pandas DataFrame and separating the value column based on its data type
Suppose I have a DataFrame, read from a CSV, that looks roughly like this:
```
date        1    2    3     4
05-10-2019  20   32   43.5  Auto
06-10-2019  19   Off  54.6  Auto
07-10-2019  Off  45   37    Auto
```
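Each parameter (1, 2, 3, etc.) can hold either a float or a string value. Is there a way to melt the data so that the value column is split by the parameter's data type? When the value is a string, the parameter's float column should be `None`, and when the value is a float, its string column should be `None`.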
The final DataFrame should look like this:
```
date        parameter  value  message
05-10-2019  1          20     None
05-10-2019  2          32     None
05-10-2019  3          43.5   None
05-10-2019  4          None   Auto
06-10-2019  1          19     None
06-10-2019  2          None   Off
06-10-2019  3          54.6   None
................
07-10-2019  4          None   Auto
```
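To reproduce the question locally, here is a minimal sketch that loads the sample table from an in-memory string. The actual CSV file is not shown, so the whitespace separator and the `CSV_TEXT` name are assumptions. The sketch shows why the values end up mixed. Any column that contains `Off` is read as text, so `20` arrives as the string `'20'`, while column `3` is parsed as floats.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample CSV; the real file and its
# delimiter are not shown in the question, so whitespace is assumed here.
CSV_TEXT = """date 1 2 3 4
05-10-2019 20 32 43.5 Auto
06-10-2019 19 Off 54.6 Auto
07-10-2019 Off 45 37 Auto
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=r"\s+")

# Columns mixing numbers and strings ('1', '2') are read as text
# (object dtype, or a string dtype in newer pandas versions),
# the purely numeric column '3' becomes float64, and '4' holds only strings.
print(df.dtypes)
```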
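The first step is `DataFrame.melt`. Then convert the values to numbers with `to_numeric`; `errors='coerce'` turns non-numeric entries into missing values, so `DataFrame.assign` can build the non-numeric column with `Series.where`: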
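```python
import pandas as pd

df = df.melt('date', var_name='parameter')
# numbers where possible; non-numeric strings such as 'Off'/'Auto' become NaN
s = pd.to_numeric(df['value'], errors='coerce')
# numeric part goes to 'value', the original strings (only where s is NaN) to 'message'
df = df.assign(value=s, message=df['value'].where(s.isna()))
print(df)
```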
```
          date parameter  value message
0   05-10-2019         1   20.0     NaN
1   06-10-2019         1   19.0     NaN
2   07-10-2019         1    NaN     Off
3   05-10-2019         2   32.0     NaN
4   06-10-2019         2    NaN     Off
5   07-10-2019         2   45.0     NaN
6   05-10-2019         3   43.5     NaN
7   06-10-2019         3   54.6     NaN
8   07-10-2019         3   37.0     NaN
9   05-10-2019         4    NaN    Auto
10  06-10-2019         4    NaN    Auto
11  07-10-2019         4    NaN    Auto
```
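The question asks for `None` rather than `NaN` in the empty cells. The following is a minimal end-to-end sketch. It assumes the question's sample table is stored whitespace-separated, and `CSV_TEXT` and `nan_to_none` are hypothetical names used only for illustration. It runs the same melt / `to_numeric` / `where` steps and then swaps the missing values for `None`. An explicit `dtype=object` is required for that last step, because pandas would otherwise turn `None` back into `NaN` in a float column.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample CSV (whitespace-separated is an assumption).
CSV_TEXT = """date 1 2 3 4
05-10-2019 20 32 43.5 Auto
06-10-2019 19 Off 54.6 Auto
07-10-2019 Off 45 37 Auto
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=r"\s+")

# Same steps as the answer: melt, coerce to numbers, keep the leftovers as text.
long_df = df.melt("date", var_name="parameter")
s = pd.to_numeric(long_df["value"], errors="coerce")
long_df = long_df.assign(value=s, message=long_df["value"].where(s.isna()))


def nan_to_none(col: pd.Series) -> pd.Series:
    """Hypothetical helper: replace missing values with None, keeping object dtype."""
    return pd.Series(
        [None if pd.isna(v) else v for v in col], index=col.index, dtype=object
    )


result = long_df.assign(
    value=nan_to_none(long_df["value"]),
    message=nan_to_none(long_df["message"]),
)
print(result)
```

Note that `value` then holds Python objects instead of `float64`, so arithmetic on it is slower. Keeping `NaN` is the usual choice unless `None` is really required.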
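If the order matters, sort after melting. Note that `date` appears as `2019-05-10` and so on in the output below, so it was presumably parsed to datetime when the CSV was read. Sorting the raw strings would be lexicographic rather than chronological.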
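```python
import pandas as pd

# CSV column labels are strings, so 'parameter' sorts lexicographically ('10' before '2')
df = df.melt('date', var_name='parameter').sort_values(['date', 'parameter'])
s = pd.to_numeric(df['value'], errors='coerce')
df = df.assign(value=s, message=df['value'].where(s.isna()))
print(df)
```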
```
         date parameter  value message
0  2019-05-10         1   20.0     NaN
3  2019-05-10         2   32.0     NaN
6  2019-05-10         3   43.5     NaN
9  2019-05-10         4    NaN    Auto
1  2019-06-10         1   19.0     NaN
4  2019-06-10         2    NaN     Off
7  2019-06-10         3   54.6     NaN
10 2019-06-10         4    NaN    Auto
2  2019-07-10         1    NaN     Off
5  2019-07-10         2   45.0     NaN
8  2019-07-10         3   37.0     NaN
11 2019-07-10         4    NaN    Auto
```
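Sorting by `date` depends on how (and whether) the dates were parsed. The following sketch swaps in a different technique that is not part of the answer above. It keeps the original CSV row order without parsing dates at all. It records each row's position with `reset_index()` before melting and then applies a stable sort on that position. Because `melt` stacks the value columns one after another, a stable sort keeps the parameters in column order within each original row. The sample data and the `CSV_TEXT` name are assumptions, as before.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample CSV (whitespace-separated is an assumption).
CSV_TEXT = """date 1 2 3 4
05-10-2019 20 32 43.5 Auto
06-10-2019 19 Off 54.6 Auto
07-10-2019 Off 45 37 Auto
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=r"\s+")

long_df = (
    df.reset_index()                                # 'index' = original CSV row position
    .melt(["index", "date"], var_name="parameter")  # stacks column '1', then '2', ...
    .sort_values("index", kind="stable")            # stable: keeps column order per row
    .drop(columns="index")
    .reset_index(drop=True)
)
s = pd.to_numeric(long_df["value"], errors="coerce")
long_df = long_df.assign(value=s, message=long_df["value"].where(s.isna()))
print(long_df)
```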