
Prepending instead of appending NaNs in pandas using from_dict

I have a pandas DataFrame that I read in from a defaultdict in Python, but some of the columns have different lengths. Here is what the data looks like:

Date      col1    col2    col3    col4    col5
01-01-15  5       12      1      -15      10
01-02-15  7       0       9       11      7
01-03-15          6       1       2       18
01-04-15          9       8       10
01-05-15         -4               7
01-06-15         -11             -1
01-07-15          6               

I can fill in the blanks with NaNs like this:

pd.DataFrame.from_dict(pred_dict, orient='index').T

This gives:

Date      col1    col2    col3    col4    col5
01-01-15  5       12      1      -15      10
01-02-15  7       0       9       11      7
01-03-15  NaN     6       1       2       18
01-04-15  NaN     9       8       10      NaN
01-05-15  NaN    -4       NaN     7       NaN
01-06-15  NaN    -11      NaN    -1       NaN
01-07-15  NaN     6       NaN     NaN     NaN

However, what I really want is a way to prepend the NaNs instead of adding them at the end, so that the data looks like this:

Date      col1    col2    col3    col4    col5
01-01-15  NaN     12      NaN     NaN     NaN
01-02-15  NaN     0       NaN    -15      NaN
01-03-15  NaN     6       NaN     11      NaN
01-04-15  NaN     9       1       2       NaN
01-05-15  NaN    -4       9       10      10
01-06-15  5      -11      1       7       7
01-07-15  7       6       8      -1       18

Is there an easy way to do this?

You can recreate the dictionary with the following code:

import pandas as pd
from collections import defaultdict

d = defaultdict(list)
d["Date"].extend([
    "01-01-15", 
    "01-02-15", 
    "01-03-15", 
    "01-04-15", 
    "01-05-15",
    "01-06-15",
    "01-07-15"
])
d["col1"].extend([5, 7])
d["col2"].extend([12, 0, 6, 9, -4, -11, 6])
d["col3"].extend([1, 9, 1, 8])
d["col4"].extend([-15, 11, 2, 10, 7, -1])
d["col5"].extend([10, 7, 18])

You can use Series.shift to shift the Series/DataFrame. Unfortunately, you cannot pass in an array of periods; you have to shift each column by its own integer value.

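# Build the frame with trailing NaNs first
df = pd.DataFrame.from_dict(d, orient='index').T
# Count the NaNs per column, then shift each column down by that count
s = df.isnull().sum()
for col, periods in s.items():
    df[col] = df[col].shift(periods)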
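The loop works because from_dict only pads at the bottom, so each column's NaN count equals the number of rows it has to move down. This assumes the data itself contains no missing values; a NaN inside a column would be counted too and shift that column too far. Below is a minimal self-contained sketch on a small hypothetical dictionary (`small` and `df_small` are illustrative names, not part of the question's data):

```python
import pandas as pd

# Hypothetical toy data: columns of unequal length
small = {"a": [1, 2, 3], "b": [4], "c": [5, 6]}

# from_dict pads the short columns with trailing missing values
df_small = pd.DataFrame.from_dict(small, orient="index").T

# Each column's count of missing values is the distance it has to move down
for col, periods in df_small.isnull().sum().items():
    df_small[col] = df_small[col].shift(int(periods))

df_small = df_small.astype(float)
print(df_small)
```

Column "a" is already full, so it is shifted by zero rows and stays unchanged.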

A slight modification of the itertools solution from your previous question:

import itertools
pd.DataFrame(list(itertools.zip_longest(*[reversed(i) for i in d.values()]))[::-1], columns=list(d)).sort_index(axis=1)
Out[143]: 
       Date  col1  col2  col3  col4  col5
0  01-01-15   NaN    12   NaN   NaN   NaN
1  01-02-15   NaN     0   NaN -15.0   NaN
2  01-03-15   NaN     6   NaN  11.0   NaN
3  01-04-15   NaN     9   1.0   2.0   NaN
4  01-05-15   NaN    -4   9.0  10.0  10.0
5  01-06-15   5.0   -11   1.0   7.0   7.0
6  01-07-15   7.0     6   8.0  -1.0  18.0
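The one-liner above packs three steps together. As a rough sketch, here it is unrolled on a small hypothetical dictionary (`small`, `reversed_cols`, `rows` and `df_small` are illustrative names only):

```python
import itertools
import pandas as pd

# Hypothetical toy data standing in for the question's defaultdict
small = {"a": [1, 2, 3], "b": [4], "c": [5, 6]}

# 1. Reverse every list so the values that must end up last come first
reversed_cols = [list(reversed(v)) for v in small.values()]

# 2. zip_longest builds rows and pads exhausted columns with None at the end
rows = list(itertools.zip_longest(*reversed_cols))

# 3. Reverse the rows, which moves the padding to the top and restores value order
rows = rows[::-1]

df_small = pd.DataFrame(rows, columns=list(small)).astype(float)
print(df_small)
```

Reversing twice (once per list, once over the rows) is what turns zip_longest's trailing padding into leading padding.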

Reverse each list in the dictionary:

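# Reverse every list, including Date, so the trailing NaN padding lines up with the earliest dates
for k, v in d.items():
    d[k] = v[::-1]

# Sorting the Date index back into ascending order moves the NaNs to the top
df = pd.DataFrame.from_dict(d, orient='index').T.set_index('Date').sort_index(axis=1).sort_index().astype(float)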


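Here is a vectorized approach that uses pd.DataFrame.from_dict to get the dataframe for the regular case. Once we have the regular 2D data, it becomes easy to flip and mask and get the desired output dataframe in a vectorized manner.

The implementation is listed below: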

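import numpy as np

# Get the normal case output (NaNs appended at the bottom)
df = pd.DataFrame.from_dict(d, orient='index').T

# Use masking to flip and select flipped elements to re-create expected df
colmask = df.columns != 'Date'
# Numeric columns as a float array, transposed so each row is one column of df
arr = np.array(df.loc[:, colmask].values, dtype=float).T
mask = ~np.isnan(arr)
out_arr = np.full(mask.shape, np.nan)
# The flipped mask selects the same number of slots per row, aligned to the end
out_arr[mask[:, ::-1]] = arr[mask]
df.loc[:, colmask] = out_arr.T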

Sample run:

In [209]: d.values()
Out[209]: 
[[-15, 11, 2, 10, 7, -1],
 [10, 7, 18],
 [12, 0, 6, 9, -4, -11, 6],
 [1, 9, 1, 8],
 [5, 7],
 ['01-01-15',
  '01-02-15',
  '01-03-15',
  '01-04-15',
  '01-05-15',
  '01-06-15',
  '01-07-15']]

In [210]: df
Out[210]: 
  col4 col5 col2 col3 col1      Date
0  NaN  NaN   12  NaN  NaN  01-01-15
1  -15  NaN    0  NaN  NaN  01-02-15
2   11  NaN    6  NaN  NaN  01-03-15
3    2  NaN    9    1  NaN  01-04-15
4   10   10   -4    9  NaN  01-05-15
5    7    7  -11    1    5  01-06-15
6   -1   18    6    8    7  01-07-15
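The flip-and-mask step in the implementation above is easier to follow on a tiny array. The following is a minimal sketch with hypothetical values, where each row of `arr` plays the role of one column of the frame:

```python
import numpy as np

# Hypothetical 2 x 4 array: one row per frame column, NaNs trailing
arr = np.array([[1.0, 2.0, np.nan, np.nan],
                [3.0, 4.0, 5.0, np.nan]])

mask = ~np.isnan(arr)              # True where a real value sits
out = np.full(arr.shape, np.nan)   # start from an all-NaN array

# mask[:, ::-1] marks the same number of slots per row, but right-aligned.
# Boolean indexing reads and writes in row-major order, so values keep their order.
out[mask[:, ::-1]] = arr[mask]
print(out)
```

Because every row has the same number of True entries in `mask` and in its flipped version, the values selected by `arr[mask]` land in the matching row of `out`, pushed against the right edge.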

