I have a pandas dataframe (1413 rows) and a numpy array (1412 rows).
type(df1)
Out[193]: pandas.core.frame.DataFrame
df1.shape
Out[194]: (1413, 15)
type(arr1)
Out[195]: numpy.ndarray
arr1.shape
Out[196]: (1412, 3)
I would like to fill a new column in df1 with NaN followed by a column of arr1, but this does not work:
df1['aaa'] = np.vstack((np.nan, arr1[:,0]))
Could anyone let me know how to do it?
Use numpy.hstack to add one value to a 1d array:
df1 = pd.DataFrame({'a': range(6)})
arr1 = np.arange(15).reshape(5,3)
print (arr1)
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]
[12 13 14]]
df1['aaa'] = np.hstack((np.nan, arr1[:,0]))
print (df1)
   a   aaa
0  0   NaN
1  1   0.0
2  2   3.0
3  3   6.0
4  4   9.0
5  5  12.0
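The same idea extends to a gap of more than one row. The following is only a minimal sketch under that assumption; `pad_front` is a hypothetical helper name used for illustration, not a numpy or pandas function:

```python
import numpy as np
import pandas as pd

def pad_front(values, length):
    """Return a float 1d array of `length` items: NaNs first, then `values`."""
    values = np.asarray(values, dtype=float)
    missing = length - values.shape[0]
    if missing < 0:
        raise ValueError("values is longer than the target length")
    # np.full(missing, np.nan) is a 1d block of NaNs; hstack joins 1d arrays end to end.
    return np.hstack((np.full(missing, np.nan), values))

df1 = pd.DataFrame({'a': range(6)})
arr1 = np.arange(15).reshape(5, 3)
df1['aaa'] = pad_front(arr1[:, 0], len(df1))
```

For the sizes in the question, `pad_front(arr1[:, 0], len(df1))` would prepend a single NaN, which is what the hstack call above does.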
Another idea, in case the DataFrame may have a non-default index, is to use the Series constructor with a slice of df1.index as its index:
df1 = pd.DataFrame({'a': range(6)}, index=list('abcdef'))
arr1 = np.arange(15).reshape(5,3)
print (arr1)
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]
[12 13 14]]
dif = df1.shape[0] - arr1.shape[0]
df1['aaa'] = pd.Series(arr1[:,0], index=df1.index[dif:])
print (df1)
   a   aaa
a  0   NaN
b  1   0.0
c  2   3.0
d  3   6.0
e  4   9.0
f  5  12.0
To put the NaN in the last position instead:
dif = df1.shape[0] - arr1.shape[0]
df1['aaa'] = pd.Series(arr1[:,0], index=df1.index[:-dif])
print (df1)
   a   aaa
a  0   0.0
b  1   3.0
c  2   6.0
d  3   9.0
e  4  12.0
f  5   NaN
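Both variants rely on index alignment: when a Series is assigned as a column, labels missing from the Series become NaN. A small sketch of the two cases side by side, with the column names `front` and `back` chosen here only for illustration:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': range(6)}, index=list('abcdef'))
arr1 = np.arange(15).reshape(5, 3)
dif = len(df1) - len(arr1)  # number of rows without a value (1 here)

# Labels 'b'..'f' receive the values; 'a' has no match and becomes NaN.
front = pd.Series(arr1[:, 0], index=df1.index[dif:])

# Labels 'a'..'e' receive the values; 'f' has no match and becomes NaN.
# Slicing by length avoids index[:-0], which is empty when dif == 0.
back = pd.Series(arr1[:, 0], index=df1.index[:len(arr1)])

df1['front'] = front
df1['back'] = back
```

Slicing with `[:len(arr1)]` selects the same labels as `[:-dif]` whenever `dif` is positive, and also covers the case where both objects have the same length.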
EDIT:
arr1 = np.arange(15).reshape(5,3)
df1 = pd.DataFrame({'a': range(6)})
If you select by 0, you only get a 1d array (shape (5,) here), so numpy.hstack is necessary; the result has shape (6,):
a = np.hstack((np.nan, arr1[:,0]))
print (a)
[nan 0. 3. 6. 9. 12.]
print (a.shape)
(6,)
df1['aaa'] = a
If you select by [0], you get a 2d array with dimensions MxN (shape (5,1) here), so it is possible to use numpy.vstack; the result has shape (6,1):
a1 = np.vstack((np.nan, arr1[:,[0]]))
print (a1)
[[nan]
[ 0.]
[ 3.]
[ 6.]
[ 9.]
[12.]]
print (a1.shape)
(6, 1)
df1['aaa1'] = a1
print (df1)
   a   aaa  aaa1
0  0   NaN   NaN
1  1   0.0   0.0
2  2   3.0   3.0
3  3   6.0   6.0
4  4   9.0   9.0
5  5  12.0  12.0
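The difference between the two selections can be checked directly from the shapes. A short sketch using the same sample array; the variable names are illustrative only:

```python
import numpy as np

arr1 = np.arange(15).reshape(5, 3)

col_1d = arr1[:, 0]     # an integer index drops the column axis
col_2d = arr1[:, [0]]   # a list index keeps the column axis

shape_1d = col_1d.shape  # (5,)
shape_2d = col_2d.shape  # (5, 1)

stacked_1d = np.hstack((np.nan, col_1d))  # 1d result, shape (6,)
stacked_2d = np.vstack((np.nan, col_2d))  # 2d result, shape (6, 1)

# ravel() flattens the (6, 1) column back to 1d if a flat array is preferred.
flat = stacked_2d.ravel()
```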
You can do it this way: add the column, then append an all-NaN row and sort it to the top:
df['aaa'] = pd.Series(ar1[:,0])
ea = np.full(df.shape[1], np.nan)  # ndarray.fill() works in place and returns None, so use np.full
df.loc[-1] = ea
df.index = df.index + 1
df = df.reset_index(drop=True).sort_values(by=['aaa'], na_position='first')
Here is your DataFrame:
   c1  c2  c3
0   1   2   3
1  10  20  30
Here is the array:
[[ 5 55]
[ 50 550]]
And this is the result:
     c1    c2    c3   aaa
2   NaN   NaN   NaN   NaN
0   1.0   2.0   3.0   5.0
1  10.0  20.0  30.0  50.0
You can use np.append:
df1['aaa'] = np.append(np.nan, arr1[:,0])
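A brief sketch of what this call produces, using a small sample frame in place of the 1413-row one. With the default `axis=None`, `np.append` flattens its inputs before joining them, and the NaN forces a float result:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': range(6)})
arr1 = np.arange(15).reshape(5, 3)

front = np.append(np.nan, arr1[:, 0])  # NaN first, then the column values
back = np.append(arr1[:, 0], np.nan)   # column values first, NaN last

df1['front'] = front
df1['back'] = back
```

Swapping the argument order is enough to move the NaN from the first row to the last.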
While I can see several other answers, none of them has really addressed the problem at hand. Intuitively, your approach is okay; you're stacking nan vertically on a column array.
df1['aaa'] = np.vstack((np.nan, arr1[:,0]))
It should work, but it doesn't. The small problem here is that vstack needs the column dimension of its inputs to match. arr1[:,0] has the shape (1412,); it doesn't have a second dimension, so vstack treats it as a single row of 1412 values rather than as a column. Simply reshaping it to (1412,1) will make vstack work just fine.
df1['aaa'] = np.vstack((np.nan, arr1[:,0].reshape(-1,1)))
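The shape mismatch can be seen on a small array. This is a sketch with a 5-row sample instead of the 1412-row array; `np.atleast_2d` is used here only to show how vstack promotes its inputs before concatenating:

```python
import numpy as np

arr1 = np.arange(15).reshape(5, 3)
col = arr1[:, 0]                 # shape (5,)

# vstack promotes each input to at least 2d, then concatenates along axis 0.
nan_2d = np.atleast_2d(np.nan)   # shape (1, 1)
col_2d = np.atleast_2d(col)      # shape (1, 5): a row, not a column

try:
    np.vstack((np.nan, col))
    failed = False
except ValueError:
    failed = True                # column counts 1 and 5 do not match

# After reshaping to a column, both inputs have one column and stack cleanly.
fixed = np.vstack((np.nan, col.reshape(-1, 1)))  # shape (6, 1)
```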