pandas shift rows NaNs
Say we have a DataFrame set up as follows:
import numpy as np
import pandas as pd

x = pd.DataFrame(np.random.randint(1, 10, 30).reshape(5, 6),
                 columns=[f'col{i}' for i in range(6)])
x['col6'] = np.nan
x['col7'] = np.nan
col0 col1 col2 col3 col4 col5 col6 col7
0 6 5 1 5 2 4 NaN NaN
1 8 8 9 6 7 2 NaN NaN
2 8 3 9 6 6 6 NaN NaN
3 8 4 4 4 8 9 NaN NaN
4 5 3 4 3 8 7 NaN NaN
When calling x.shift(2, axis=1), col2 -> col5 shift correctly, but col6 and col7 stay as NaN. How can I overwrite the NaN values in col6 and col7 with col4 and col5's values? Is this a bug or intended?
col0 col1 col2 col3 col4 col5 col6 col7
0 NaN NaN 6.0 5.0 1.0 5.0 NaN NaN
1 NaN NaN 8.0 8.0 9.0 6.0 NaN NaN
2 NaN NaN 8.0 3.0 9.0 6.0 NaN NaN
3 NaN NaN 8.0 4.0 4.0 4.0 NaN NaN
4 NaN NaN 5.0 3.0 4.0 3.0 NaN NaN
It's possible this is a bug. You can use np.roll to achieve this. Note that np.roll is circular, so the NaN values from col6 and col7 wrap around into col0 and col1:
In[11]:
x.apply(lambda x: np.roll(x, 2), axis=1)
Out[11]:
col0 col1 col2 col3 col4 col5 col6 col7
0 NaN NaN 6.0 5.0 1.0 5.0 2.0 4.0
1 NaN NaN 8.0 8.0 9.0 6.0 7.0 2.0
2 NaN NaN 8.0 3.0 9.0 6.0 6.0 6.0
3 NaN NaN 8.0 4.0 4.0 4.0 8.0 9.0
4 NaN NaN 5.0 3.0 4.0 3.0 8.0 7.0
Speed-wise, it's probably quicker to construct a new df: reuse the existing columns and pass the result of np.roll as the data argument to the DataFrame constructor:
In[12]:
x = pd.DataFrame(np.roll(x, 2, axis=1), columns = x.columns)
x
Out[12]:
col0 col1 col2 col3 col4 col5 col6 col7
0 NaN NaN 6.0 5.0 1.0 5.0 2.0 4.0
1 NaN NaN 8.0 8.0 9.0 6.0 7.0 2.0
2 NaN NaN 8.0 3.0 9.0 6.0 6.0 6.0
3 NaN NaN 8.0 4.0 4.0 4.0 8.0 9.0
4 NaN NaN 5.0 3.0 4.0 3.0 8.0 7.0
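Since the frame above is built from random integers, here is a reproducible sketch that hard-codes the values shown in the question. It also illustrates one caveat: np.roll wraps around, which only gives the desired result here because the two trailing columns are entirely NaN. The helper shift_columns is a hypothetical name introduced for illustration (it is not part of pandas); it pads with NaN instead of wrapping, and it assumes all columns are numeric.

```python
import numpy as np
import pandas as pd

# Values copied from the example frame in the question, so the output is reproducible.
df = pd.DataFrame(
    [[6, 5, 1, 5, 2, 4],
     [8, 8, 9, 6, 7, 2],
     [8, 3, 9, 6, 6, 6],
     [8, 4, 4, 4, 8, 9],
     [5, 3, 4, 3, 8, 7]],
    columns=[f'col{i}' for i in range(6)],
)
df['col6'] = np.nan
df['col7'] = np.nan

# Circular shift: the two trailing all-NaN columns wrap around to col0 and col1.
rolled = pd.DataFrame(np.roll(df.to_numpy(), 2, axis=1),
                      columns=df.columns, index=df.index)


def shift_columns(frame, periods):
    """Hypothetical helper: positional column shift that pads with NaN (no wrap-around).

    Assumes every column is numeric, since the values are converted to float.
    """
    values = frame.to_numpy(dtype=float)
    out = np.full(values.shape, np.nan)
    if periods > 0:
        out[:, periods:] = values[:, :-periods]
    elif periods < 0:
        out[:, :periods] = values[:, -periods:]
    else:
        out[:] = values
    return pd.DataFrame(out, columns=frame.columns, index=frame.index)


shifted = shift_columns(df, 2)
print(rolled)
print(rolled.equals(shifted))
```

For this particular frame the two results coincide, because the columns that wrap around contain only NaN. If the trailing columns held real data, np.roll would move that data into col0 and col1, whereas the padding variant would leave NaN there.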
Timings
In[13]:
%timeit pd.DataFrame(np.roll(x, 2, axis=1), columns = x.columns)
%timeit x.fillna(0).astype(int).shift(2, axis=1)
10000 loops, best of 3: 117 µs per loop
1000 loops, best of 3: 418 µs per loop
So constructing a new df with the result of np.roll is quicker than first filling the NaN values, casting to int, and then shifting.
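The %timeit magic only works inside IPython. To repeat the comparison in a plain script, a rough sketch with the standard-library timeit module could look like this. The fixed seed, the function names, and the repetition count are assumptions chosen for illustration, and absolute numbers will vary with machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Fixed seed so the frame is the same on every run (an assumption, not in the original).
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.integers(1, 10, size=(5, 6)),
                 columns=[f'col{i}' for i in range(6)])
x['col6'] = np.nan
x['col7'] = np.nan


def roll_construct():
    # Roll the underlying values and rebuild the frame with the same column labels.
    return pd.DataFrame(np.roll(x, 2, axis=1), columns=x.columns)


def fill_cast_shift():
    # Fill NaN, cast everything to int, then shift along the columns.
    return x.fillna(0).astype(int).shift(2, axis=1)


n = 1000  # number of repetitions per measurement
t_roll = timeit.timeit(roll_construct, number=n) / n
t_shift = timeit.timeit(fill_cast_shift, number=n) / n
print(f'np.roll + constructor: {t_roll * 1e6:.1f} µs per call')
print(f'fillna/astype/shift:   {t_shift * 1e6:.1f} µs per call')
```

On a frame this small the measurement is dominated by fixed overhead, so it is worth re-running on data of realistic size before drawing conclusions.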