How to loop through a pandas dataframe and conditionally assign values to a row of a variable?
I'm trying to loop through the 'vol' dataframe and check whether sample_date falls between certain dates. If it does, I want to assign a value to another column.
Here's the code I have:
import numpy as np
import pandas as pd

vol = pd.DataFrame(data=pd.date_range(start='11/3/2015', end='1/29/2019'))
vol.columns = ['sample_date']
vol['hydraulic_vol'] = np.nan
for i in vol.iterrows():
    if pd.Timestamp('2015-11-03') <= vol.loc[i,'sample_date'] <= pd.Timestamp('2018-06-07'):
        vol.loc[i,'hydraulic_vol'] = 319779
Here's the error I received: TypeError: 'Series' objects are mutable, thus they cannot be hashed
This is how you would do it properly:
cond = ((pd.Timestamp('2015-11-03') <= vol.sample_date) &
        (vol.sample_date <= pd.Timestamp('2018-06-07')))
vol.loc[cond, 'hydraulic_vol'] = 319779
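For anyone who wants to run this end to end, here is a minimal self-contained sketch of the boolean-mask approach. It rebuilds the frame from the question; the printed head and tail are only there to check the result by eye.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
vol = pd.DataFrame({'sample_date': pd.date_range(start='11/3/2015', end='1/29/2019')})
vol['hydraulic_vol'] = np.nan

# Boolean mask: True where sample_date lies in the range (both ends included).
cond = (
    (pd.Timestamp('2015-11-03') <= vol['sample_date'])
    & (vol['sample_date'] <= pd.Timestamp('2018-06-07'))
)

# Assign only to the rows selected by the mask; the other rows stay NaN.
vol.loc[cond, 'hydraulic_vol'] = 319779

print(vol.head(3))  # rows inside the range
print(vol.tail(3))  # rows after the range
```

The parentheses around each comparison matter, because `&` binds more tightly than `<=` in Python.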
Another way to do this is to use the np.where function from the numpy module, in combination with the .between method.
np.where works like this:
np.where(condition, value if true, value if false)
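To see the three arguments in isolation, here is a tiny sketch on a made-up array (the `values` array and the 'high'/'low' labels are illustrative, not part of the question):

```python
import numpy as np

values = np.array([5, 12, 7, 20])

# For each element: pick 'high' where the condition is True, 'low' otherwise.
result = np.where(values > 10, 'high', 'low')

print(result)  # ['low' 'high' 'low' 'high']
```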
Code example:
cond = vol.sample_date.between('2015-11-03', '2018-06-07')
vol['hydraulic_vol'] = np.where(cond, 319779, np.nan)
Or you can combine them in a single line of code:
vol['hydraulic_vol'] = np.where(vol.sample_date.between('2015-11-03', '2018-06-07'), 319779, np.nan)
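One detail worth checking is that .between, with its default arguments, includes both endpoints, so it selects the same rows as the two explicit comparisons. A small sketch, assuming the same frame as in the question (the names `mask_between` and `mask_compare` are just for illustration):

```python
import numpy as np
import pandas as pd

vol = pd.DataFrame({'sample_date': pd.date_range(start='11/3/2015', end='1/29/2019')})

# Mask built with .between (default: both endpoints included).
mask_between = vol['sample_date'].between('2015-11-03', '2018-06-07')

# Same mask written as two explicit comparisons.
mask_compare = (
    (vol['sample_date'] >= pd.Timestamp('2015-11-03'))
    & (vol['sample_date'] <= pd.Timestamp('2018-06-07'))
)

vol['hydraulic_vol'] = np.where(mask_between, 319779, np.nan)

print(mask_between.equals(mask_compare))
# The last day of the range is filled, the day after it is not.
print(vol.loc[vol['sample_date'] == '2018-06-07', 'hydraulic_vol'])
print(vol.loc[vol['sample_date'] == '2018-06-08', 'hydraulic_vol'])
```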
Edit
I see that you're new here, so here's something I also had to learn when coming to Python/pandas.
Looping over a dataframe should be your last resort. Try to use vectorized solutions, in this case .loc or np.where; these will be faster than looping.
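If a loop really is needed, note that iterrows() yields (index, row) pairs, so they have to be unpacked before the index is passed to .loc. The sketch below shows one way the original loop could be written so that it runs, and checks that it gives the same result as the vectorized version. The helper `make_vol` and the names `looped` and `vectorized` are made up for this example.

```python
import numpy as np
import pandas as pd

def make_vol():
    """Build a fresh copy of the frame from the question."""
    vol = pd.DataFrame({'sample_date': pd.date_range(start='11/3/2015', end='1/29/2019')})
    vol['hydraulic_vol'] = np.nan
    return vol

start, end = pd.Timestamp('2015-11-03'), pd.Timestamp('2018-06-07')

# Loop version: unpack each (index, row) pair that iterrows() yields.
looped = make_vol()
for idx, row in looped.iterrows():
    if start <= row['sample_date'] <= end:
        looped.loc[idx, 'hydraulic_vol'] = 319779

# Vectorized version: one mask, one assignment.
vectorized = make_vol()
vectorized.loc[vectorized['sample_date'].between(start, end), 'hydraulic_vol'] = 319779

# Both approaches should fill exactly the same rows.
print(looped.equals(vectorized))
```

Wrapping each version in the standard library's timeit is an easy way to measure the speed difference on your own data.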