How to create a new column in a pandas DataFrame based on a condition?
I have a data frame with the following columns:
import pandas as pd
d = {'find_no': [1, 2, 3], 'zip_code': [32351, 19207, 8723]}
df = pd.DataFrame(data=d)
When there are 5 digits in the zip_code column, I want to return True. When there are not 5 digits, I want to return the "find_no". The sample output would put the results in a new column added to the dataframe, with each value in the row it refers to.
You could try np.where:
import numpy as np
df['result'] = np.where(df['zip_code'].astype(str).str.len() == 5, True, df['find_no'])
The only downside of this approach is that NumPy converts the True values to 1, because the result takes the integer dtype of find_no, which could be confusing. One way to keep the two kinds of value distinguishable is to cast both branches to strings:
import numpy as np
df['result'] = np.where(df['zip_code'].astype(str).str.len() == 5, 'True', df['find_no'].astype(str))
The downside here is that you lose the meaning of those values by casting them to strings. I guess it all depends on what you're hoping to accomplish.
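If you would rather keep a real boolean True alongside the integer find_no values, one possible variation (a sketch, not part of the original answer) is to cast find_no to object dtype before calling np.where. The name is_five_digits is just an illustrative helper variable. The trade-off is that the resulting column has object dtype, so it is slower and less convenient for numeric work.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'find_no': [1, 2, 3], 'zip_code': [32351, 19207, 8723]})

# Boolean mask: True where the zip code has exactly five digits
is_five_digits = df['zip_code'].astype(str).str.len() == 5

# Casting find_no to object keeps NumPy from coercing True to the integer 1
df['result'] = np.where(is_five_digits, True, df['find_no'].astype(object))

print(df)
print(df['result'].dtype)
```

Whether a mixed-type column like this is a good idea depends on how the result is used downstream; for filtering or arithmetic, a separate boolean column may be easier to work with.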