How to add values to a new column according to conditions?
I am trying to add a new column to my dataset based on a condition; however, the resulting dataframe is not what I expect.
I've tried a few approaches, and this is the closest I've gotten:
import pandas as pd

data = {'Date': ['3-Mar', '20-Mar', '20-Apr', '21-Apr', '29-Apr', '7-May',
                 '30-May', '31-May', '7-Jun', '16-Jun', '1-Jul', '2-Jul', '10-Jul'],
        'Value': [0.5840, 0.8159, 0.7789, 0.7665, 0.8510, 0.7428, 0.7124,
                  0.6820, 0.8714, 0.8902, 0.8596, 0.8289, 0.6877]}
frame = pd.DataFrame(data)

# Note: DataFrame.append was removed in pandas 2.0, so this only runs on older versions.
for counter, value in enumerate(frame['Value']):
    if value >= 0.7:
        frame = frame.append({'result': 'High'}, ignore_index=True)
    else:
        frame = frame.append({'result': 'Low'}, ignore_index=True)
print(frame)
Result:
Date Value result
0 3-Mar 0.5840 NaN
1 20-Mar 0.8159 NaN
2 20-Apr 0.7789 NaN
3 21-Apr 0.7665 NaN
4 29-Apr 0.8510 NaN
5 7-May 0.7428 NaN
6 30-May 0.7124 NaN
7 31-May 0.6820 NaN
8 7-Jun 0.8714 NaN
9 16-Jun 0.8902 NaN
10 1-Jul 0.8596 NaN
11 2-Jul 0.8289 NaN
12 10-Jul 0.6877 NaN
13 NaN NaN Low
14 NaN NaN High
15 NaN NaN High
16 NaN NaN High
17 NaN NaN High
18 NaN NaN High
19 NaN NaN High
20 NaN NaN Low
21 NaN NaN High
22 NaN NaN High
23 NaN NaN High
24 NaN NaN High
25 NaN NaN Low
However, I expect the values to be placed next to the existing rows, not in new rows.
Thank you!
If you look at the documentation for the append function, you'll see that it appends rows to the end of the dataframe, which is not what you want:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html
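If you want to keep the loop from the question, a minimal sketch is to collect one label per existing row in a list and assign the whole list as a column at the end. The `labels` name is just for illustration; assigning a list whose length matches the number of rows adds a column instead of new rows.

```python
import pandas as pd

data = {'Date': ['3-Mar', '20-Mar', '20-Apr', '21-Apr', '29-Apr', '7-May',
                 '30-May', '31-May', '7-Jun', '16-Jun', '1-Jul', '2-Jul', '10-Jul'],
        'Value': [0.5840, 0.8159, 0.7789, 0.7665, 0.8510, 0.7428, 0.7124,
                  0.6820, 0.8714, 0.8902, 0.8596, 0.8289, 0.6877]}
frame = pd.DataFrame(data)

# Build one label per existing row instead of appending new rows.
labels = []
for value in frame['Value']:
    labels.append('High' if value >= 0.7 else 'Low')

# Assigning a same-length list adds a column; the frame keeps its 13 rows.
frame['result'] = labels
print(frame)
```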
You can achieve this with Series.apply and a lambda function, which applies your logic to every value in the column:
frame['result'] = frame['Value'].apply(lambda x: 'High' if x >= 0.7 else 'Low')
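A vectorized alternative (a different technique from apply, swapped in here for comparison) is numpy.where, which evaluates the condition on the whole column at once rather than calling a Python function per value. The small sample frame below is hypothetical; it includes 0.7000 to show that `>=` labels the boundary value as High.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including the 0.7 boundary.
frame = pd.DataFrame({'Value': [0.5840, 0.8159, 0.7000, 0.6820]})

# np.where picks 'High' where the condition is True and 'Low' where it is False,
# operating on the entire column in one call.
frame['result'] = np.where(frame['Value'] >= 0.7, 'High', 'Low')
print(frame)
```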
If I understand correctly, this has probably been answered already, but here you go:
You need to create a new column, result.
Define a function (for readability) that takes a value and returns the label:
def udf(value):
    # Label values at or above 0.7 as "High", everything else as "Low".
    if value >= .7:
        return "High"
    else:
        return "Low"
Then apply this function to the Value column:
frame['result'] = frame['Value'].apply(udf)
I suggest reading the documentation for Series.apply (here apply is called on the Value column, which is a Series, not a DataFrame).
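If the condition ever needs more than one column, DataFrame.apply with axis=1 passes each row to your function as a Series. A minimal sketch, assuming a small hypothetical frame and a hypothetical helper named `label_row`; for a single-column condition like this one, Series.apply or a vectorized approach is simpler and faster.

```python
import pandas as pd

# Hypothetical sample data.
frame = pd.DataFrame({'Date': ['3-Mar', '20-Mar', '31-May'],
                      'Value': [0.5840, 0.8159, 0.6820]})

def label_row(row):
    # With axis=1, each row arrives as a Series, so columns are looked up by name.
    return 'High' if row['Value'] >= 0.7 else 'Low'

# Returns one label per row, which is assigned as the new column.
frame['result'] = frame.apply(label_row, axis=1)
print(frame)
```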
Using pandas.Series can fix your issue:
import pandas as pd

data = {'Date': ['3-Mar', '20-Mar', '20-Apr', '21-Apr', '29-Apr', '7-May',
                 '30-May', '31-May', '7-Jun', '16-Jun', '1-Jul', '2-Jul', '10-Jul'],
        'Value': [0.5840, 0.8159, 0.7789, 0.7665, 0.8510, 0.7428, 0.7124,
                  0.6820, 0.8714, 0.8902, 0.8596, 0.8289, 0.6877]}
frame = pd.DataFrame(data)
# The new Series gets the default index 0..12, which matches frame's index,
# so each label lines up with its row.
frame['result'] = pd.Series(['High' if x >= 0.7 else 'Low' for x in frame['Value']])
Output:
Date Value result
0 3-Mar 0.5840 Low
1 20-Mar 0.8159 High
2 20-Apr 0.7789 High
3 21-Apr 0.7665 High
4 29-Apr 0.8510 High
5 7-May 0.7428 High
6 30-May 0.7124 High
7 31-May 0.6820 Low
8 7-Jun 0.8714 High
9 16-Jun 0.8902 High
10 1-Jul 0.8596 High
11 2-Jul 0.8289 High
12 10-Jul 0.6877 Low
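One caveat with wrapping the list in pd.Series: assignment aligns on the index, so it only works as shown when the frame has the default 0..n-1 index. A minimal sketch with a hypothetical frame whose index starts at 10 (as it might after filtering) shows the mismatch and one way around it; the `misaligned` column exists only to illustrate the problem.

```python
import pandas as pd

# Hypothetical frame with a non-default index, e.g. after filtering rows.
frame = pd.DataFrame({'Value': [0.5840, 0.8159, 0.6820]}, index=[10, 11, 12])
labels = ['High' if x >= 0.7 else 'Low' for x in frame['Value']]

# pd.Series(labels) has index 0, 1, 2; none of those labels exist in frame,
# so alignment leaves every row as NaN.
frame['misaligned'] = pd.Series(labels)

# Reusing frame.index (or assigning the plain list) keeps the rows aligned.
frame['result'] = pd.Series(labels, index=frame.index)
print(frame)
```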
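Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.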