
How to add values to a new column according to conditions?

I am trying to add a new column to my dataset based on a condition, but the resulting DataFrame is not what I expect.

I've tried several approaches, and this is the closest I've got.

import pandas as pd

data = {'Date' : ['3-Mar', '20-Mar', '20-Apr', '21-Apr', '29-Apr', '7-May',
                  '30-May', '31-May', '7-Jun', '16-Jun',
                  '1-Jul', '2-Jul', '10-Jul'],
        'Value' : [0.5840, 0.8159, 0.7789, 0.7665, 0.8510, 0.7428, 0.7124,
                   0.6820, 0.8714, 0.8902, 0.8596, 0.8289, 0.6877]}
frame = pd.DataFrame(data)

for counter, value in enumerate(frame['Value']):
    if value >= 0.7:
        frame = frame.append({'result': 'High'}, ignore_index=True)   
    else:
        frame = frame.append({'result': 'Low'}, ignore_index=True)   

print(frame)

Result is:

     Date   Value result
0    3-Mar  0.5840    NaN
1   20-Mar  0.8159    NaN
2   20-Apr  0.7789    NaN
3   21-Apr  0.7665    NaN
4   29-Apr  0.8510    NaN
5    7-May  0.7428    NaN
6   30-May  0.7124    NaN
7   31-May  0.6820    NaN
8    7-Jun  0.8714    NaN
9   16-Jun  0.8902    NaN
10   1-Jul  0.8596    NaN
11   2-Jul  0.8289    NaN
12  10-Jul  0.6877    NaN
13     NaN     NaN    Low
14     NaN     NaN   High
15     NaN     NaN   High
16     NaN     NaN   High
17     NaN     NaN   High
18     NaN     NaN   High
19     NaN     NaN   High
20     NaN     NaN    Low
21     NaN     NaN   High
22     NaN     NaN   High
23     NaN     NaN   High
24     NaN     NaN   High
25     NaN     NaN    Low

However, I expected the values to be placed next to the existing rows, not in new rows.

Thank you!

If you look at the documentation of the append function, you'll see that it appends rows to the end of the DataFrame, which is not what you want:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html
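To make the difference concrete, here is a minimal sketch contrasting "append a row" with "assign a column". It assumes a recent pandas version, where DataFrame.append is no longer available (it was deprecated in 1.4 and removed in 2.0), so pd.concat stands in for it. The two-row frame and the names `new_row`, `appended`, and `assigned` are made up for illustration.

```python
import pandas as pd

# Tiny stand-in for the question's data (illustrative values only).
frame = pd.DataFrame({'Value': [0.5840, 0.8159]})

# Row-wise: roughly what frame.append({'result': 'Low'}, ignore_index=True) did.
# The new row only has a 'result' entry, so 'Value' is NaN in that row and
# 'result' is NaN in the pre-existing rows.
new_row = pd.DataFrame([{'result': 'Low'}])
appended = pd.concat([frame, new_row], ignore_index=True)
print(appended)

# Column-wise: assigning to frame['result'] places one value next to each
# existing row, which is what the question is after.
assigned = frame.copy()
assigned['result'] = ['Low', 'High']
print(assigned)
```

The first frame grows by one row per appended dict, which matches the shape of the unexpected output in the question. The second keeps the original number of rows.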

You can achieve this with a lambda function passed to apply, which applies whatever logic you want to every value in the column.

frame['result'] = frame['Value'].apply(lambda x: 'High' if x >= 0.7 else 'Low')
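As a possible alternative that is not part of the original answer, the same labelling can be done without a Python-level function call per value by using numpy.where on the boolean comparison. This is only a sketch on a three-row sample, assuming the same 0.7 threshold as the question.

```python
import numpy as np
import pandas as pd

# Small sample taken from the question's 'Value' column.
frame = pd.DataFrame({'Value': [0.5840, 0.8159, 0.6820]})

# frame['Value'] >= 0.7 is a boolean Series; np.where picks 'High' where it
# is True and 'Low' elsewhere, returning an array of the same length.
frame['result'] = np.where(frame['Value'] >= 0.7, 'High', 'Low')
print(frame)
```

For a column of this size, the choice between apply and np.where is mostly a matter of taste. The vectorized form tends to matter more on large frames.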

If I understand correctly, this has probably already been answered, but here you go.

You need to create a new column, result.

Define a function (for readability) that takes a value and returns the result:

def udf(value):
    # Label a single value as "High" or "Low" using the 0.7 threshold
    if value >= .7:
        return "High"
    else:
        return "Low"

Then apply this function to the Value column:

frame['result'] = frame['Value'].apply(udf)

I suggest you read the documentation for apply (DataFrame.apply, or Series.apply when it is called on a single column, as it is here).

Using pandas.Series could fix your issue:

import pandas as pd

data = {'Date' : ['3-Mar', '20-Mar', '20-Apr', '21-Apr', '29-Apr', '7- May', 
                  '30-May', '31-May', '7-Jun', '16-Jun','1-Jul', '2-Jul', '10-Jul'],
        'Value' : [0.5840, 0.8159, 0.7789, 0.7665, 0.8510, 0.7428, 0.7124, 
                   0.6820, 0.8714, 0.8902, 0.8596, 0.8289, 0.6877]}
frame = pd.DataFrame(data)
frame['result'] = pd.Series(['High' if x >= 0.7 else 'Low' for x in frame['Value']])

Output:

Date    Value   result
0   3-Mar   0.5840  Low
1   20-Mar  0.8159  High
2   20-Apr  0.7789  High
3   21-Apr  0.7665  High
4   29-Apr  0.8510  High
5   7- May  0.7428  High
6   30-May  0.7124  High
7   31-May  0.6820  Low
8   7-Jun   0.8714  High
9   16-Jun  0.8902  High
10  1-Jul   0.8596  High
11  2-Jul   0.8289  High
12  10-Jul  0.6877  Low
