
Compare data within a dataframe and add new column with modified data values in that column

Attached is my data frame. I want to compare the `SOI priority` column with the `%Stake` column and build a comment accordingly. I tried the code below.

treasury_shares['Priority comment'] = ""

temp = round(treasury_shares['%Stake'] * 100, 0)
treasury_shares['%Stake'] = round(treasury_shares['%Stake'] * 100, 0).astype(str) + "%"
# treasury_shares["%Stake"] = treasury_shares["%Stake"].str.replace(".0", "")
treasury_shares = treasury_shares.reindex(
    columns=["performance_id", "SOI priority", "Date", "issued_shares_as_reported",
             "share_level",
             "share_be", "%Stake", "Priority comment"])
if (temp > 10) & (treasury_shares['SOI priority'] == 1):
    treasury_shares['Priority comment'] = 'SOI' + treasury_shares['SOI priority'] + '&Stake>10'

I am getting the following error.

line 1329, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
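
This error can be reproduced in isolation. The following is a minimal sketch with made-up values and hypothetical names (`stake`, `priority`, `mask`), not the original data. It shows that the combined condition is a boolean Series with one value per row, which a plain `if` cannot reduce to a single True/False.

```python
import pandas as pd

# Hypothetical sample values, not taken from the original data frame.
stake = pd.Series([44.0, 5.0, 12.0])
priority = pd.Series([1, 1, 2])

# Element-wise comparison: the result holds one boolean per row.
mask = (stake > 10) & (priority == 1)
print(mask.tolist())

# Using the Series directly in `if` asks pandas for a single truth value,
# which is ambiguous when there is more than one row.
try:
    if mask:
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Reducing explicitly is allowed, but it answers a different question
# ("does any row match?") rather than labelling each row.
any_match = bool(mask.any())
all_match = bool(mask.all())
```

Calling `.any()` or `.all()` makes the error go away, but it would then apply one decision to the whole column, which is probably not what is wanted when each row needs its own comment.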

Attached is the data frame image.

Data frame demo:

+--------------+-------------+------------------+
| SOI_priority |   %Stake    | Priority_comment |
+--------------+-------------+------------------+
|            1 |         44% |   SOI1&Stake>10% |
+--------------+-------------+------------------+
import pandas as pd
import numpy as np

data = {
    'performance_id': ['ASD'],
    'SOI priority': ['1'],
    'Date': ['31-Mar-22'],
    'issued_shares_as_reported': ['6,06,13,663'],
    'share_level': ['2,55,85,542'],
    'share_be': ['3,42,28,121'],
    '%Stake': ['0.44'],
    'Priority': ['P1'],
    'Priority comment': ['SOI1 & Stake>10%'],

}
treasury_shares = pd.DataFrame(data)

treasury_shares['Priority comment'] = ""

temp = treasury_shares['%Stake'].astype(float) * 100

# print(temp)
# treasury_shares['%Stake'] = round(treasury_shares['%Stake'].astype(int) * 100, 0).astype(str) + "%"
# treasury_shares["%Stake"] = treasury_shares["%Stake"].str.replace(".0", "")
treasury_shares = treasury_shares.reindex(
    columns=["performance_id", "SOI priority", "Date", "issued_shares_as_reported",
             "share_level",
             "share_be", "%Stake", "Priority comment"])

# Build a conditional mask: 1 where both conditions hold, 0 otherwise (a boolean True/False mask works as well)
treasury_shares['new_conditional'] = np.where(
    (temp > 10) &
    (treasury_shares['SOI priority'].astype('int32') == 1),
    1, 0
).astype('int32')

# Use the mask to fill 'Priority comment' on matching rows only; other rows keep their existing value
treasury_shares['Priority comment'] = np.where(treasury_shares['new_conditional'] == 1,
                                               'SOI' + (treasury_shares[
                                                   'SOI priority']).astype('string') + '&Stake>10',
                                               treasury_shares['Priority comment'])

print(treasury_shares['Priority comment'])

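# A plain Python `if` needs a single True/False value, but this condition is a
# boolean Series with one value per row, so pandas raises the ValueError.
# Use an element-wise selection such as np.where (above) instead of:
# if((temp>10)&(treasury_shares['SOI priority']==1)):
#     treasury_shares['Priority comment'] = 'SOI'+treasury_shares['SOI priority']+'&Stake>10'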
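
As an alternative to `np.where`, the same row-wise assignment can be written with a boolean mask and `.loc`. This is only a sketch under stated assumptions: the frame `df` and its three rows are hypothetical (only the first mirrors the row shown in the question), and the trailing `%` in the comment follows the expected output table rather than the code in the question.

```python
import pandas as pd

# Hypothetical rows; only the first mirrors the row shown in the question.
df = pd.DataFrame({
    "SOI priority": ["1", "1", "2"],
    "%Stake": ["0.44", "0.05", "0.30"],
})

# Convert the text columns to numbers before comparing.
stake_pct = df["%Stake"].astype(float) * 100
priority = df["SOI priority"].astype(int)

# Boolean mask: True only for rows that satisfy both conditions.
mask = (stake_pct > 10) & (priority == 1)

# Start with an empty comment and fill in only the matching rows.
df["Priority comment"] = ""
df.loc[mask, "Priority comment"] = "SOI" + df.loc[mask, "SOI priority"] + "&Stake>10%"

# Format the stake as a whole-number percentage string, e.g. "44%".
df["%Stake"] = stake_pct.round(0).astype(int).astype(str) + "%"

print(df)
```

Keeping the numeric values in `stake_pct` and formatting `%Stake` as text only at the end avoids comparing strings such as "44%" against a number.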
