
How to add a new column based on a row condition in a pandas dataframe?

I want to add a new column based on a row-wise condition that depends on two different columns of the same dataframe.

I have the dataframe below:

import pandas as pd

df1_data = {'e_id': {0:'101',1:'',2:'103',3:'',4:'105',5:'',6:''},
        'r_id': {0:'',1:'502',2:'',3:'504',4:'',5:'506',6:''}}
df = pd.DataFrame(df1_data)
print(df)

I want to add a new column named "sym".

Conditions:

  1. If the 'e_id' column value is not null, then the sym column value is the 'e_id' column value.
  2. If the 'r_id' column value is not null, then the sym column value is the 'r_id' column value.
  3. If both the 'e_id' and 'r_id' column values are null, then remove that row from the pandas dataframe.

I tried the code below:

df1_data = {'e_id': {0:'101',1:'',2:'103',3:'',4:'105',5:''},
        'r_id': {0:'',1:'502',2:'',3:'504',4:'',5:'506'}}

df = pd.DataFrame(df1_data)
print(df)

if df['e_id'].any():
    df['sym'] = df['e_id']
print(df)

if df['r_id'].any():
    df['sym'] = df['r_id']
print(df)

But it gives me the wrong output.

Expected output:

  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506
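
For reference, here is a small sketch of why the attempt above goes wrong, assuming the columns hold plain Python strings (the explicit dtype=object and the names e_has_any, r_has_any and attempt are only for illustration). Series.any() collapses the whole column into a single truthy/falsy value, so each `if` runs once for the entire column instead of once per row.

```python
import pandas as pd

# dtype=object is an assumption: it keeps the values as plain Python strings.
df = pd.DataFrame({
    'e_id': ['101', '', '103', '', '105', '', ''],
    'r_id': ['', '502', '', '504', '', '506', ''],
}, dtype=object)

# Series.any() reduces the whole column to one value that is truthy
# if at least one element is truthy (here, a non-empty string).
e_has_any = bool(df['e_id'].any())
r_has_any = bool(df['r_id'].any())

# Both checks pass, so each `if` body assigns the entire column at once;
# the second assignment simply overwrites the first.
attempt = df.copy()
if e_has_any:
    attempt['sym'] = attempt['e_id']
if r_has_any:
    attempt['sym'] = attempt['r_id']

# 'sym' ends up as a plain copy of 'r_id', not a row-by-row choice.
print(attempt)
```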

pandas
Using mask + fillna + assign

d1 = df.mask(df == '')
df.assign(sym=d1.e_id.fillna(d1.r_id)).dropna(subset=['sym'])

  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506

How It Works

  • The '' values are masked (replaced with NaN), on the assumption that you meant them to be null.
  • fillna takes e_id where it is not null, and otherwise falls back to r_id.
  • dropna with subset=['sym'] drops a row only if the new column is null, which happens only when both e_id and r_id were null.
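
The three steps can also be run one at a time to inspect the intermediate results. This is only a sketch on the question's sample data; the names sym and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'e_id': ['101', '', '103', '', '105', '', ''],
    'r_id': ['', '502', '', '504', '', '506', ''],
})

# Step 1: turn empty strings into NaN so pandas treats them as missing.
d1 = df.mask(df == '')

# Step 2: prefer e_id; where it is missing, fall back to r_id.
sym = d1['e_id'].fillna(d1['r_id'])

# Step 3: attach the new column and drop rows where it is still missing,
# which only happens when both e_id and r_id were empty (row 6 here).
result = df.assign(sym=sym).dropna(subset=['sym'])
print(result)
```

Note that assign is applied to the original df, so e_id and r_id keep their empty strings in the result; only sym is built from the masked copy.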

numpy
Using np.where + assign

e = df.e_id.values
r = df.r_id.values
df.assign(
    sym=np.where(
        e != '', e,
        np.where(r != '', r, np.nan)
    )
).dropna(subset=['sym'])

  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506
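
To see what the nested np.where does, the two calls can be split apart. This is a sketch that assumes object-dtype columns of plain strings (set explicitly here); the names fallback, sym and result are illustrative.

```python
import numpy as np
import pandas as pd

# dtype=object is an assumption: it makes .to_numpy() return plain
# NumPy object arrays of Python strings.
df = pd.DataFrame({
    'e_id': ['101', '', '103', '', '105', '', ''],
    'r_id': ['', '502', '', '504', '', '506', ''],
}, dtype=object)

e = df['e_id'].to_numpy()
r = df['r_id'].to_numpy()

# Inner call: r where r is non-empty, otherwise NaN.
fallback = np.where(r != '', r, np.nan)

# Outer call: e where e is non-empty, otherwise the fallback value.
sym = np.where(e != '', e, fallback)

# Rows where both columns were empty carry NaN in sym and are dropped.
result = df.assign(sym=sym).dropna(subset=['sym'])
print(result)
```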

numpy v2
Reconstruct the dataframe from values

v = df.values
m = (v != '').any(1)
v = v[m]
c1 = v[:, 0]
c2 = v[:, 1]
pd.DataFrame(
    np.column_stack([v, np.where(c1 != '', c1, c2)]),
    df.index[m], df.columns.tolist() + ['sym']
)

  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506
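
One detail worth checking in this version is that the row mask is also applied to the index, so surviving rows keep their original labels. The sketch below uses hypothetical data in which the all-empty row sits at index 2 instead of at the end; the names keep, sym and result are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the all-empty row is at index 2 rather than at the end.
df = pd.DataFrame({
    'e_id': ['101', '', '', '', '105'],
    'r_id': ['', '502', '', '504', ''],
}, dtype=object)

v = df.to_numpy()
# True for rows that have at least one non-empty cell.
keep = (v != '').any(axis=1)
v = v[keep]

# First column where it is non-empty, otherwise the second column.
sym = np.where(v[:, 0] != '', v[:, 0], v[:, 1])

# Rebuild the frame, reusing the original labels of the kept rows.
result = pd.DataFrame(
    np.column_stack([v, sym]),
    index=df.index[keep],
    columns=df.columns.tolist() + ['sym'],
)
print(result)
```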

Timing

%%timeit
e = df.e_id.values
r = df.r_id.values
df.assign(sym=np.where(e != '', e, np.where(r != '', r, np.nan))).dropna(subset=['sym'])
1000 loops, best of 3: 1.23 ms per loop

%%timeit
d1 = df.mask(df == '')
df.assign(sym=d1.e_id.fillna(d1.r_id)).dropna(subset=['sym'])
100 loops, best of 3: 2.44 ms per loop

%%timeit
v = df.values
m = (v != '').any(1)
v = v[m]
c1 = v[:, 0]
c2 = v[:, 1]
pd.DataFrame(
    np.column_stack([v, np.where(c1 != '', c1, c2)]),
    df.index[m], df.columns.tolist() + ['sym']
)
1000 loops, best of 3: 204 µs per loop
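
The timings above are for the 7-row sample, where fixed overhead dominates. If you want to compare the approaches on something larger outside IPython, a sketch with the standard timeit module could look like the following. The 70,000-row frame and the helper names with_fillna and with_where are made up for illustration, and the numbers will depend on your machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

small = pd.DataFrame({
    'e_id': ['101', '', '103', '', '105', '', ''],
    'r_id': ['', '502', '', '504', '', '506', ''],
}, dtype=object)

# Hypothetical scale-up: repeat the 7 sample rows 10,000 times.
big = pd.concat([small] * 10_000, ignore_index=True)


def with_fillna(df):
    """mask + fillna + assign, as in the pandas answer."""
    d1 = df.mask(df == '')
    return df.assign(sym=d1['e_id'].fillna(d1['r_id'])).dropna(subset=['sym'])


def with_where(df):
    """Nested np.where + assign, as in the numpy answer."""
    e = df['e_id'].to_numpy()
    r = df['r_id'].to_numpy()
    sym = np.where(e != '', e, np.where(r != '', r, np.nan))
    return df.assign(sym=sym).dropna(subset=['sym'])


for fn in (with_fillna, with_where):
    # Average over 5 calls on the larger frame.
    seconds = timeit.timeit(lambda: fn(big), number=5) / 5
    print(f'{fn.__name__}: {seconds * 1e3:.2f} ms per call')
```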

First filter out the rows where both columns are empty, using boolean indexing with any:

df = df[(df != '').any(axis=1)]
# alternatively
# df = df[(df['e_id'] != '') | (df['r_id'] != '')]

Then use mask with combine_first:

df['sym'] = df['e_id'].mask(df['e_id'] == '').combine_first(df['r_id'])
print(df)

  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506
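
Put together as a standalone script, the two steps might look like this. It is a sketch on the question's sample data; the name filtered and the added .copy() are not part of the answer above. The copy makes the filtered frame independent of df, which avoids SettingWithCopyWarning on pandas versions that emit it when a column is added to a filtered slice.

```python
import pandas as pd

df = pd.DataFrame({
    'e_id': ['101', '', '103', '', '105', '', ''],
    'r_id': ['', '502', '', '504', '', '506', ''],
})

# Keep rows with at least one non-empty cell; .copy() detaches the result.
filtered = df[(df != '').any(axis=1)].copy()

# Empty e_id values become NaN, then combine_first fills them from r_id.
filtered['sym'] = (
    filtered['e_id']
    .mask(filtered['e_id'] == '')
    .combine_first(filtered['r_id'])
)
print(filtered)
```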

NumPy solution with filtering and numpy.where:

df = df[(df['e_id'] != '') | (df['r_id'] != '')]
e_id = df.e_id.values
r_id = df.r_id.values
df['sym'] = np.where(e_id != '', e_id, r_id)
print(df)
  e_id r_id  sym
0  101       101
1       502  502
2  103       103
3       504  504
4  105       105
5       506  506

You can start with column 'e_id' and replace its values with 'r_id' values wherever 'e_id' is "empty", using pandas.Series.mask and its 'other' parameter:

df['sym'] = df['e_id'].mask(df['e_id'] == '', other=df['r_id'], axis=0)

Then you just need to remove the rows where sym is "empty":

df = df[df.sym!='']
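
As a quick check, here is a sketch of both steps on the question's 7-row sample. Because 'other' supplies the replacement directly, no NaN is involved: a row where both columns are empty simply gets '' from r_id, and the final filter removes it.

```python
import pandas as pd

df = pd.DataFrame({
    'e_id': ['101', '', '103', '', '105', '', ''],
    'r_id': ['', '502', '', '504', '', '506', ''],
})

# Where e_id is empty, take the r_id value from the same row instead.
df['sym'] = df['e_id'].mask(df['e_id'] == '', other=df['r_id'])

# Row 6 had both columns empty, so its sym is still '' and it is dropped.
df = df[df['sym'] != '']
print(df)
```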

