How to dynamically create new columns in a DataFrame in Python

Question

import pandas as pd

df1 = pd.DataFrame(
{
    "empid" : [1,2,3,4,5,6],
    "empname" : ['a', 'b','c','d','e','f'],
    "empcity" : ['aa','bb','cc','dd','ee','ff']
})
df1

df2 = pd.DataFrame(
{
    "empid" : [1,2,3,4,5,6],
    "empname" : ['a', 'b','m','d','n','f'],
    "empcity" : ['aa','bb','cc','ddd','ee','fff']
})
df2

df_all = pd.concat([df1.set_index('empid'),df2.set_index('empid')],axis='columns',keys=['first','second'])
df_all

df_final = df_all.swaplevel(axis = 'columns')[df1.columns[1:]]
df_final

Based on the df_final DataFrame, I need to add a comparison column next to each first/second pair. The comparison columns have to be created dynamically for every identically named column, because I am comparing two DataFrames that have the same structure and column names, and there are more than 300 columns.

Answer 1

Use DataFrame.stack so that first can be compared with second for all columns at once, create the new column with DataFrame.assign, then reshape back with DataFrame.unstack, and use DataFrame.swaplevel and DataFrame.reindex to restore the original column order:

#original ordering
orig = df1.columns[1:].tolist()
print (orig)
['empname', 'empcity']

df_final = (df_all.stack()
                  .assign(comparison=lambda x: x['first'].eq(x['second']))
                  .unstack()
                  .swaplevel(axis = 'columns')
                  .reindex(orig, axis=1, level=0))
print (df_final)
      empname                   empcity                  
        first second comparison   first second comparison
empid                                                    
1           a      a       True      aa     aa       True
2           b      b       True      bb     bb       True
3           c      m      False      cc     cc       True
4           d      d       True      dd    ddd      False
5           e      n      False      ee     ee       True
6           f      f       True      ff    fff      False
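
To see why the stack step makes the comparison a one-liner, it can help to look at the intermediate frame. The following is a minimal sketch on a shortened, three-row version of the question's data (the shortening is an assumption made for brevity). After stack(), the column names sit in the row index, and first and second are ordinary columns. The row order of the stacked frame may differ between pandas versions, so the sketch does not rely on it.

```python
import pandas as pd

# Shortened version of the question's data (three rows only, for illustration)
df1 = pd.DataFrame({"empid": [1, 2, 3],
                    "empname": ["a", "b", "c"],
                    "empcity": ["aa", "bb", "cc"]})
df2 = pd.DataFrame({"empid": [1, 2, 3],
                    "empname": ["a", "b", "m"],
                    "empcity": ["aa", "bb", "cc"]})

df_all = pd.concat([df1.set_index("empid"), df2.set_index("empid")],
                   axis="columns", keys=["first", "second"])

# stack() moves the innermost column level (the column names) into the row index,
# leaving 'first' and 'second' as plain columns
stacked = df_all.stack()

# one element-wise comparison now covers every original column
stacked = stacked.assign(comparison=stacked["first"].eq(stacked["second"]))
print(stacked)
```

Each row of the stacked frame is one (empid, column name) pair, which is why the same expression scales to several hundred columns without a loop.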

Answer 2

(i) Use get_level_values to get the unique labels of column level 0.

(ii) Iterate over the result of (i) and, for each level-0 label, compare first and second element-wise using eq.

(iii) Use sort_index to group the columns by their level-0 label. This sorts the top level alphabetically, so empcity comes before empname in the output.

for level_0 in df_final.columns.get_level_values(0).unique():
    df_final[(level_0, 'comparison')] = df_final[(level_0, 'first')].eq(df_final[(level_0,'second')])
df_final = df_final.sort_index(level=0, sort_remaining=False, axis=1)

Output:

      empcity                   empname                  
        first second comparison   first second comparison
empid                                                    
1          aa     aa       True       a      a       True
2          bb     bb       True       b      b       True
3          cc     cc       True       c      m      False
4          dd    ddd      False       d      d       True
5          ee     ee       True       e      n      False
6          ff    fff      False       f      f       True
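
If the original column order matters, one possible variation (not part of the answer above) is to build a small frame per column and let pd.concat assemble the two-level columns from a dict. This is a sketch under the assumption that both frames share the same columns and the same empid values; the names pieces and result are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"empid": [1, 2, 3, 4, 5, 6],
                    "empname": ["a", "b", "c", "d", "e", "f"],
                    "empcity": ["aa", "bb", "cc", "dd", "ee", "ff"]})
df2 = pd.DataFrame({"empid": [1, 2, 3, 4, 5, 6],
                    "empname": ["a", "b", "m", "d", "n", "f"],
                    "empcity": ["aa", "bb", "cc", "ddd", "ee", "fff"]})

first = df1.set_index("empid")
second = df2.set_index("empid")

# One small frame per original column: first, second and their comparison
pieces = {}
for col in first.columns:
    pieces[col] = pd.DataFrame({
        "first": first[col],
        "second": second[col],
        "comparison": first[col].eq(second[col]),
    })

# The dict keys become the top column level, in insertion order
result = pd.concat(pieces, axis=1)
print(result)
```

Because the dict keeps insertion order, empname stays ahead of empcity and no sorting or reindexing is needed afterwards.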

Answer 3

Directly comparing 2 dataframes with `==`

You can do this with a simple == between the two dataframes that you need to compare. Let's start with the original two dataframes, df1 and df2:

first = df1.set_index('empid')
second = df2.set_index('empid')
comparisons = first==second      #<---

output = pd.concat([first, second, comparisons], axis=1,keys=['first','second', 'comparisons'])

#Swapping level and reindexing, borrowed from Jezrael's excellent answer
output = output.swaplevel(axis=1).reindex(first.columns, axis=1, level=0)
print(output)

      empname                    empcity                   
        first second comparisons   first second comparisons
empid                                                      
1           a      a        True      aa     aa        True
2           b      b        True      bb     bb        True
3           c      m       False      cc     cc        True
4           d      d        True      dd    ddd       False
5           e      n       False      ee     ee        True
6           f      f        True      ff    fff       False
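
Two caveats are worth keeping in mind with `==`: both frames need identical row and column labels, and a missing value never compares equal to another missing value. The sketch below shows one way to treat cells that are missing in both frames as equal. The bonus column is hypothetical and was added only to have a column containing NaN; it is not part of the question's data.

```python
import numpy as np
import pandas as pd

idx = pd.Index([1, 2, 3], name="empid")

# 'bonus' is a hypothetical column, used only to demonstrate NaN handling
first = pd.DataFrame({"empname": ["a", "b", "c"],
                      "bonus": [1.0, np.nan, 3.0]}, index=idx)
second = pd.DataFrame({"empname": ["a", "b", "m"],
                       "bonus": [1.0, np.nan, 4.0]}, index=idx)

# Plain element-wise comparison: NaN == NaN evaluates to False
plain = first == second

# Count a cell as equal when both sides are missing as well
nan_aware = plain | (first.isna() & second.isna())

print(plain)
print(nan_aware)
```

Whether two missing values should count as a match depends on the data, so this is a choice to make deliberately rather than a default.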

Alternate approach with pandas groupby

In addition to the excellent answer by jezrael, I am adding an alternate way of doing this using pandas groupby.

1. Transpose so that the columns become row indexes
2. Group by the index level that holds the column names (empcity and empname)
3. Apply the comparison between the 2 rows of each group
4. Transpose back to columns
5. Build multi-index columns from the product of "comparison" and the original column names
6. Combine the two dataframes (the original one and the one with comparisons)
7. Use swaplevel and reindex to get the column order that you need

#create comparisons
comparisons = (df_all.T
                     .groupby(level=-1)
                     .apply(lambda x: x.iloc[0]==x.iloc[1])
                     .T)

#add multi index columns
comparisons.columns = pd.MultiIndex.from_product([['comparison'],comparisons.columns])

#concatenate with original data
df_final = pd.concat([df_all, comparisons], axis='columns')

#Swapping level and reindexing, borrowed from Jezrael's excellent answer
df_final = (df_final.swaplevel(axis = 'columns')
                    .reindex(df1.set_index('empid')
                                .columns, axis=1, level=0))
print(df_final)

      empname                   empcity                  
        first second comparison   first second comparison
empid                                                    
1           a      a       True      aa     aa       True
2           b      b       True      bb     bb       True
3           c      m      False      cc     cc       True
4           d      d       True      dd    ddd      False
5           e      n      False      ee     ee       True
6           f      f       True      ff    fff      False

3 answers

Answer 1
2 2022-01-19 07:26:57

Answer 2
1 2022-01-19 07:43:02

Answer 3
1 2022-01-19 08:32:27
