How to create a new column dynamically in a DataFrame in Python

import pandas as pd

df1 = pd.DataFrame(
{
    "empid" : [1,2,3,4,5,6],
    "empname" : ['a', 'b','c','d','e','f'],
    "empcity" : ['aa','bb','cc','dd','ee','ff']
})
df1

df2 = pd.DataFrame(
{
    "empid" : [1,2,3,4,5,6],
    "empname" : ['a', 'b','m','d','n','f'],
    "empcity" : ['aa','bb','cc','ddd','ee','fff']
})
df2

df_all = pd.concat([df1.set_index('empid'),df2.set_index('empid')],axis='columns',keys=['first','second'])
df_all

df_final = df_all.swaplevel(axis = 'columns')[df1.columns[1:]]
df_final

Based on the df_final data frame, I need to produce an output that adds a comparison column for every pair of identically named columns. The comparison columns have to be created dynamically, because I am comparing two data frames (same structure, same column names) that have more than 300 columns.

Use DataFrame.stack so that the first and second columns can be compared across all levels at once, create the new column with DataFrame.assign, then reshape back with DataFrame.unstack, and use DataFrame.swaplevel and DataFrame.reindex to restore the original column order:

# original column order
orig = df1.columns[1:].tolist()
print(orig)
['empname', 'empcity']

df_final = (df_all.stack()
                  .assign(comparison=lambda x: x['first'].eq(x['second']))
                  .unstack()
                  .swaplevel(axis='columns')
                  .reindex(orig, axis=1, level=0))
print(df_final)
      empname                   empcity                  
        first second comparison   first second comparison
empid                                                    
1           a      a       True      aa     aa       True
2           b      b       True      bb     bb       True
3           c      m      False      cc     cc       True
4           d      d       True      dd    ddd      False
5           e      n      False      ee     ee       True
6           f      f       True      ff    fff      False
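
To see why the reshape helps, it can be useful to look at the intermediate frame. The sketch below rebuilds df_all from the question and stops after the stack/assign step; the variable name stacked is only used here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "empid": [1, 2, 3, 4, 5, 6],
    "empname": ['a', 'b', 'c', 'd', 'e', 'f'],
    "empcity": ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'],
})
df2 = pd.DataFrame({
    "empid": [1, 2, 3, 4, 5, 6],
    "empname": ['a', 'b', 'm', 'd', 'n', 'f'],
    "empcity": ['aa', 'bb', 'cc', 'ddd', 'ee', 'fff'],
})
df_all = pd.concat([df1.set_index('empid'), df2.set_index('empid')],
                   axis='columns', keys=['first', 'second'])

# stack() moves the inner column level (empname/empcity) into the row index,
# leaving plain 'first' and 'second' columns that can be compared in one go.
stacked = df_all.stack()
stacked = stacked.assign(comparison=stacked['first'].eq(stacked['second']))
print(stacked)
```

Each row of stacked is one (empid, column name) pair, so a single eq call covers every column regardless of how many there are.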

(i) Use get_level_values to get the label values for level 0

(ii) Iterate over the result of (i) and, for each level-0 label, do an element-wise comparison between first and second using eq

(iii) Use sort_index to sort the columns in the desired order

for level_0 in df_final.columns.get_level_values(0).unique():
    df_final[(level_0, 'comparison')] = df_final[(level_0, 'first')].eq(df_final[(level_0,'second')])
df_final = df_final.sort_index(level=0, sort_remaining=False, axis=1)

Output:

      empcity                   empname                  
        first second comparison   first second comparison
empid                                                    
1          aa     aa       True       a      a       True
2          bb     bb       True       b      b       True
3          cc     cc       True       c      m      False
4          dd    ddd      False       d      d       True
5          ee     ee       True       e      n      False
6          ff    fff      False       f      f       True
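
Note that sort_index orders the level-0 labels alphabetically, which is why empcity comes before empname in the output above. If the original column order matters, one option is to swap sort_index for the reindex call used in the first answer. A minimal sketch, assuming the same df1 and df2 as in the question (the names orig and ordered are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({
    "empid": [1, 2, 3, 4, 5, 6],
    "empname": ['a', 'b', 'c', 'd', 'e', 'f'],
    "empcity": ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'],
})
df2 = pd.DataFrame({
    "empid": [1, 2, 3, 4, 5, 6],
    "empname": ['a', 'b', 'm', 'd', 'n', 'f'],
    "empcity": ['aa', 'bb', 'cc', 'ddd', 'ee', 'fff'],
})
df_all = pd.concat([df1.set_index('empid'), df2.set_index('empid')],
                   axis='columns', keys=['first', 'second'])

orig = df1.columns[1:].tolist()  # original column order
df_final = df_all.swaplevel(axis='columns')[orig].copy()

# same loop as above: one comparison column per level-0 label
for level_0 in df_final.columns.get_level_values(0).unique():
    df_final[(level_0, 'comparison')] = (
        df_final[(level_0, 'first')].eq(df_final[(level_0, 'second')]))

# reindex on level 0 groups the columns by the labels in orig, in that order,
# instead of sorting them alphabetically
ordered = df_final.reindex(orig, axis=1, level=0)
print(ordered)
```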

Directly comparing 2 dataframes with ==

You can do this with a simple == between the two dataframes that you need to compare. Let's start with the original 2 dataframes, df1 and df2:

first = df1.set_index('empid')
second = df2.set_index('empid')
comparisons = first==second      #<---

output = pd.concat([first, second, comparisons], axis=1,keys=['first','second', 'comparisons'])

#Swapping level and reindexing, borrowed from Jezrael's excellent answer
output = output.swaplevel(axis=1).reindex(first.columns, axis=1, level=0)
print(output)
      empname                    empcity                   
        first second comparisons   first second comparisons
empid                                                      
1           a      a        True      aa     aa        True
2           b      b        True      bb     bb        True
3           c      m       False      cc     cc        True
4           d      d        True      dd    ddd       False
5           e      n       False      ee     ee        True
6           f      f        True      ff    fff       False
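
One caveat with == (and with eq): a missing value never compares equal to another missing value, so cells that are NaN in both frames are reported as False. == also expects both frames to carry identical row and column labels. If missing values on both sides should count as a match, a sketch along these lines may help; the small frames and the name nan_aware are made up for illustration:

```python
import numpy as np
import pandas as pd

idx = pd.Index([1, 2], name="empid")
first = pd.DataFrame({"empname": ["a", np.nan], "empcity": ["aa", "bb"]}, index=idx)
second = pd.DataFrame({"empname": ["a", np.nan], "empcity": ["aa", "bbb"]}, index=idx)

# plain comparison: NaN vs NaN is False
plain = first == second

# treat "missing in both frames" as equal
nan_aware = first.eq(second) | (first.isna() & second.isna())

print(plain)
print(nan_aware)
```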

Alternate approach with pandas groupby

In addition to the excellent answer by jezrael, I am adding an alternate way of doing this using pandas groupby.

  1. Transpose to get the columns as row indexes
  2. Group by the last level, which contains empcity and empname
  3. Apply the comparison between the 2 rows of each group
  4. Transpose back to columns
  5. Add multi-index columns from the product of "comparison" and the original columns
  6. Combine the two dataframes (the original one and the one with the comparisons)
  7. Use swaplevel and reindex to get the column order that you need

#create comparisons
comparisons = (df_all.T
                     .groupby(level=-1)
                     .apply(lambda x: x.iloc[0]==x.iloc[1])
                     .T)

#add multi index columns
comparisons.columns = pd.MultiIndex.from_product([['comparison'],comparisons.columns])

#concatenate with original data
df_final = pd.concat([df_all, comparisons], axis='columns')

#Swapping level and reindexing, borrowed from Jezrael's excellent answer
df_final = (df_final.swaplevel(axis = 'columns')
                    .reindex(df1.set_index('empid')
                                .columns, axis=1, level=0))
print(df_final)
      empname                   empcity                  
        first second comparison   first second comparison
empid                                                    
1           a      a       True      aa     aa       True
2           b      b       True      bb     bb       True
3           c      m      False      cc     cc       True
4           d      d       True      dd    ddd      False
5           e      n      False      ee     ee       True
6           f      f       True      ff    fff      False
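
As a quick sanity check, the groupby result can be compared against the plain == approach from the previous answer. This is only a sketch, assuming the df1 and df2 from the question; the names expected and by_groupby are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    "empid": [1, 2, 3, 4, 5, 6],
    "empname": ['a', 'b', 'c', 'd', 'e', 'f'],
    "empcity": ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'],
})
df2 = pd.DataFrame({
    "empid": [1, 2, 3, 4, 5, 6],
    "empname": ['a', 'b', 'm', 'd', 'n', 'f'],
    "empcity": ['aa', 'bb', 'cc', 'ddd', 'ee', 'fff'],
})
first = df1.set_index('empid')
second = df2.set_index('empid')
df_all = pd.concat([first, second], axis='columns', keys=['first', 'second'])

# comparison via transpose + groupby on the inner column level
by_groupby = (df_all.T
                    .groupby(level=-1)
                    .apply(lambda x: x.iloc[0] == x.iloc[1])
                    .T)

# comparison via plain ==
expected = first == second

# align the column order before comparing the two results
same = (by_groupby[expected.columns].to_numpy() == expected.to_numpy()).all()
print(same)
```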
