[英]How to create new column dynamically in dataframe in python
df1 = pd.DataFrame(
{
"empid" : [1,2,3,4,5,6],
"empname" : ['a', 'b','c','d','e','f'],
"empcity" : ['aa','bb','cc','dd','ee','ff']
})
df1
df2 = pd.DataFrame(
{
"empid" : [1,2,3,4,5,6],
"empname" : ['a', 'b','m','d','n','f'],
"empcity" : ['aa','bb','cc','ddd','ee','fff']
})
df2
df_all = pd.concat([df1.set_index('empid'),df2.set_index('empid')],axis='columns',keys=['first','second'])
df_all
df_final = df_all.swaplevel(axis = 'columns')[df1.columns[1:]]
df_final
Based on df_final data frame, need to create following output.基于df_final数据帧,需要创建如下output。 here comparison column need to created dynamically for every identical column as i'm trying to compare two data frame(both data frame structure and column name are same) where number of columns are more than 300这里需要为每个相同的列动态创建比较列,因为我正在尝试比较两个数据框(数据框结构和列名都相同),其中列数超过 300
Use DataFrame.stack
for possible compare all levels columns first
with second
, create new column in DataFrame.assign
and reshape back by DataFrame.unstack
with DataFrame.swaplevel
and DataFrame.reindex
for original order: Use DataFrame.stack
for possible compare all levels columns first
with second
, create new column in DataFrame.assign
and reshape back by DataFrame.unstack
with DataFrame.swaplevel
and DataFrame.reindex
for original order:
#original ordering
orig = df1.columns[1:].tolist()
print (orig)
['empname', 'empcity']
df_final = (df_all.stack()
.assign(comparions=lambda x: x['first'].eq(x['second']))
.unstack()
.swaplevel(axis = 'columns')
.reindex(orig, axis=1, level=0))
print (df_final)
empname empcity
first second comparions first second comparions
empid
1 a a True aa aa True
2 b b True bb bb True
3 c m False cc cc True
4 d d True dd ddd False
5 e n False ee ee True
6 f f True ff fff False
(i) Use get_level_values
to get the label values for level 0 (i) 使用get_level_values
获取级别 0 的 label 值
(ii) Iterate over the outcome of (i) and for each level=0
, do element-wise comparison using eq
between first
and second
(ii) 迭代 (i) 的结果,对于每个level=0
,使用first
和second
之间的eq
进行元素比较
(iii) use sort_index
to sort columns in desired order (iii) 使用sort_index
按所需顺序对列进行排序
for level_0 in df_final.columns.get_level_values(0).unique():
df_final[(level_0, 'comparison')] = df_final[(level_0, 'first')].eq(df_final[(level_0,'second')])
df_final = df_final.sort_index(level=0, sort_remaining=False, axis=1)
Output: Output:
empcity empname
first second comparison first second comparison
empid
1 aa aa True a a True
2 bb bb True b b True
3 cc cc True c m False
4 dd ddd False d d True
5 ee ee True e n False
6 ff fff False f f True
==
直接将 2 个数据帧与==
进行比较You can do this with a simple ==
between two dataframes that you need to compare.您可以在需要比较的两个数据帧之间使用简单的==
来做到这一点。 Let's start with the original 2 dataframes df1
and df2
-让我们从原始的 2 个数据帧df1
和df2
开始 -
first = df1.set_index('empid')
second = df2.set_index('empid')
comparisons = first==second #<---
output = pd.concat([first, second, comparisons], axis=1,keys=['first','second', 'comparisons'])
#Swapping level and reindexing, borrowed from Jezrael's excellent answer
output = output.swaplevel(axis=1).reindex(first.columns, axis=1, level=0)
print(output)
empname empcity
first second comparisons first second comparisons
empid
1 a a True aa aa True
2 b b True bb bb True
3 c m False cc cc True
4 d d True dd ddd False
5 e n False ee ee True
6 f f True ff fff False
In addition to the excellent answer by jezrael , I am adding an alternate way of doing this using pandas groupby.除了 jezrael 的出色回答之外,我还使用 pandas groupby 添加了另一种方法。
#create comparisons
comparisons = (df_all.T
.groupby(level=-1)
.apply(lambda x: x.iloc[0]==x.iloc[1])
.T)
#add multi index columns
comparisons.columns = pd.MultiIndex.from_product([['comparison'],comparisons.columns])
#concatenate with original data
df_final = pd.concat([df_all, comparisons], axis='columns')
#Swapping level and reindexing, borrowed from Jezrael's excellent answer
df_final = (df_final.swaplevel(axis = 'columns')
.reindex(df1.set_index('empid')
.columns, axis=1, level=0))
print(df_final)
empname empcity
first second comparison first second comparison
empid
1 a a True aa aa True
2 b b True bb bb True
3 c m False cc cc True
4 d d True dd ddd False
5 e n False ee ee True
6 f f True ff fff False
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.