Merging column values in a data frame in Pandas / Python
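I am trying to merge the values of two columns (B and C) in the same data frame. B and C sometimes hold the same value. Some values in B are present in C, and some values in C are present in B. The final result should show a single column that combines the two.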
Input:

A        B       C        D
Apple    Canada  ''       RED
Bananas  ''      Germany  BLUE
Carrot   US      US       GREEN
Dorito   ''      ''       INDIGO

Desired output:

A        B        C
Apple    Canada   RED
Bananas  Germany  BLUE
Carrot   US       GREEN
Dorito   ''       INDIGO
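To try the answers locally, the sample data can be rebuilt as a small frame. This is only a sketch: it assumes the blank cells are empty strings, which the question does not state explicitly. If your data really contains the two quote characters, put "''" in place of "".

```python
import pandas as pd

# Assumption: blank cells are empty strings; use "''" instead if the
# data really stores two literal quote characters.
df = pd.DataFrame({
    "A": ["Apple", "Bananas", "Carrot", "Dorito"],
    "B": ["Canada", "", "US", ""],
    "C": ["", "Germany", "US", ""],
    "D": ["RED", "BLUE", "GREEN", "INDIGO"],
})
print(df)
```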
IIUC
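import numpy as np

# Treat the literal '' placeholder as missing, then back-fill across columns so B picks up C's value
df['B'] = df[['B', 'C']].replace("''", np.nan).bfill(axis=1).loc[:, 'B']
# Drop the now-redundant C and rename D to C
df = df.drop(columns='C').rename(columns={'D': 'C'})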
df
Out[102]:
A B C
0 Apple Canada RED
1 Bananas Germany BLUE
2 Carrot US GREEN
3 Dorito NaN INDIGO
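A possible alternative to the back-fill is to mask the blanks and let B fall back to C with combine_first. The sketch below assumes a blank may be either the literal '' or an empty string. The sample frame and the names blanks, b, c and merged are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["Apple", "Bananas", "Carrot", "Dorito"],
    "B": ["Canada", "''", "US", "''"],
    "C": ["''", "Germany", "US", "''"],
    "D": ["RED", "BLUE", "GREEN", "INDIGO"],
})

# Assumption: these two spellings are what counts as "blank"
blanks = ["", "''"]

# mask() turns the blank cells into NaN
b = df["B"].mask(df["B"].isin(blanks))
c = df["C"].mask(df["C"].isin(blanks))

# combine_first keeps B where it is present and falls back to C otherwise
merged = (
    df.assign(B=b.combine_first(c))
      .drop(columns="C")
      .rename(columns={"D": "C"})
)
print(merged)
```

As with the back-fill, a row that is blank in both columns ends up as NaN; chain .fillna("") on the combined column if an empty string is preferred.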
You can sort the strings and take the last one:
# Sort each row's two values and keep the last (largest) one by position
df['B'] = df[['B', 'C']].apply(lambda x: x.sort_values().iloc[-1], axis=1)
df = df.drop(columns='C').rename(columns={'D': 'C'})
print(df)
Output:
A B C
0 Apple Canada RED
1 Bananas Germany BLUE
2 Carrot US GREEN
3 Dorito '' INDIGO
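The sorting trick relies on the '' placeholder comparing lower than any country name, because the quote character sorts before letters. The short sketch below illustrates that, along with a caveat worth knowing: when B and C hold two different real values, the lexicographically larger one is kept, which is not necessarily B. The two example rows are made up for illustration.

```python
import pandas as pd

# A row with one real value and one placeholder
row = pd.Series({"B": "Canada", "C": "''"})
# The quote character sorts before every letter, so the real value ends up last
print(row.sort_values().iloc[-1])       # Canada

# Caveat: two different real values, so the alphabetically later one is kept
conflict = pd.Series({"B": "Canada", "C": "US"})
print(conflict.sort_values().iloc[-1])  # US
```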
Another approach is a clever use of list comprehensions:
# Make sets of the column B and C combined to get rid of duplicates
k = [set(b.strip() for b in a) for a in zip(df['B'], df['C'])]
# Flatten sets to strings
k = [''.join(x) for x in k]
# Create desired column
df['B'] = k
df.drop('C', axis=1, inplace=True)
print(df)
A B D
0 Apple Canada RED
1 Bananas Germany BLUE
2 Carrot US GREEN
3 Dorito INDIGO
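One thing to watch with the set-based version: if B and C ever contain two different non-empty values, joining the set glues them together in an unspecified order. A hedged variant that keeps first-seen order and inserts a separator might look like the sketch below. The helper merge_values and its sep parameter are hypothetical names, and the sample frame assumes blanks are empty strings.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["Apple", "Bananas", "Carrot", "Dorito"],
    "B": ["Canada", "", "US", ""],
    "C": ["", "Germany", "US", ""],
    "D": ["RED", "BLUE", "GREEN", "INDIGO"],
})

def merge_values(b, c, sep="/"):
    """Hypothetical helper: combine two cells, dropping blanks and duplicates."""
    # dict.fromkeys removes duplicates while keeping first-seen order
    unique = dict.fromkeys(v.strip() for v in (b, c))
    # Skip empty strings so a blank cell contributes nothing
    return sep.join(v for v in unique if v)

df["B"] = [merge_values(b, c) for b, c in zip(df["B"], df["C"])]
df = df.drop(columns="C")
print(df)
```

With this helper, a conflicting pair such as ("Canada", "US") becomes "Canada/US" instead of an arbitrarily ordered concatenation.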