Pandas rename column of dataframe to value of another dataframe if values of two dataframe columns match
I have two dataframes.
dfA contains two columns, "CCLE_ID" and "Name", amongst other unimportant ones.
dfB contains two columns, "CCLE ID" and "Cell line", amongst other unimportant ones.
Right now, the dfB['CCLE ID'] values are all set to 0.
What I want to do is compare all the values in the dfA['Name'] column and the dfB['Cell line'] column. They are all strings and stand for the shorthand names of cell lines. If a value in dfA['Name'] matches a value in dfB['Cell line'], then I want to replace the 0 in dfB['CCLE ID'] with the string from dfA['CCLE_ID'] for that matched cell line.
I am honestly so lost as to how to do this (pandas beginner).
First, we presume dfA and dfB have the same number of rows, because the approach below compares the two dataframes row by row. If they don't, it's more complicated and you have two choices: either reshape the dataframes to have the same number of rows, or use other Python libraries to perform the transformation.
Based on this initial presumption that the dataframes have the same number of rows, I'm going to try to break this down for you step by step.
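For the unequal-row case that is set aside here, one possible sketch (not the approach this answer goes on to describe) is a lookup by name with Series.map. The toy values below are made up for illustration, and the sketch assumes each name appears only once in dfA.

```python
import pandas as pd

# Toy data; the cell-line names and IDs are invented for illustration.
dfA = pd.DataFrame({
    "CCLE_ID": ["A549_LUNG", "MCF7_BREAST", "HELA_CERVIX"],
    "Name": ["A549", "MCF7", "HeLa"],
})
dfB = pd.DataFrame({
    "Cell line": ["MCF7", "U2OS", "A549", "MCF7"],
    "CCLE ID": [0, 0, 0, 0],
})

# Build a Name -> CCLE_ID lookup (assumption: names in dfA are unique;
# drop_duplicates keeps the first occurrence if they are not).
lookup = dfA.drop_duplicates(subset="Name").set_index("Name")["CCLE_ID"]

# Map each cell line to its ID; names with no match come back as NaN.
ids = dfB["Cell line"].map(lookup).astype(object)

# Keep the mapped ID where there is one, otherwise keep the existing value (0).
dfB["CCLE ID"] = ids.where(ids.notna(), dfB["CCLE ID"].astype(object))
```

Because the lookup is keyed on the name rather than on row position, the two dataframes do not need the same length or order.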
With the two dataframes, dfA and dfB, start by merging the data. You can remove the extra columns from dfB later.
To merge the dfA columns into dfB, for simplicity add two columns, dfaName and dfa_CCLE_ID:
dfB['dfaName'] = dfA['Name']
dfB['dfa_CCLE_ID'] = dfA['CCLE_ID']
Then use pandas.DataFrame.apply() to conditionally transform your data, keeping the existing value where the names don't match:
dfB['CCLE ID'] = dfB.apply(lambda x: x['dfa_CCLE_ID'] if x['dfaName'] == x['Cell line'] else x['CCLE ID'], axis=1)
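The steps so far can be put together as a small runnable sketch. The toy data is invented for illustration, and it assumes both dataframes share the same default row index so that row i of dfA lines up with row i of dfB.

```python
import pandas as pd

# Toy data with the same number of rows in both frames (values are made up).
dfA = pd.DataFrame({
    "CCLE_ID": ["A549_LUNG", "MCF7_BREAST", "HELA_CERVIX"],
    "Name": ["A549", "MCF7", "HeLa"],
})
dfB = pd.DataFrame({
    "Cell line": ["A549", "U2OS", "HeLa"],
    "CCLE ID": [0, 0, 0],
})

# Step 1: copy the dfA columns into dfB (aligned on the row index).
dfB["dfaName"] = dfA["Name"]
dfB["dfa_CCLE_ID"] = dfA["CCLE_ID"]

# Step 2: row by row, take the dfA ID on a match, else keep the old value.
dfB["CCLE ID"] = dfB.apply(
    lambda row: row["dfa_CCLE_ID"]
    if row["dfaName"] == row["Cell line"]
    else row["CCLE ID"],
    axis=1,
)

# Step 3: remove the helper columns again.
dfB = dfB.drop(columns=["dfaName", "dfa_CCLE_ID"])
```

Only rows whose names agree at the same position are updated; a name that appears in both dataframes but on different rows is not treated as a match by this approach.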
A nice extra could be to use a dataframe mask to generate and inspect the comparison. It is a good way to view and test your data transformation. In this example, create an extra column in dfB with True/False values for the comparison:
dfB['column_matcher'] = dfB['dfaName'] == dfB['Cell line']
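As a hedged sketch, the same boolean mask can also be used to count the matches and to perform the replacement with .loc instead of apply(). The toy dfB below is invented and already contains the two helper columns; the name n_matches is introduced here only for illustration.

```python
import pandas as pd

# Toy dfB that already has the helper columns from dfA (values are made up).
dfB = pd.DataFrame({
    "Cell line": ["A549", "U2OS", "HeLa"],
    "CCLE ID": [0, 0, 0],
    "dfaName": ["A549", "MCF7", "HeLa"],
    "dfa_CCLE_ID": ["A549_LUNG", "MCF7_BREAST", "HELA_CERVIX"],
})

# Boolean mask: True where the names on the same row agree.
dfB["column_matcher"] = dfB["dfaName"] == dfB["Cell line"]

# Quick sanity check: how many rows matched?
n_matches = int(dfB["column_matcher"].sum())

# Use object dtype so the column can hold strings alongside the original 0s.
dfB["CCLE ID"] = dfB["CCLE ID"].astype(object)

# Overwrite only the matching rows with the ID taken from dfA.
mask = dfB["column_matcher"]
dfB.loc[mask, "CCLE ID"] = dfB.loc[mask, "dfa_CCLE_ID"]
```

Inspecting the mask column before assigning makes it easy to spot rows that were expected to match but did not, for example because of differences in case or whitespace.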