Iterate through two data frames and update a column of the first data frame with a column of the second data frame in pandas

I am converting a piece of code written in R to Python. df1 and df2 are the dataframes; id, case, feature, and feature_value are column names. The code in R is:

for(i in 1:dim(df1)[1]){
 temp = subset(df2,df2$id == df1$case[i],select = df1$feature[i])
 df1$feature_value[i] = temp[,df1$feature[i]]
 }

My code in Python is as follows:

for i in range(0,len(df1)):
   temp=np.where(df1['case'].iloc[i]==df2['id']),df1['feature'].iloc[i]                                  
   df1['feature_value'].iloc[i]=temp[:,df1['feature'].iloc[i]]

but it gives

TypeError: tuple indices must be integers or slices, not tuple

How can I fix this error? I'd appreciate any help.

Unfortunately, R and pandas handle dataframes quite differently. If you'll be using pandas a lot, it would probably be worth going through a tutorial on it.
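As for the error itself: as far as I can tell, the trailing comma on the first line of your loop makes temp a plain Python tuple (the np.where result plus the feature name), and a tuple can't be indexed with [:, something]. Here is a minimal sketch that reproduces the message. The sample data is made up for illustration and is not your actual data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, only used to reproduce the error
df1 = pd.DataFrame({'case': [1, 2, 3], 'feature': ['a', 'b', 'c']})
df2 = pd.DataFrame({'id': [1, 3, 7], 'a': [30, 31, 32]})

i = 0
# The comma builds a 2-element tuple: (result of np.where, feature name)
temp = np.where(df1['case'].iloc[i] == df2['id']), df1['feature'].iloc[i]
is_tuple = isinstance(temp, tuple)

try:
    # 2-D style indexing only works on arrays/dataframes, not on tuples
    temp[:, df1['feature'].iloc[i]]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(is_tuple)
print(error_message)
```

So the R call to subset() has no direct one-line equivalent here; the lookup has to be expressed with pandas operations instead.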

I'm not too familiar with R, so this is what I think you want to do: find rows in df1 where 'case' matches an 'id' in df2. For each such row, copy its 'feature' value from df1 into a new df1 column called 'feature_value'. If so, you can do this with the following:

#create a sample df1 and df2
>>> import numpy as np
>>> import pandas as pd
>>> df1 = pd.DataFrame({'case': [1, 2, 3], 'feature': [3, 4, 5]})
>>> df1
   case  feature
0     1        3
1     2        4
2     3        5

>>> df2 = pd.DataFrame({'id': [1, 3, 7], 'age': [45, 63, 39]})
>>> df2
   id  age
0   1   45
1   3   63
2   7   39

#create a list with all the "id" values of df2
>>> df2_list = df2['id'].to_list()
>>> df2_list
[1, 3, 7]


#lambda allows small functions; in this case, the value of df1['feature_value']
#for each row is assigned df1['feature'] if df1['case'] is in df2_list, 
#and otherwise it is assigned np.nan.

>>> df1['feature_value'] = df1.apply(lambda x: x['feature'] if x['case'] in df2_list else np.nan, axis=1)
>>> df1
   case  feature  feature_value
0     1        3            3.0
1     2        4            NaN
2     3        5            5.0

Instead of a lambda, you can write a full function, which may be easier to understand:

def get_feature_values(df, id_list):
    
    if df['case'] in id_list:
        feature_value = df['feature']
    else:
        feature_value = np.nan
        
    return feature_value

df1['feature_value'] = df1.apply(get_feature_values, id_list=df2_list, axis=1)
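If the row-by-row apply turns out to be slow on a large dataframe, the same result can probably be obtained without a Python-level function. This is a sketch under the same assumption about what you want (copy 'feature' where 'case' appears among the ids of df2), using the same sample data as above; has_match is just a helper name I made up:

```python
import pandas as pd

df1 = pd.DataFrame({'case': [1, 2, 3], 'feature': [3, 4, 5]})
df2 = pd.DataFrame({'id': [1, 3, 7], 'age': [45, 63, 39]})

# Boolean Series: True where df1['case'] appears among df2['id']
has_match = df1['case'].isin(df2['id'])

# Keep 'feature' where there is a match; other rows become NaN
df1['feature_value'] = df1['feature'].where(has_match)

print(df1)
```

Series.isin and Series.where operate on whole columns at once, so no explicit loop over the rows is needed.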

Another way of going about this would involve merging df1 and df2 to find rows where the "case" value in df1 matches an "id" value in df2 ( https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html )
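A rough sketch of that merge-based idea, again on the sample data from above. The names merged and result are mine, and I'm assuming a left merge with indicator=True is an acceptable way to flag the matching rows:

```python
import pandas as pd

df1 = pd.DataFrame({'case': [1, 2, 3], 'feature': [3, 4, 5]})
df2 = pd.DataFrame({'id': [1, 3, 7], 'age': [45, 63, 39]})

# A left merge keeps every row of df1. indicator=True adds a '_merge' column
# that is 'both' when the row's case was found among df2's ids.
# drop_duplicates guards against repeated ids multiplying df1's rows.
merged = df1.merge(df2[['id']].drop_duplicates(), how='left',
                   left_on='case', right_on='id', indicator=True)

# Copy 'feature' only for the rows that found a match
merged['feature_value'] = merged['feature'].where(merged['_merge'] == 'both')
result = merged[['case', 'feature', 'feature_value']]

print(result)
```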

=================== ====================

To address the follow-up question in the comments: you can do this by merging the dataframes and then applying a function.

#create example dataframes
>>> df1 = pd.DataFrame({'case': [1, 2, 3], 'feature': [3, 4, 5], 'names': ['a', 'b', 'c']})
>>> df2 = pd.DataFrame({'id': [1, 3, 7], 'age': [45, 63, 39], 'a': [30, 31, 32], 'b': [40, 41, 42], 'c': [50, 51, 52]})

#merge the dataframes
>>> df1 = df1.merge(df2, how='left', left_on='case', right_on='id')
>>> df1
   case  feature names   id   age     a     b     c
0     1        3     a  1.0  45.0  30.0  40.0  50.0
1     2        4     b  NaN   NaN   NaN   NaN   NaN
2     3        5     c  3.0  63.0  31.0  41.0  51.0

Then you can create the following function:

def get_feature_values_2(df):
    
    if pd.notnull(df['id']):
        feature_value = df['feature']
        column_of_interest = df['names']
        feature_extended_value = df[column_of_interest]
    else:
        feature_value = np.nan
        feature_extended_value = np.nan
        
    return feature_value, feature_extended_value


# "result_type='expand'" allows multiple values to be returned from the function
df1[['feature_value', 'feature_extended_value']] = df1.apply(get_feature_values_2, result_type='expand', axis=1)


#This results in the following dataframe:
   case  feature names   id   age     a     b     c  feature_value  \
0     1        3     a  1.0  45.0  30.0  40.0  50.0            3.0   
1     2        4     b  NaN   NaN   NaN   NaN   NaN            NaN   
2     3        5     c  3.0  63.0  31.0  41.0  51.0            5.0   

   feature_extended_value  
0                    30.0  
1                     NaN  
2                    51.0


#To keep only a subset of the columns:
#First create a copy-pasteable list of the column names
list(df1.columns)
['case', 'feature', 'names', 'id', 'age', 'a', 'b', 'c', 'feature_value', 'feature_extended_value']

#Choose the subset of columns you would like to keep
df1 = df1[['case', 'feature', 'names', 'feature_value', 'feature_extended_value']]

df1
   case  feature names  feature_value  feature_extended_value
0     1        3     a            3.0                    30.0
1     2        4     b            NaN                     NaN
2     3        5     c            5.0                    51.0
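If you only need the looked-up value (the column of df2 named in df1['names'], for the row whose id equals the case), a possible alternative to the apply above is to reshape df2 to long format and merge on both keys. This is a sketch rather than a tested drop-in replacement; long_df2 and result are names I made up, and it assumes each id appears only once in df2:

```python
import pandas as pd

df1 = pd.DataFrame({'case': [1, 2, 3], 'feature': [3, 4, 5],
                    'names': ['a', 'b', 'c']})
df2 = pd.DataFrame({'id': [1, 3, 7], 'age': [45, 63, 39],
                    'a': [30, 31, 32], 'b': [40, 41, 42], 'c': [50, 51, 52]})

# Long format: one row per (id, column name) pair, with that cell's value
long_df2 = df2.melt(id_vars='id', var_name='names',
                    value_name='feature_extended_value')
long_df2 = long_df2.rename(columns={'id': 'case'})

# Match on the case/id and on the column name at the same time
result = df1.merge(long_df2, how='left', on=['case', 'names'])

print(result)
```

Rows of df1 without a matching id end up with NaN in feature_extended_value, the same as in the apply version.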

