How do you match the value of one dataframe's column with another dataframe's column using conditionals?

I have two dataframes:

Row No.    Subject    
1      Apple
2      Banana
3      Orange
4      Lemon
5      Strawberry


row_number Subjects Special?
1    Banana      Yes
2    Lemon       No
3    Apple       No
4    Orange      No
5    Strawberry  Yes
6    Cranberry   Yes
7    Watermelon  No

I want to change the Row No. of the first dataframe to match the second. It should look like this:

Row No.    Subject   
3      Apple
1      Banana
4      Orange
2      Lemon
5      Strawberry

I have tried this code:

for index, row in df1.iterrows():
    if df1['Subject'] == df2['Subjects']:
        df1['Row No.'] = df2['row_number']

But I get the error:

ValueError: Can only compare identically-labeled Series objects

Does that mean the dataframes have to have the same number of rows and columns? Do they have to be labelled the same too? Is there a way to bypass this limitation?

Edit: I have found a promising alternative:

for x in df1['Subject']:
    if x in df2['Subjects'].values:
        df2.loc[df2['Subjects'] == x]['row_number'] = df1.loc[df1['Subject'] == x]['Row No.']

But it doesn't seem to modify the first dataframe the way I want it to. Any tips why? Furthermore, I get this warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I would avoid for loops here, especially since pandas already has good methods for handling this kind of problem.
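As for the ValueError in the question, here is a minimal sketch of what seems to be going on, assuming the two dataframes are built as shown in the question. Comparing two Series with == aligns them on their index labels, so a 5-row Series and a 7-row Series cannot be compared element by element. A value-based membership test such as isin does not have that restriction.

```python
import pandas as pd

df1 = pd.DataFrame({'Row No.': [1, 2, 3, 4, 5],
                    'Subject': ['Apple', 'Banana', 'Orange', 'Lemon', 'Strawberry']})
df2 = pd.DataFrame({'row_number': [1, 2, 3, 4, 5, 6, 7],
                    'Subjects': ['Banana', 'Lemon', 'Apple', 'Orange',
                                 'Strawberry', 'Cranberry', 'Watermelon']})

# == compares label by label; the indexes 0..4 and 0..6 differ, so pandas raises.
try:
    df1['Subject'] == df2['Subjects']
    raised = False
except ValueError:
    raised = True

# isin checks each value of df1['Subject'] against the values of df2['Subjects'],
# ignoring index labels and lengths.
found = df1['Subject'].isin(df2['Subjects'])
print(raised, found.tolist())
```

So the frames do not need the same shape; they only need to be matched by value rather than by position.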

Using pd.Series.replace

Here is a vectorized way of doing this:

  1. d is a dictionary that maps each fruit to its number in the second dataframe.
  2. df1.Subject.replace(d) then replaces every value in the column that is a key of d with the corresponding value.
  3. Overwrite the Row No. column with the result.
d = dict(zip(df2['Subjects'], df2['row_number']))
df1['Row No.'] = df1.Subject.replace(d)
print(df1)
      Subject  Row No.
0       Apple        3
1      Banana        1
2      Orange        4
3       Lemon        2
4  Strawberry        5
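As a variation on the replace approach above (not part of the original answer), Series.map can do the same lookup. This is a sketch under the assumption that every subject appears at most once in df2; the name lookup is only illustrative. The main behavioural difference is that map yields NaN for subjects missing from df2, while replace leaves the original string in place.

```python
import pandas as pd

df1 = pd.DataFrame({'Row No.': [1, 2, 3, 4, 5],
                    'Subject': ['Apple', 'Banana', 'Orange', 'Lemon', 'Strawberry']})
df2 = pd.DataFrame({'row_number': [1, 2, 3, 4, 5, 6, 7],
                    'Subjects': ['Banana', 'Lemon', 'Apple', 'Orange',
                                 'Strawberry', 'Cranberry', 'Watermelon']})

# Build a Series indexed by fruit name whose values are the row numbers.
lookup = df2.set_index('Subjects')['row_number']

# map looks each subject up in the index of `lookup`.
df1['Row No.'] = df1['Subject'].map(lookup)

# A subject that is absent from df2 maps to NaN instead of staying unchanged.
missing = pd.Series(['Kiwi']).map(lookup)
print(df1)
```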

Using pd.merge

Alternatively, merge the two dataframes and replace the column entirely.

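# Left-merge so that the order and length of df1 are preserved.
ddf = pd.merge(df1['Subject'],
               df2[['row_number', 'Subjects']],
               left_on='Subject',
               right_on='Subjects',
               how='left').drop(columns='Subjects')  # positional axis (.drop('Subjects', 1)) fails in pandas 2.0+

# Rename the result columns to ['Subject', 'Row No.'].
ddf.columns = df1.columns[::-1]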
print(ddf)
      Subject  Row No.
0       Apple       3
1      Banana       1
2      Orange       4
3       Lemon       2
4  Strawberry       5
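The same merge can also be written by renaming df2's columns first, which avoids the drop and the column reassignment. This is only a sketch, assuming the dataframes from the question; validate='many_to_one' is an optional check that asks pandas to raise if a subject appears more than once in df2.

```python
import pandas as pd

df1 = pd.DataFrame({'Row No.': [1, 2, 3, 4, 5],
                    'Subject': ['Apple', 'Banana', 'Orange', 'Lemon', 'Strawberry']})
df2 = pd.DataFrame({'row_number': [1, 2, 3, 4, 5, 6, 7],
                    'Subjects': ['Banana', 'Lemon', 'Apple', 'Orange',
                                 'Strawberry', 'Cranberry', 'Watermelon']})

# Give df2's columns the names used in df1 so a plain on='Subject' merge works.
right = df2[['Subjects', 'row_number']].rename(
    columns={'Subjects': 'Subject', 'row_number': 'Row No.'})

# how='left' keeps df1's rows in their original order.
merged = df1[['Subject']].merge(right, on='Subject', how='left',
                                validate='many_to_one')
print(merged)
```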

Assuming the first is df1 and the second is df2, this should do what you want:

import pandas as pd

d1 = {'Row No.': [1, 2, 3, 4, 5],
      'Subject': ['Apple', 'Banana', 'Orange', 'Lemon', 'Strawberry']}
df1 = pd.DataFrame(data=d1)

d2 = {'row_number': [1, 2, 3, 4, 5, 6, 7],
      'Subjects': ['Banana', 'Lemon', 'Apple', 'Orange',
                   'Strawberry', 'Cranberry', 'Watermelon'],
      'Special?': ['Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'No']}
df2 = pd.DataFrame(data=d2)

# For each subject in df1 that also exists in df2, copy over df2's row_number.
for x in df1['Subject']:
    if x in df2['Subjects'].values:
        # .item() extracts the single matching row_number as a scalar.
        df1.loc[df1['Subject'] == x, 'Row No.'] = df2.loc[df2['Subjects'] == x, 'row_number'].item()

#print(df1)
#print(df2)

In your edited question it looks like you had the dataframes swapped, and you were missing item() to get the actual row_number value rather than a Series object.
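The warning in the question is most likely tied to the chained form df.loc[mask]['col'] = value. A small sketch with a made-up two-row dataframe illustrates the difference: the chained form writes into a temporary copy, while a single .loc call with both the row mask and the column label writes into the dataframe itself.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'Subject': ['Apple', 'Banana'], 'Row No.': [1, 2]})
mask = df['Subject'] == 'Apple'

# Chained indexing: df.loc[mask] returns a copy, so the assignment is lost.
# The warning pandas emits here is silenced only to keep the demo output clean.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df.loc[mask]['Row No.'] = 3
after_chained = df['Row No.'].tolist()

# Single .loc call with row mask and column label: modifies df in place.
df.loc[mask, 'Row No.'] = 3
after_loc = df['Row No.'].tolist()

print(after_chained, after_loc)
```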

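Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you republish, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.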