How can I filter a pandas DataFrame with another, smaller pandas DataFrame?

I have two DataFrames. The first looks like this:

df1:

    MONEY    Value
0    EUR      850
1    USD      750
2    CLP        1
3    DCN        1

df2:

      Money
0      USD
1      USD
2      USD
3      USD
4      EGP
...    ...
25984  USD
25985  DCN
25986  USD
25987  CLP
25988  USD

I want to remove the rows of df2 whose "Money" value is not present in df1, and add a column holding the matching "Value" from df1, so that the result looks like this:

       Money    Value
0      USD      720
1      USD      720
2      USD      720
3      USD      720
...    ...      ...
25984  USD      720
25985  DCN        1
25986  USD      720
25987  CLP        1
25988  USD      720
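The first half of the request, keeping only the rows of df2 whose currency appears in df1, can be sketched with `Series.isin`. The frames below are a small hypothetical subset of the question's data, not the real ~26,000-row df2:

```python
import pandas as pd

# Hypothetical small stand-ins for the question's df1 and df2
df1 = pd.DataFrame({"MONEY": ["EUR", "USD", "CLP", "DCN"], "Value": [850, 750, 1, 1]})
df2 = pd.DataFrame({"Money": ["USD", "USD", "EGP", "DCN", "CLP"]})

# Boolean mask: True where df2's currency appears in df1's MONEY column
mask = df2["Money"].isin(df1["MONEY"])

# Boolean indexing keeps the original index labels of the surviving rows
filtered = df2[mask]
print(filtered)
```

The EGP row (index 2) is dropped and the remaining rows keep their original index labels, just as in the expected result above.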

Step by step (the answer uses its own small sample data):

    df1.set_index("MONEY")["Value"]

This code makes the MONEY column the index of df1 and then selects the Value column, producing a Series that maps each currency code to its value:

    print(df1.set_index("MONEY")["Value"])

    MONEY
    EUR    850
    USD    150
    DCN      1
    Name: Value, dtype: int64
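Because the result is a Series indexed by currency code, it behaves like a lookup table. A minimal sketch using the answer's sample df1 (the name `lookup` is introduced here for illustration):

```python
import pandas as pd

df1 = pd.DataFrame([["EUR", 850], ["USD", 150], ["DCN", 1]], columns=["MONEY", "Value"])

# Index by currency code, then keep only the Value column: a Series keyed by MONEY
lookup = df1.set_index("MONEY")["Value"]

print(lookup["USD"])    # label-based lookup of a single currency
print("CLP" in lookup)  # `in` tests the index labels, not the values
```

Note that `in` on a Series checks the index, which is exactly why `map` (next step) can match currencies against it.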

    df2["Money"].map(df1.set_index("MONEY")["Value"])

This code looks up each currency in df2's Money column in that Series; currencies that do not appear in df1 become NaN. This returns the following:

    0    150.0
    1      NaN
    2    850.0
    3      NaN
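`Series.map` accepts either a Series (matched on its index) or a plain dict, so the lookup can also be passed as a dictionary. A sketch using the answer's sample data, showing both forms give the same result:

```python
import pandas as pd

df1 = pd.DataFrame([["EUR", 850], ["USD", 150], ["DCN", 1]], columns=["MONEY", "Value"])
df2 = pd.DataFrame(["USD", "GBP", "EUR", "CLP"], columns=["Money"])

lookup = df1.set_index("MONEY")["Value"]

# Map using the Series directly, and using an equivalent dict
via_series = df2["Money"].map(lookup)
via_dict = df2["Money"].map(lookup.to_dict())

# Unmatched currencies (GBP, CLP) become NaN, which upcasts the result to float64
print(via_series.equals(via_dict))
print(via_series.dtype)
```

The NaN entries are why the values appear as 150.0 and 850.0 rather than integers.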

Now assign that result to a new column in df2 called Value:

    df2["Value"] = df2["Money"].map(df1.set_index("MONEY")["Value"])

df2 now looks like:

      Money  Value
    0   USD  150.0
    1   GBP    NaN
    2   EUR  850.0
    3   CLP    NaN

Only one thing is left to do: delete any rows that have a NaN value:

    df2.dropna(inplace=True)
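Since `map` inserts NaN for unmatched currencies, the Value column ends up as float64 even after `dropna`. A variant, offered as a sketch rather than the answer's own method, filters with `isin` first so every lookup succeeds and the integer dtype is kept:

```python
import pandas as pd

df1 = pd.DataFrame([["EUR", 850], ["USD", 150], ["DCN", 1]], columns=["MONEY", "Value"])
df2 = pd.DataFrame(["USD", "GBP", "EUR", "CLP"], columns=["Money"])

lookup = df1.set_index("MONEY")["Value"]

# Keep only rows whose currency exists in df1; .copy() makes an independent frame
# so assigning a new column does not trigger SettingWithCopyWarning
result = df2[df2["Money"].isin(lookup.index)].copy()

# Every remaining currency has a match, so no NaN is produced and ints survive
result["Value"] = result["Money"].map(lookup)
print(result)
```

The rows and index labels (0 and 2) match the answer's output; only the dtype of Value differs.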

Entire code sample:

    import pandas as pd

    # Create df1: each currency code with its value
    x_1 = [["EUR", 850], ["USD", 150], ["DCN", 1]]
    df1 = pd.DataFrame(x_1, columns=["MONEY", "Value"])

    # Create df2: the currency codes to filter
    x_2 = ["USD", "GBP", "EUR", "CLP"]
    df2 = pd.DataFrame(x_2, columns=["Money"])

    # Create a new column in df2 called 'Value' by looking up each currency in df1;
    # currencies missing from df1 get NaN
    df2["Value"] = df2["Money"].map(df1.set_index("MONEY")["Value"])
    # Drop any rows that contain NaN (only 'Value' can be NaN here)
    df2.dropna(inplace=True)
    print(df2)

Output:

      Money  Value
    0   USD  150.0
    2   EUR  850.0
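An inner merge is another way to do the filtering and the lookup in one step. This is a swapped-in technique, not the answer's: `how="inner"` drops currencies missing from df1, but the result gets a fresh 0..n-1 index instead of df2's original labels, and df1's MONEY key column is carried over, so it is dropped here. A sketch with the answer's sample data:

```python
import pandas as pd

df1 = pd.DataFrame([["EUR", 850], ["USD", 150], ["DCN", 1]], columns=["MONEY", "Value"])
df2 = pd.DataFrame(["USD", "GBP", "EUR", "CLP"], columns=["Money"])

# Inner join on the differently named key columns; unmatched currencies disappear.
# An inner merge preserves the order of the left (df2) keys.
merged = df2.merge(df1, how="inner", left_on="Money", right_on="MONEY")

# The right-hand key column duplicates 'Money', so drop it
merged = merged.drop(columns="MONEY")
print(merged)
```

If df1 contained the same currency more than once, the merge would repeat the matching df2 rows, which is worth checking before choosing this approach.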
