Use one DataFrame (used as a dictionary) to fill in the main DataFrame (Python, Pandas)

I have a central DataFrame called "cases" (5,000,000 rows × 5 columns) and a secondary DataFrame called "relevant information" (300 rows × 6 columns), which acts as a kind of dictionary for the central one. I am trying to fill in the central DataFrame based on a common column called "Verdict_type". If a value does not appear in the secondary DataFrame, "not_relevant" should be filled in for everything that gets added to that row. I have tried all sorts of approaches without success and would love to be pointed in a good direction.

The DataFrames

import pandas as pd

# this is a mockup of the raw data
cases = [
    [1, "1", "v1"],
    [2, "2", "v2"],
    [3, "3", "v3"]
]

relevant_info = [
    ["v1", "info1"],
    ["v3", "info3"]
]

# these are the DataFrames shown in the screenshots
df_cases = pd.DataFrame(cases, columns=["id", "verdict_name", "verdict_type"]).set_index("id")
df_relevant_info = pd.DataFrame(relevant_info, columns=["verdict_type", "features"])

Input:
df_cases <-- note that the index here is 'id'
df_relevant_info

# first, flatten the index of the cases so that "id" survives the merge as a column
# (this is probably what you were missing)
df_cases = df_cases.reset_index()
# then, merge the two sets on verdict_type; how="left" keeps every row of df_cases
# and does not add rows for dictionary entries that no case uses
df_merge = pd.merge(df_cases, df_relevant_info, on="verdict_type", how="left")
# finally, mark missing values as not relevant
df_merge["features"] = df_merge["features"].fillna(value="not_relevant")

Output:

merged set:
+----+------+----------------+----------------+--------------+
|    |   id |   verdict_name | verdict_type   | features     |
|----+------+----------------+----------------+--------------|
|  0 |    1 |              1 | v1             | info1        |
|  1 |    2 |              2 | v2             | not_relevant |
|  2 |    3 |              3 | v3             | info3        |
+----+------+----------------+----------------+--------------+
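The question mentions a dictionary DataFrame with several columns, while the mockup has only one ("features"). A minimal sketch of how the same idea might extend to every added column follows. It assumes a left merge, and the extra "category" column and the name df_filled are hypothetical, introduced only for illustration. It also assumes each verdict_type appears at most once in the dictionary; validate="many_to_one" makes pandas raise an error if that is not the case.

```python
import pandas as pd

# Hypothetical mockup: the dictionary frame has more than one information column.
df_cases = pd.DataFrame(
    {"id": [1, 2, 3], "verdict_name": ["1", "2", "3"], "verdict_type": ["v1", "v2", "v3"]}
)
df_relevant_info = pd.DataFrame(
    {"verdict_type": ["v1", "v3"], "features": ["info1", "info3"], "category": ["a", "c"]}
)

# Every dictionary column except the join key will be added to the main frame.
added_cols = [c for c in df_relevant_info.columns if c != "verdict_type"]

# how="left" keeps all rows of df_cases in their original order.
# validate="many_to_one" raises a MergeError if verdict_type is duplicated in the dictionary.
df_filled = df_cases.merge(
    df_relevant_info, on="verdict_type", how="left", validate="many_to_one"
)

# Fill only the added columns, so genuine gaps in the original columns stay untouched.
df_filled[added_cols] = df_filled[added_cols].fillna("not_relevant")

print(df_filled)
```

Restricting fillna to added_cols is a deliberate choice here: calling fillna on the whole frame would also overwrite missing values that were already present in the cases data.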

