
Populate a pandas DataFrame column based on another column and dictionary value

I have a data frame that contains a column called DIAGNOSES. Each value in this column is a list of one or more strings, and each string starts with a letter.

For every row, I want to take the first character of each string in DIAGNOSES, look it up in a dictionary, and populate a DIAGNOSES_TYPE column with the resulting values.

Minimal example:

import pandas as pd

diagnoses = {'A': 'Arbitrary', 'B': 'Brutal', 'C': 'Cluster', 'D': 'Dropped'}

df = pd.DataFrame({'DIAGNOSES': [['A03'], ['A03', 'B23'], ['A30', 'B54', 'D65', 'C60']]})
              DIAGNOSES
0                 [A03]
1            [A03, B23]
2  [A30, B54, D65, C60]

To clarify what I want to get, here is a small visualization; I want the df['DIAGNOSES_TYPES'] column populated:

[Image: diagnoses data frame with the desired column]

I approached it this way:

def map_diagnose(df):
    types = []
    for d in df['DIAGNOSES']:  # one list of diagnosis codes per row
        # look up the first character of each code; unknown prefixes give ''
        types.append([diagnoses.get(diag[0], '') for diag in d])
    df['DIAGNOSES_TYPES'] = types
    return df

Use explode, map and groupby:

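diagnoses = {'A': 'Arbitrary', 'B': 'Brutal', 'C': 'Cluster', 'D': 'Dropped'}
df1 = df.explode('DIAGNOSES')  # one diagnosis code per row; the original index repeats
# first non-digit character of each code (the raw string avoids an invalid-escape warning)
df1['SD'] = df1['DIAGNOSES'].str.extract(r'(\D)', expand=False)
df1['DIAGNOSES_TYPES'] = df1['SD'].map(diagnoses)  # look the letter up in the dictionary
df1.groupby(level=0).agg(list)  # collect the rows back into one list per original row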

Output:

    DIAGNOSES                SD             DIAGNOSES_TYPES
0   [A03]                    [A]            [Arbitrary]
1   [A03, B23]               [A, B]         [Arbitrary, Brutal]
2   [A30, B54, D65, C60]     [A, B, D, C]   [Arbitrary, Brutal, Dropped, Cluster]

The 'SD' column holds the first letter of each diagnosis, which is used for the mapping; you can drop this column if it is not needed.
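As a minimal sketch of dropping that helper column, the same steps can be followed by a call to drop before aggregating. The name result is only illustrative and does not come from the answer above.

```python
import pandas as pd

diagnoses = {'A': 'Arbitrary', 'B': 'Brutal', 'C': 'Cluster', 'D': 'Dropped'}
df = pd.DataFrame({'DIAGNOSES': [['A03'], ['A03', 'B23'], ['A30', 'B54', 'D65', 'C60']]})

df1 = df.explode('DIAGNOSES')
df1['SD'] = df1['DIAGNOSES'].str.extract(r'(\D)', expand=False)
df1['DIAGNOSES_TYPES'] = df1['SD'].map(diagnoses)

# Drop the helper column first, so only the two wanted columns are aggregated.
# `result` is an illustrative name.
result = df1.drop(columns='SD').groupby(level=0).agg(list)
print(result)
```

Dropping before the groupby avoids building a list for a column that is thrown away afterwards.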

You can explode the "DIAGNOSES" column, get the first character of each string using str, map the diagnoses dictionary to get the types, then groupby the index and aggregate into a list:

df['DIAGNOSES_TYPE'] = df['DIAGNOSES'].explode().str[0].map(diagnoses).groupby(level=0).apply(list)

Output:

              DIAGNOSES                         DIAGNOSES_TYPE
0                 [A03]                            [Arbitrary]
1            [A03, B23]                    [Arbitrary, Brutal]
2  [A30, B54, D65, C60]  [Arbitrary, Brutal, Dropped, Cluster]
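To see what each link in that chain does, the one-liner can be split into separate steps. This is only a sketch for inspection; the intermediate names exploded, first_chars and types are illustrative and not part of the answer.

```python
import pandas as pd

diagnoses = {'A': 'Arbitrary', 'B': 'Brutal', 'C': 'Cluster', 'D': 'Dropped'}
df = pd.DataFrame({'DIAGNOSES': [['A03'], ['A03', 'B23'], ['A30', 'B54', 'D65', 'C60']]})

# One diagnosis code per row; the original row index is repeated.
exploded = df['DIAGNOSES'].explode()

# First character of every code.
first_chars = exploded.str[0]

# Dictionary lookup; a letter missing from the dictionary would become NaN.
types = first_chars.map(diagnoses)

# Group by the original row index and collect the values back into lists.
df['DIAGNOSES_TYPE'] = types.groupby(level=0).apply(list)
print(df)
```

Because the exploded Series keeps the original index, grouping on level=0 puts every value back in the row it came from, and the assignment aligns on that index.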
