
How to create dataframe columns based on dictionaries for non-null columns in Python

I have a data frame and a dictionary like this:

df:
ID   Science  Social 
1      12       24   
2      NaN      13   
3      26       NaN  
4      23       35   

count_dict = {'Science': 30, 'Social': 40}
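
For anyone who wants to reproduce this, the data above can be built roughly as follows. This is a minimal sketch; it assumes the dictionary keys are the course column names as strings and that missing scores are stored as NaN:

```python
import numpy as np
import pandas as pd

# Sample data from the question; NaN marks a missing score.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Science": [12, np.nan, 26, 23],
    "Social": [24, 13, np.nan, 35],
})

# Keys are assumed to match the course column names exactly.
count_dict = {"Science": 30, "Social": 40}
```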

For every course column in the data frame, I want to create 2 new columns such that:

Col-1 (Course_Count): If the course column is not null, the new column gets the value from the dictionary; otherwise it stays null.

Col-2 (Course_%): Course / Course_Count

The output should look like this:

df:
ID   Science Science_Count Science_% Social Social_Count Social_%
1      12         30          12/30    24        40        24/40    
2      NaN                             13        40        13/40
3      26         30          26/30    NaN               
4      23         30          23/30    35        40        35/40

Can anyone help me with this?

If not every column in your dataframe is a course column, you can list only the course column names in courses. Here I am simply skipping the first column ('ID'):

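import numpy as np

courses = df.columns[1:]

# final column order: ID, then each course followed by its _Count and _% columns
order = ['ID'] + [col for course in courses for col in (course, course+'_Count', course+'_%')]

for course in courses:
    df[course + '_Count'] = float(count_dict[course])  # float, so NaN can be assigned below
    df.loc[df[course].isna(), course + '_Count'] = np.nan  # null course -> null count
    df[course + '_%'] = df[course] / df[course + '_Count']

df = df[order]  # reorder the columns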

Result:

   ID  Science  Science_Count  Science_%  Social  Social_Count  Social_%
0   1     12.0           30.0   0.400000    24.0          40.0     0.600
1   2      NaN            NaN        NaN    13.0          40.0     0.325
2   3     26.0           30.0   0.866667     NaN           NaN       NaN
3   4     23.0           30.0   0.766667    35.0          40.0     0.875
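
The same steps can also be written without the separate overwrite step. The following is only a sketch: the helper name add_count_columns is made up for illustration, and it assumes every key in count_dict is a column of df.

```python
import numpy as np
import pandas as pd


def add_count_columns(df, count_dict):
    """Return a copy of df with <course>_Count and <course>_% columns added."""
    out = df.copy()
    # Columns that are not courses (e.g. 'ID') stay at the front.
    order = [c for c in df.columns if c not in count_dict]
    for course, total in count_dict.items():
        # where() keeps the original value (NaN) where the course is missing
        # and puts the dictionary total everywhere else.
        out[course + "_Count"] = out[course].where(out[course].isna(), total)
        out[course + "_%"] = out[course] / out[course + "_Count"]
        order += [course, course + "_Count", course + "_%"]
    return out[order]


df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Science": [12, np.nan, 26, 23],
    "Social": [24, 13, np.nan, 35],
})
result = add_count_columns(df, {"Science": 30, "Social": 40})
print(result)
```

Because the function works on a copy, the original df is left unchanged.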

Try this:

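import numpy as np
import pandas as pd

column_names = [c for c in df.columns if c in count_dict]  # course columns only, not 'ID'
for column in column_names:
    df[f"{column}_Count"] = df[column].apply(lambda x: count_dict[column] if pd.notna(x) else np.nan)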
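
One thing to keep in mind with per-value checks like this: pandas stores a missing number as NaN, and NaN is neither None nor equal to itself, so pd.isna / pd.notna are the usual way to test for it. A small illustration (the variable names are only for the example):

```python
import numpy as np
import pandas as pd

value = np.nan

print(value == None)    # False: NaN is not None
print(value == value)   # False: NaN is not equal to itself
print(pd.isna(value))   # True
print(pd.isna(None))    # True: None also counts as missing

# The same test applied to a whole column at once.
s = pd.Series([12, np.nan, 26])
mask = s.notna()
print(mask.tolist())    # [True, False, True]
```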
