Adding values to pandas dataframe columns based on another dataframe

I have a dataframe that looks like this (df):

HOUSEID    PERSONID      WHY_TRP
20000017      1            1
20000017      1            1
20000017      1            1
20000017      2            1
20000017      2            3
20000231      1            11
20000231      1            11
20000231      2            11
20000521      1            11
20000521      2            11
20000521      2            3

Each row describes a trip made by a person. I have another dataframe of the same kind in which each row describes a person (df_p):

    HOUSEID   PERSONID   
    20000017      1      
    20000017      2     
    20000231      1    
    20000231      2    
    20000521      1    
    20000521      2 

I want to add three new columns to the second dataframe to show how often the WHY_TRP values 1, 3 and 11 occur for each person. I already have the second dataframe (df_p) with other features, so I don't think I should just use groupby. For some reason the first and second dataframes don't contain the same number of people, which is why I needed the strategy below. This is the code I tried, but it took hours to complete (1 million iterations):

df_p.insert(2, 'WHY_TRP_1', 0)
df_p.insert(3, 'WHY_TRP_3', 0)
df_p.insert(4, 'WHY_TRP_11', 0)

def trip_counter(i, r):
  if r[2] == 1:
    df_p.loc[(df_p['HOUSEID'] == r[0]) & (df_p['PERSONID'] == r[1]), ['WHY_TRP_1']] += 1 
  elif r[2] == 3:
    df_p.loc[(df_p['HOUSEID'] == r[0]) & (df_p['PERSONID'] ==  r[1]), ['WHY_TRP_3']] += 1 
  elif r[2] == 11:
    df_p.loc[(df_p['HOUSEID'] == r[0]) & (df_p['PERSONID'] ==  r[1]), ['WHY_TRP_11']] += 1


for i ,r in df.iterrows():
  trip_counter(i ,r) 

Output:

     HOUSEID   PERSONID   WHY_TRP_1     WHY_TRP_3      WHY_TRP_11
    20000017      1            3            0            0
    20000017      2            1            1            0
    20000231      1            0            0            2
    20000231      2            0            0            1
    20000521      1            0            0            1
    20000521      2            0            1            1          

Is there a faster way to do this?

Thank you.

You can get a table of the counts by doing a groupby on the first dataframe and unstacking WHY_TRP, and then you can just merge it into the second:

counts = df.groupby(["HOUSEID", "PERSONID", "WHY_TRP"]).size().unstack(fill_value=0)

counts.columns = counts.columns.map(lambda x: f"WHY_TRP_{x}")

counts

WHY_TRP            WHY_TRP_1  WHY_TRP_3  WHY_TRP_11
HOUSEID  PERSONID
20000017 1                 3          0           0
         2                 1          1           0
20000231 1                 0          0           2
         2                 0          0           1
20000521 1                 0          0           1
         2                 0          1           1

df_p.merge(counts, how="left", left_on=["HOUSEID", "PERSONID"], right_index=True)

    HOUSEID  PERSONID  WHY_TRP_1  WHY_TRP_3  WHY_TRP_11
0  20000017         1          3          0           0
1  20000017         2          1          1           0
2  20000231         1          0          0           2
3  20000231         2          0          0           1
4  20000521         1          0          0           1
5  20000521         2          0          1           1
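Since the question mentions that df and df_p do not contain the same set of people, a left merge will leave NaN in the count columns for anyone in df_p who has no trips. The following is a minimal, self-contained sketch of the groupby/unstack/merge approach above with that case handled. The household 20000999 is a hypothetical person added only to illustrate the gap; it is not part of the original data.

```python
import pandas as pd

# Trips from the question: one row per trip.
df = pd.DataFrame({
    "HOUSEID":  [20000017] * 5 + [20000231] * 3 + [20000521] * 3,
    "PERSONID": [1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2],
    "WHY_TRP":  [1, 1, 1, 1, 3, 11, 11, 11, 11, 11, 3],
})

# Persons table. The last row (20000999, 1) is a hypothetical person
# with no trips, added to show what happens when the tables differ.
df_p = pd.DataFrame({
    "HOUSEID":  [20000017, 20000017, 20000231, 20000231,
                 20000521, 20000521, 20000999],
    "PERSONID": [1, 2, 1, 2, 1, 2, 1],
})

# Count trips per (person, purpose), then spread purposes into columns.
counts = (
    df.groupby(["HOUSEID", "PERSONID", "WHY_TRP"])
      .size()
      .unstack(fill_value=0)
)
counts.columns = [f"WHY_TRP_{c}" for c in counts.columns]

# Left merge keeps every row of df_p; people without trips get NaN,
# which is then replaced by 0 and cast back to integers.
out = df_p.merge(counts, how="left",
                 left_on=["HOUSEID", "PERSONID"], right_index=True)
count_cols = list(counts.columns)
out[count_cols] = out[count_cols].fillna(0).astype(int)

print(out)
```

Because the counting is done once in a vectorised groupby rather than row by row, this avoids the per-trip boolean mask over df_p that made the original loop slow.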

You could also do a pivot_table and then merge:

m = df.pivot_table(index=['HOUSEID','PERSONID'],
                   columns='WHY_TRP',aggfunc=len,fill_value=0)

out= df_p.merge(m.add_prefix('WHY_TRP_'),left_on=['HOUSEID','PERSONID'],right_index=True)

print(out)

    HOUSEID  PERSONID  WHY_TRP_1  WHY_TRP_3  WHY_TRP_11
0  20000017         1          3          0           0
1  20000017         2          1          1           0
2  20000231         1          0          0           2
3  20000231         2          0          0           1
4  20000521         1          0          0           1
5  20000521         2          0          1           1
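As a closely related variant, the same count table can be built with pd.crosstab, which counts occurrences directly and needs no aggregation function. This is a sketch that swaps in crosstab for pivot_table, not the answer's own method; it uses the sample data from the question and a left merge so that no rows of df_p are dropped.

```python
import pandas as pd

# Trips from the question: one row per trip.
df = pd.DataFrame({
    "HOUSEID":  [20000017] * 5 + [20000231] * 3 + [20000521] * 3,
    "PERSONID": [1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2],
    "WHY_TRP":  [1, 1, 1, 1, 3, 11, 11, 11, 11, 11, 3],
})

# Persons table from the question: one row per person.
df_p = pd.DataFrame({
    "HOUSEID":  [20000017, 20000017, 20000231, 20000231, 20000521, 20000521],
    "PERSONID": [1, 2, 1, 2, 1, 2],
})

# Rows are (HOUSEID, PERSONID) pairs, columns are WHY_TRP values,
# cells are the number of trips; missing combinations become 0.
m = pd.crosstab([df["HOUSEID"], df["PERSONID"]], df["WHY_TRP"])
m = m.add_prefix("WHY_TRP_")

# how="left" keeps every person in df_p, even those absent from df.
out = df_p.merge(m, how="left",
                 left_on=["HOUSEID", "PERSONID"], right_index=True)

print(out)
```

If df_p contains people who never appear in df, their count columns will be NaN after this merge and can be filled with 0 afterwards.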
