Pandas: Add series to dataframe as a column (same index, different length)

I have the following dataframe in pandas (the df below is abbreviated):

    Index: 23253 entries, 7.0 to 30559.0
    Data columns (total 17 columns):
    Epoch         23190  non-null values
    follow        23253  non-null values
    T_Opp         245    non-null values
    T_Dir         171    non-null values
    Teacher       0      non-null values
    Activity      23253  non-null values
    Actor         23253  non-null values
    Recipient1    14608  non-null values
    dtypes: float64(10), object(7)

Columns like T_Opp and T_Dir hold dummy (1/0) data. When values in these columns are true, I want to add data from the 'Actor' column to the 'Teacher' column. So far, I have this (the "mask" gives the condition under which the data are true; I checked this bit and it works):

    opp_mask = f_acts['Behavior'].str.contains('bp', na=False)
    opp_teacher = f_acts[opp_mask]['Recipient1']

If I were doing this based on only one column, I could simply plug these results into the Teacher column of the dataframe with something like this:

    df['Teacher'] = df[opp_mask]['Actor']

But I need to fill the Teacher column with data from 6 other columns, without overwriting what the earlier columns already filled in. I have an idea of how this might work, similar to this toy example:

    ones = [1] * len(df.Teacher)  # renamed from `list` to avoid shadowing the built-in
    df['Teacher'] = ones

But I can't figure out how to transform the output of the "mask" technique above into the correct format for this approach: it has the same index info but is shorter than the dataframe I need to add it to. What am I missing?

UPDATE: Adding the data below to clarify what I'm trying to do.

   follow   T_Opp   T_Dir   T_Enh   T_SocTol    Teacher    Actor    Recipient1
   7        0       1       0       0           NaN        51608    f 
   8        0       0       0       0           NaN        bla      NaN
   11       0       0       0       0           NaN        51601    NaN
   13       1       0       0       1           NaN        f        51602
   18       0       0       0       0           NaN        f        NaN

So for data like these, what I'm trying to do is check the T_ columns one at a time. If the value in a T_ column is true, fetch the data from the Actor column (if looking at the T_Opp or T_SocTol columns) or from the Recipient column (if looking at the T_Enh or T_Dir columns). I want to copy that data into the currently empty Teacher column.

More than one of the T_ columns can be true at a time, but in these cases it will always be "grabbing" the same data twice. (In other words, I never need data from BOTH the Actor and Recipient columns. Only one or the other, for each row.)

Here's an approach to masking and concatenating multiple columns with Series.where(). If the end result is a column of strings, numeric columns will need to be converted to string first with .astype(str).

In [23]: df
Out[23]: 
        C0  Mask1  Mask2 Val1 Val2
0  R_l0_g0      0      0   v1   v2
1  R_l0_g1      1      0   v1   v2
2  R_l0_g2      0      1   v1   v2
3  R_l0_g3      1      1   v1   v2

In [24]: df['Other'] = (df.Val1.astype(str).where(df.Mask1.astype(bool), '') + ',' +
                        df.Val2.astype(str).where(df.Mask2.astype(bool), '')).str.strip(',')

In [25]: df
Out[25]: 
        C0  Mask1  Mask2 Val1 Val2  Other
0  R_l0_g0      0      0   v1   v2       
1  R_l0_g1      1      0   v1   v2     v1
2  R_l0_g2      0      1   v1   v2     v2
3  R_l0_g3      1      1   v1   v2  v1,v2
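
Applied to the data in the question, the same Series.where() masking might look like the sketch below. It is only an illustration under some assumptions: the frame is rebuilt by hand from the five sample rows in the UPDATE, and the helper names `use_actor` and `use_recipient` are made up here. Because the question says a row never needs both Actor and Recipient1, no concatenation is needed; the second `.where()` simply supplies the fallback values.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample rows from the question (an assumption for illustration).
df = pd.DataFrame(
    {
        "follow": [7, 8, 11, 13, 18],
        "T_Opp": [0, 0, 0, 1, 0],
        "T_Dir": [1, 0, 0, 0, 0],
        "T_Enh": [0, 0, 0, 0, 0],
        "T_SocTol": [0, 0, 0, 1, 0],
        "Actor": ["51608", "bla", "51601", "f", "f"],
        "Recipient1": ["f", np.nan, np.nan, "51602", np.nan],
    }
)

# Cast the 0/1 dummies to bool and combine them row-wise.
use_actor = df[["T_Opp", "T_SocTol"]].astype(bool).any(axis=1)
use_recipient = df[["T_Enh", "T_Dir"]].astype(bool).any(axis=1)

# Keep Actor where use_actor is True; elsewhere fall back to Recipient1
# (itself masked by use_recipient), leaving NaN where neither applies.
df["Teacher"] = df["Actor"].where(use_actor, df["Recipient1"].where(use_recipient))

print(df[["follow", "Teacher"]])
```

Rows 7 and 13 end up with a Teacher value (from Recipient1 and Actor respectively); the other rows stay NaN.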

And here's another approach using DataFrame.where(). Like most pandas operations, .where performs automatic data alignment. Since the column names of the data frame and of the frame to mask with differ in this case, alignment can be disabled by masking with a raw, unlabeled numpy.ndarray (a.k.a. .values).

In [23]: masked = df[['Val1', 'Val2']].\
                     where(df[['Mask1', 'Mask2']].values.astype(bool), '') + ','

In [24]: df['Other2'] = masked.sum(axis=1).str.strip(',')

In [25]: df
Out[25]: 
        C0  Mask1  Mask2 Val1 Val2  Other Other2
0  R_l0_g0      0      0   v1   v2              
1  R_l0_g1      1      0   v1   v2     v1     v1
2  R_l0_g2      0      1   v1   v2     v2     v2
3  R_l0_g3      1      1   v1   v2  v1,v2  v1,v2
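
The automatic alignment mentioned above is also what handles the "same index, different length" case from the question title: assigning a shorter Series to a column matches rows by index label and leaves the rest as NaN. The sketch below uses a small made-up frame (the values and the names `from_actor` and `from_recipient` are assumptions for illustration) and then fills the remaining gaps from a second masked Series with fillna(), which leaves the values already set untouched.

```python
import pandas as pd

# Toy frame with a non-contiguous index, loosely modelled on the question's data.
df = pd.DataFrame(
    {
        "Actor": ["a", "b", "c", "d"],
        "Recipient1": ["w", "x", "y", "z"],
        "T_Opp": [0, 1, 0, 0],
        "T_Dir": [1, 0, 0, 0],
    },
    index=[7, 8, 11, 13],
)

# Each masked selection is shorter than df but keeps the original index labels.
from_actor = df.loc[df["T_Opp"].astype(bool), "Actor"]          # index [8]
from_recipient = df.loc[df["T_Dir"].astype(bool), "Recipient1"]  # index [7]

# Assignment aligns on the index: rows without a match become NaN.
df["Teacher"] = from_actor

# fillna with a Series also aligns on the index and only touches NaN rows,
# so the value already set for row 8 is not overwritten.
df["Teacher"] = df["Teacher"].fillna(from_recipient)

print(df["Teacher"])
```

Repeating the fillna() step once per T_ column would fill Teacher incrementally, one mask at a time.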
