
Create a new column and assign values to the first row by each group in Python Pandas

I have an existing pandas dataframe below:

   id  time  c  d
0   1     1  2  3
1   1     3  1  6
2   2     2  3  2
3   2     3  8  6

I also have values stored in a list such as:

list = [0.4, 0.6]

I want to create a new column in the existing dataframe and assign each list element to the first row of each group (id), like this:

   id  time  c  d  new_col
0   1     1  2  3  0.4
1   1     3  1  6
2   2     2  3  2  0.6
3   2     3  8  6

Use Series.map with a dictionary created by enumerate, then keep only the first value per group using Series.mask with Series.duplicated :

L=[0.4, 0.6]

df["new_col"] = df['id'].sub(1).map(dict(enumerate(L))).mask(df['id'].duplicated())

print (df)
   id  time  c  d  new_col
0   1     1  2  3      0.4
1   1     3  1  6      NaN
2   2     2  3  2      0.6
3   2     3  8  6      NaN
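A minimal sketch of the intermediate steps, using the sample dataframe from the question:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2, 2],
                   "time": [1, 3, 2, 3],
                   "c": [2, 1, 3, 8],
                   "d": [3, 6, 2, 6]})
L = [0.4, 0.6]

# dict(enumerate(L)) -> {0: 0.4, 1: 0.6}, so the ids are shifted down by 1 to match
mapped = df["id"].sub(1).map(dict(enumerate(L)))     # 0.4, 0.4, 0.6, 0.6
# duplicated() is True everywhere except each id's first occurrence,
# so mask() replaces those repeated rows with NaN
df["new_col"] = mapped.mask(df["id"].duplicated())
print(df)
```

Note this relies on the ids being the sequential integers 1, 2, ...; for arbitrary ids see the ngroup variant further down the page.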

L=[0.4, 0.6]

df["new_col"] = df['id'].sub(1).map(dict(enumerate(L))).mask(df['id'].duplicated(),'')

print (df)
   id  time  c  d new_col
0   1     1  2  3     0.4
1   1     3  1  6        
2   2     2  3  2     0.6
3   2     3  8  6        
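One caveat worth knowing (not stated in the original answer): masking the repeated rows with '' mixes floats and strings, so new_col becomes object dtype. A small sketch:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2, 2]})
L = [0.4, 0.6]

df["new_col"] = df["id"].sub(1).map(dict(enumerate(L))).mask(df["id"].duplicated(), "")
# The column now holds mixed float/str values, hence object dtype;
# keep the NaN version instead if you need numeric operations on new_col.
print(df["new_col"].dtype)
```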

If the id values can be arbitrary groups, e.g. 10 and 20, use GroupBy.ngroup :

L=[0.4, 0.6]

df["new_col"] = (df.groupby('id').ngroup().map(dict(enumerate(L)))
                   .mask(df['id'].duplicated(),''))

print (df)
   id  time  c  d new_col
0  10     1  2  3     0.4
1  10     3  1  6        
2  20     2  3  2     0.6
3  20     3  8  6        
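A sketch of why ngroup helps here: it enumerates the groups as 0, 1, ... regardless of the actual id values, so the result can be fed straight into the enumerate-based dictionary.

```python
import pandas as pd

df = pd.DataFrame({"id": [10, 10, 20, 20]})
L = [0.4, 0.6]

gid = df.groupby("id").ngroup()       # 0, 0, 1, 1 -- independent of the id values
df["new_col"] = gid.map(dict(enumerate(L))).mask(df["id"].duplicated(), "")
print(df["new_col"].tolist())
```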

Group the dataframe by id and use cumcount to create a sequential counter within each group, then use boolean indexing with loc to assign the list values where the counter is 0:

lst = [0.4, 0.6]
df.loc[df.groupby('id').cumcount().eq(0), 'new_col'] = lst

   id  time  c  d  new_col
0   1     1  2  3      0.4
1   1     3  1  6      NaN
2   2     2  3  2      0.6
3   2     3  8  6      NaN
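The steps above can be sketched with the intermediate counter made explicit:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2, 2],
                   "time": [1, 3, 2, 3],
                   "c": [2, 1, 3, 8],
                   "d": [3, 6, 2, 6]})
lst = [0.4, 0.6]

counter = df.groupby("id").cumcount()   # 0, 1, 0, 1 -- row position within each group
# Exactly one row per group satisfies counter == 0, and loc assigns the
# list values to those rows in order; all other rows get NaN
df.loc[counter.eq(0), "new_col"] = lst
print(df)
```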

Does this help solve your problem? (I can't see any benefit to putting the value only in the first occurrence of the group and not in the other rows.)

mapping = {1: 0.4, 2: 0.6}

df["new_col"] = df['id'].map(mapping)

Result:

   id  time  c  d  new_col
0   1     1  2  3      0.4
1   1     3  1  6      0.4
2   2     2  3  2      0.6
3   2     3  8  6      0.6

Note: If all of your ids are sequential integers starting at 1 and the values in your list are in the corresponding order, you can build the mapping dict with:

mapping = {i+1: value for i, value in enumerate(your_list)}
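Putting the note together with the map call (your_list is the hypothetical name used in the note above):

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2, 2]})
your_list = [0.4, 0.6]

# Shift enumerate's 0-based index up by 1 so it lines up with the 1-based ids
mapping = {i + 1: value for i, value in enumerate(your_list)}   # {1: 0.4, 2: 0.6}
df["new_col"] = df["id"].map(mapping)
print(df["new_col"].tolist())   # every row of each group gets its group's value
```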

One way, using pandas.DataFrame.groupby.ngroup with numpy.take :

import numpy as np

l = [0.4, 0.6]
gid = df.groupby("id").ngroup()
df["new_col"] = np.where(gid.duplicated(), "", np.take(l, gid))

Output:

   id  time  c  d new_col
0   1     1  2  3     0.4
1   1     3  1  6        
2   2     2  3  2     0.6
3   2     3  8  6        

Explanation:

The difficulty here is applying the value only to the first row of each group.

Applying duplicated to the taken values is dangerous, since it will also blank the first row of another group should the values happen to repeat (eg if l == [0.4, 0.6, 0.4] ).

So duplicated is applied to gid instead, to ensure we keep the first row of each group.

  • df.groupby("id").ngroup() : enumerates the groups to create indices. Note that this works regardless of the dtype of df["id"] , such as ["a", "a", "b", "b"] .

    In case df["id"] is always numeric and starts from 1 , you can replace this with

    gid = df["id"] - 1
  • np.take(l, gid) : with the generated indices, takes the corresponding list elements.

  • np.where(gid.duplicated(), ...) : ensures the values taken from l are applied only to the first row of each group (ie the non-duplicates), and the rest are filled with "" .
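The pitfall described above can be sketched with the repeated-value list l == [0.4, 0.6, 0.4] mentioned earlier (note that np.where mixing "" with floats yields a string array, so the kept values come out as strings like '0.4'):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2, 2, 3, 3]})
l = [0.4, 0.6, 0.4]

gid = df.groupby("id").ngroup()          # 0, 0, 1, 1, 2, 2
taken = pd.Series(np.take(l, gid))       # 0.4, 0.4, 0.6, 0.6, 0.4, 0.4

# Wrong: duplicated() on the values also blanks group 3's first row,
# because its value 0.4 already appeared in group 1
wrong = np.where(taken.duplicated(), "", taken)
# Right: duplicated() on the group ids keeps every group's first row
right = np.where(gid.duplicated(), "", taken)
print(wrong.tolist())
print(right.tolist())
```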
