I have an existing pandas dataframe below:
id time c d
0 1 1 2 3
1 1 3 1 6
2 2 2 3 2
3 2 3 8 6
I also have values stored in a list such as:
list = [0.4, 0.6]
I want to create a new column in the existing dataframe and assign each list element to the first row of each group (id), like this:
id time c d new_col
0 1 1 2 3 0.4
1 1 3 1 6
2 2 2 3 2 0.6
3 2 3 8 6
Use Series.map with a dictionary created by enumerate, then keep the value only in the first row of each group with Series.mask and Series.duplicated:
L=[0.4, 0.6]
df["new_col"] = df['id'].sub(1).map(dict(enumerate(L))).mask(df['id'].duplicated())
print (df)
id time c d new_col
0 1 1 2 3 0.4
1 1 3 1 6 NaN
2 2 2 3 2 0.6
3 2 3 8 6 NaN
L=[0.4, 0.6]
df["new_col"] = df['id'].sub(1).map(dict(enumerate(L))).mask(df['id'].duplicated(),'')
print (df)
id time c d new_col
0 1 1 2 3 0.4
1 1 3 1 6
2 2 2 3 2 0.6
3 2 3 8 6
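Neither snippet above constructs df, so here is a self-contained version of the map/mask approach, building the sample frame from the question:

```python
import pandas as pd

# sample frame from the question
df = pd.DataFrame({"id": [1, 1, 2, 2],
                   "time": [1, 3, 2, 3],
                   "c": [2, 1, 3, 8],
                   "d": [3, 6, 2, 6]})

L = [0.4, 0.6]
# id 1 -> index 0, id 2 -> index 1; then blank out repeated rows within each id
df["new_col"] = df["id"].sub(1).map(dict(enumerate(L))).mask(df["id"].duplicated(), "")
print(df)
```

Note that masking with "" turns the column into mixed object dtype; use the NaN variant if you need it to stay numeric.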
If the id values can be arbitrary (e.g. 10, 20), use GroupBy.ngroup:
L=[0.4, 0.6]
df["new_col"] = (df.groupby('id').ngroup().map(dict(enumerate(L)))
.mask(df['id'].duplicated(),''))
print (df)
id time c d new_col
0 10 1 2 3 0.4
1 10 3 1 6
2 20 2 3 2 0.6
3 20 3 8 6
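A runnable sketch of the ngroup variant with the non-sequential ids 10 and 20 (other columns as in the question's sample):

```python
import pandas as pd

# same data, but with non-sequential group labels
df = pd.DataFrame({"id": [10, 10, 20, 20],
                   "time": [1, 3, 2, 3],
                   "c": [2, 1, 3, 8],
                   "d": [3, 6, 2, 6]})

L = [0.4, 0.6]
# ngroup() labels groups 0, 1, ... regardless of the actual id values
df["new_col"] = (df.groupby("id").ngroup().map(dict(enumerate(L)))
                   .mask(df["id"].duplicated(), ""))
print(df)
```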
Group the dataframe by id and use cumcount to create a sequential counter, then use boolean indexing with loc to assign the list values where the counter is 0:
lst = [0.4, 0.6]
df.loc[df.groupby('id').cumcount().eq(0), 'new_col'] = lst
id time c d new_col
0 1 1 2 3 0.4
1 1 3 1 6 NaN
2 2 2 3 2 0.6
3 2 3 8 6 NaN
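A self-contained version of the cumcount approach, constructing the sample frame first:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2, 2],
                   "time": [1, 3, 2, 3],
                   "c": [2, 1, 3, 8],
                   "d": [3, 6, 2, 6]})

lst = [0.4, 0.6]
# cumcount() is 0 on the first row of each group, so the boolean mask
# selects exactly one row per group, matching the length of lst
df.loc[df.groupby("id").cumcount().eq(0), "new_col"] = lst
print(df)
```

The remaining rows get NaN because new_col is created during the assignment; the list must have exactly one value per group, in group order.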
Does this help solve your problem? (I can't see any benefit to putting the value only in the first occurrence of each group and not in the other rows.)
mapping = {1: 0.4, 2: 0.6}
df["new_col"] = df['id'].map(mapping)
Result:
   id  time  c  d  new_col
0   1     1  2  3      0.4
1   1     3  1  6      0.4
2   2     2  3  2      0.6
3   2     3  8  6      0.6
Note: If all of your ids are sequential integers starting at 1 and all of the values in your list are also in that correct order you could convert it to a mapping dict using:
mapping = {i+1: value for i, value in enumerate(your_list)}
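A quick demonstration of that comprehension (your_list is the placeholder name from the note above):

```python
# placeholder list of per-id values, in id order
your_list = [0.4, 0.6]

# ids assumed to be sequential integers 1, 2, ... matching the list order
mapping = {i + 1: value for i, value in enumerate(your_list)}
print(mapping)
```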
One way is to use pandas.DataFrame.groupby.ngroup with numpy.take:
import numpy as np

l = [0.4, 0.6]
gid = df.groupby("id").ngroup()
df["new_col"] = np.where(gid.duplicated(), "", np.take(l, gid))
Output:
id time c d new_col
0 1 1 2 3 0.4
1 1 3 1 6
2 2 2 3 2 0.6
3 2 3 8 6
Explanation:
The difficulty here is applying the value only to the first row of each group.
Applying duplicated to the taken values is dangerous, since it would blank out values in another group should the values happen to coincide (e.g. if l == [0.4, 0.6, 0.4]).
So duplicated is applied to gid instead, to guarantee we pick out the first row of each group.
df.groupby("id").ngroup(): enumerates the groups to create indices. Note that this is insensitive to the type of df["id"], so it also works for ids such as ["a", "a", "b", "b"].
In case df["id"] is always numeric and starts from 1, you can replace this with:
gid = df["id"] - 1
np.take(l, gid): with the generated indices, take the corresponding list elements.
np.where(gid.duplicated(), ...): ensures the values taken from l are applied only to the first row of each group (i.e. the non-duplicates), while the remaining rows are filled with "".
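The danger described above can be demonstrated with a third group whose list value repeats an earlier one (a sketch with assumed ids "a", "b", "c"):

```python
import numpy as np
import pandas as pd

# three groups, where the first and third list values happen to coincide
df = pd.DataFrame({"id": ["a", "a", "b", "b", "c", "c"]})
l = [0.4, 0.6, 0.4]

gid = df.groupby("id").ngroup()      # [0, 0, 1, 1, 2, 2]
vals = pd.Series(np.take(l, gid))    # [0.4, 0.4, 0.6, 0.6, 0.4, 0.4]

# duplicated() on the values wrongly masks all of group "c",
# because its 0.4 repeats group "a"'s 0.4
wrong = np.where(vals.duplicated(), "", vals)
# duplicated() on the group index correctly keeps the first row of each group
right = np.where(gid.duplicated(), "", vals)
print(wrong)
print(right)
```

(np.where promotes the mixed "" / float values to strings, which is why the output column prints values like 0.4 without a dtype marker.)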