This problem may be rather specific, but I bet many others will run into it as well. I have a DataFrame like this:
import pandas as pd
asd = pd.DataFrame({'Col1': ['a', 'b', 'b','a','a'], 'Col2': [0,0,0,1,1]})
The resulting table looks like this:
I -- Col1 -- Col2
1 -- a -- 0
2 -- b -- 0
3 -- b -- 0
4 -- a -- 1
5 -- a -- 1
What I am trying to do is to:
if at least one "a" value in Col1 has a corresponding value of 1 in Col2, then in Col3 we put 1 for all values of "a";
otherwise (if not even one "a" has a value of 1), then we put "0" for all values of "a".
And then repeat for all other values in Col1.
The result of the operation should look like this:
I -- Col1 -- Col2 -- Col3
1 -- a -- 0 -- 1 because "a" has value of 1 in 4th and 5th lines
2 -- b -- 0 -- 0 because all "b" have values of 0
3 -- b -- 0 -- 0
4 -- a -- 1 -- 1
5 -- a -- 1 -- 1
Currently I am doing this:
asd['Col3'] = 0
col1_uniques = asd.drop_duplicates(subset='Col1')['Col1']
small_dataframes = []
for i in col1_uniques:
    # .copy() so the assignment below works on an independent frame, not a slice of asd
    small_df = asd.loc[asd.Col1 == i].copy()
    if small_df.Col2.max() == 1:
        small_df['Col3'] = 1
    small_dataframes.append(small_df)
I then reassemble the dataframe back.
However, that takes too much time (I have about 80000 unique values in Col1). In fact, while I was writing this, it hasn't finished even a quarter of that job.
Is there a better way to do it?
My understanding is that you need to repeat the process for every unique value in Col1, so you will need groupby:
asd['Col3'] = asd.groupby('Col1').Col2.transform(lambda x: x.eq(1).any().astype(int))
Col1 Col2 Col3
0 a 0 1
1 b 0 0
2 b 0 0
3 a 1 1
4 a 1 1
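The same result can probably be reached without a Python-level lambda, which may matter with around 80000 groups. The following is a minimal sketch, not a measured benchmark: it assumes Col2 is numeric and passes the string 'any' to transform so that pandas applies its built-in per-group reduction. The name is_one is introduced here only for illustration.

```python
import pandas as pd

asd = pd.DataFrame({'Col1': ['a', 'b', 'b', 'a', 'a'], 'Col2': [0, 0, 0, 1, 1]})

# Boolean mask: True on rows where Col2 equals 1
is_one = asd['Col2'].eq(1)

# For each row, True if any row in the same Col1 group is True, then cast to 0/1
asd['Col3'] = is_one.groupby(asd['Col1']).transform('any').astype(int)

print(asd)
```

Grouping the boolean mask by the Col1 column keeps the original row order and index, so the result can be assigned straight back to the DataFrame.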
Option 2: a similar solution to the one above, but using map:
d = asd.groupby('Col1').Col2.apply(lambda x: x.eq(1).any().astype(int)).to_dict()
asd['Col3'] = asd['Col1'].map(d)
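As a variation on the map approach, the lookup can be kept as a Series instead of being converted to a dict, since Series.map accepts a Series and looks values up by its index. This is a sketch under the same assumptions as the question's sample data; flags is a name chosen here for illustration.

```python
import pandas as pd

asd = pd.DataFrame({'Col1': ['a', 'b', 'b', 'a', 'a'], 'Col2': [0, 0, 0, 1, 1]})

# One 0/1 flag per unique Col1 value, indexed by that value
flags = asd['Col2'].eq(1).groupby(asd['Col1']).any().astype(int)

# Look up each row's Col1 value in the flags index
asd['Col3'] = asd['Col1'].map(flags)

print(flags)
print(asd)
```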
You can do this with a groupby and an if statement. First group all items by Col1:
lists = asd.groupby("Col1").agg(lambda x: tuple(x))
This gives you:
           Col2
Col1
a     (0, 1, 1)
b        (0, 0)
You can then iterate through the unique index values in lists, masking the original DataFrame and setting Col3 to 1 if a 1 is found in lists["Col2"].
asd["Col3"] = 0
for i in lists.index:
    if 1 in lists.loc[i, "Col2"]:
        asd.loc[asd["Col1"]==i, "Col3"] = 1
This results in:
Col1 Col2 Col3
0 a 0 1
1 b 0 0
2 b 0 0
3 a 1 1
4 a 1 1
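The loop above still builds one mask per unique value in Col1, which could be slow with about 80000 unique values. A loop-free sketch of the same masking idea follows; it swaps the per-value loop for a single isin test and is not the answer's original method. The name keys_with_one is hypothetical.

```python
import pandas as pd

asd = pd.DataFrame({'Col1': ['a', 'b', 'b', 'a', 'a'], 'Col2': [0, 0, 0, 1, 1]})

# Col1 values that have at least one row with Col2 == 1
keys_with_one = asd.loc[asd['Col2'].eq(1), 'Col1'].unique()

# Flag every row whose Col1 value is among those keys
asd['Col3'] = asd['Col1'].isin(keys_with_one).astype(int)

print(asd)
```

This makes two passes over the data regardless of how many unique values Col1 contains.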