I have a large file with over 78K rows in Excel (365 version). I am trying to write a formula that returns a True or False value based on the unique values in Column A (21K unique values): if any of the values in Column B are True for a given value in Column A, then Column C should return True for every row with that Column A value.
For example, I have the following data:
Column A Column B
1 True
1 False
1 False
2 False
2 False
3 False
3 True
I want Column C to show the following:
Column A Column B Column C
1 True True
1 False True
1 False True
2 False False
2 False False
3 False True
3 True True
In other words, for every unique value in Column A, if any of the corresponding values in Column B are True, I want all values in Column C for that value to state True.
After many attempts at various formulas, I think I may have found something close with the following formula, but it returns True for every cell. I'm not sure what I'm missing.
=+IF(AND(UNIQUE($A$1:$A$7)),COUNTIF($B$1:$B$7,"TRUE")>0,1)
My data doesn't have any missing values.
I've searched this site for what I'm attempting, but the formula above was the closest I could come. This thread is close, but not quite what I'm looking for.
I know that I could do this manually with the following formula, but with over 21K unique values in Column A, I don't want to do this manually if I don't have to.
=+COUNTIF($B$1:$B$3,"TRUE")>0
If this is easier to perform in Python, that code would be helpful. I am new to Python, and more comfortable with Excel, but understand Python may be easier and quicker.
This is how I would handle this in pandas.
print(df)
# Note: I've added a non-duplicated row (Column_A = 4) for testing.
Column_A Column_B
0 1 True
1 1 False
2 1 False
3 2 False
4 2 False
5 3 False
6 3 True
7 4 True
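The frame printed above is not constructed anywhere in the answer. A minimal sketch of how it could be built, assuming Column_B holds real booleans (not the strings "True"/"False") and using the column names shown in the output:

```python
import pandas as pd

# Sample data copied from the printed frame above, including the
# extra non-duplicated row (Column_A = 4) added for testing.
df = pd.DataFrame({
    "Column_A": [1, 1, 1, 2, 2, 3, 3, 4],
    "Column_B": [True, False, False, False, False, False, True, True],
})

print(df)
```

For the real workbook, the frame would more likely be loaded with pandas.read_excel than typed in by hand.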
First, I would write two boolean expressions: the first checks whether a Column_A value is duplicated, and the second checks whether Column_B is True. For the rows where both evaluate to True, I pass the IDs from Column_A into a list.
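# Keep rows whose Column_A value occurs more than once AND whose Column_B is True,
# then collect the matching Column_A values.
vals = df.loc[df.duplicated(subset=["Column_A"], keep=False)
              & df["Column_B"].eq(True),
              "Column_A"].tolist()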
print(vals)
[1, 3]
Now that we know what the values are, we can write a simple boolean assignment.
df['Column_C'] = df['Column_A'].isin(vals)
print(df)
Column_A Column_B Column_C
0 1 True True
1 1 False True
2 1 False True
3 2 False False
4 2 False False
5 3 False True
6 3 True True
7 4 True False
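Note that the duplicated() filter leaves a single-row group such as Column_A = 4 as False, even though its only Column_B value is True. If the goal is the rule as stated in the question (any True within a Column_A group marks the whole group), one possible alternative is a groupby with transform("any"). This is a sketch of a different technique, not the approach used above, and it rebuilds the sample frame so that it runs on its own:

```python
import pandas as pd

# Same sample data as in the answer (assumes Column_B is boolean).
df = pd.DataFrame({
    "Column_A": [1, 1, 1, 2, 2, 3, 3, 4],
    "Column_B": [True, False, False, False, False, False, True, True],
})

# For each Column_A group, compute "is any Column_B value True?" and
# broadcast that single result back onto every row of the group.
df["Column_C"] = df.groupby("Column_A")["Column_B"].transform("any")

print(df)
```

With this version the row for Column_A = 4 comes out True, which is the only difference from the list-based result on this sample data.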