I want to create a new column and populate it based on whether other columns have a value in them. I have three columns to compare, and I want the values to be taken in a specific order of preference.
Let's say I have 3 columns (A, B, C) and I want to populate the new column (Y) with the value in A, B, or C, but ranked. If column A has a value, it should populate column Y with precedence over columns B and C. If B has a value, it takes precedence over C, and C is used only when neither A nor B has a value.
What I have:
A B C Y
1 NA NA
NA 2 NA
NA 3 NA
NA NA 4
5 NA NA
6 6 NA
7 NA NA
NA NA 8
9 NA 9
10 10 10
What I want:
A B C Y
1 NA NA 1
NA 2 NA 2
NA 3 NA 3
NA NA 4 4
5 NA NA 5
6 6 NA 6
7 NA NA 7
NA NA 8 8
9 NA 9 9
10 10 10 10
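To make the example reproducible, here is a minimal sketch that builds the sample input as a DataFrame, assuming NA means a real missing value (NaN) rather than a string. It also checks the precedence rule in one line using pandas' DataFrame.bfill(axis=1), a technique not used in the answers below: it fills each gap from the columns to its right, so the first column ends up holding the first non-missing value in A, B, C order.

```python
import numpy as np
import pandas as pd

# Sample input from the question, with NA stored as NaN (an assumption).
df = pd.DataFrame({
    "A": [1, np.nan, np.nan, np.nan, 5, 6, 7, np.nan, 9, 10],
    "B": [np.nan, 2, 3, np.nan, np.nan, 6, np.nan, np.nan, np.nan, 10],
    "C": [np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, 8, 9, 10],
})

# bfill(axis=1) copies values leftwards into the NaN gaps before them,
# so column 0 ends up holding the first non-NaN value in A, B, C order.
df["Y"] = df[["A", "B", "C"]].bfill(axis=1).iloc[:, 0]
print(df)
```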
Use np.where() for a vectorized approach.
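import numpy as np
df['Y'] = np.where(df['A'].notna(), df['A'], df['B'])  # NaN != NaN is always True, so test with notna()
df['Y'] = np.where(df['Y'].isna(), df['C'], df['Y'])   # A and B both missing: fall back to C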
Since your question doesn't include a DataFrame that can be reused, I've only written the lines you need.
Next time you ask a question, kindly include a snippet of code that can be used to test possible answers. Welcome to the community :D
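For the NaN case, the same precedence can also be written as a single np.select call, which returns, for each row, the choice belonging to the first condition that is True. This is a sketch on the question's sample data stored as NaN (an assumption); the order of the conditions list is what encodes A over B over C.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, np.nan, np.nan, np.nan, 5, 6, 7, np.nan, 9, 10],
    "B": [np.nan, 2, 3, np.nan, np.nan, 6, np.nan, np.nan, np.nan, 10],
    "C": [np.nan, np.nan, np.nan, 4, np.nan, np.nan, np.nan, 8, 9, 10],
})

# np.select takes the choice of the FIRST True condition per row,
# so listing A, then B, then C gives A precedence over B over C.
conditions = [df["A"].notna(), df["B"].notna(), df["C"].notna()]
choices = [df["A"], df["B"], df["C"]]
df["Y"] = np.select(conditions, choices, default=np.nan)  # NaN if all three are missing
print(df)
```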
If NA in your DataFrame is a string:
The code above won't work in that case; compare against the actual string value instead.
df['Y'] = np.where(df['A'] != "NA", df['A'], df['B'])
df['Y'] = np.where(df['Y'] == "NA", df['C'], df['Y'])
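Another option for the string case is to turn the placeholders into real missing values first and then coalesce. The sketch below assumes every non-NA entry is numeric, so pd.to_numeric(errors="coerce") can convert "NA" to NaN; it then chains Series.combine_first so that B only fills A's gaps and C only fills what is still missing.

```python
import pandas as pd

# Assumption: missing values arrive as the literal string "NA".
df = pd.DataFrame({
    "A": [1, "NA", "NA", "NA", 5, 6, 7, "NA", 9, 10],
    "B": ["NA", 2, 3, "NA", "NA", 6, "NA", "NA", "NA", 10],
    "C": ["NA", "NA", "NA", 4, "NA", "NA", "NA", 8, 9, 10],
})

# Coerce each column to numbers; anything non-numeric ("NA") becomes NaN.
clean = df[["A", "B", "C"]].apply(pd.to_numeric, errors="coerce")

# combine_first keeps the caller's values and only fills its NaNs from the argument.
df["Y"] = clean["A"].combine_first(clean["B"]).combine_first(clean["C"])
print(df)
```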
One last note: another possible input is a row where all three columns are NA.
That case isn't covered in your question, but if you want to handle it, add one more check on column C.
In the false branch of that check (C is also NA), return whatever value you want to use instead.
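A minimal sketch of that extra check for the NaN case, written as nested np.where calls on a small hypothetical frame whose last row is missing in all three columns. FALLBACK = -1 is an arbitrary placeholder; substitute whatever value suits your data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, np.nan, np.nan],
    "B": [np.nan, 2, np.nan],
    "C": [np.nan, np.nan, np.nan],  # last row: A, B and C all missing
})

FALLBACK = -1  # hypothetical placeholder used when every column is NA

# Innermost where handles C vs. the fallback; each outer layer gives its column priority.
df["Y"] = np.where(df["A"].notna(), df["A"],
          np.where(df["B"].notna(), df["B"],
          np.where(df["C"].notna(), df["C"], FALLBACK)))
print(df)
```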
Great question. I think there are a lot of ways to approach this. One that immediately comes to mind is a loop that takes each row as a Series and populates the Y column with the first entry in that Series that is not NA. The general code would look like this:
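import pandas as pd

for idx, row in df[['A', 'B', 'C']].iterrows():
    for entry in row:  # entries come in precedence order: A, B, C
        if pd.isna(entry):
            continue
        df.loc[idx, 'Y'] = entry
        break  # stop at the first non-NA value so A beats B beats C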
Again, this isn't an exact copy-and-paste solution, but this methodology should give you what you want. Good luck and happy coding!
Edit: And from one new user to another, welcome to the community!
You can use pandas to construct the data structure, and then careful use of the apply() function can help you get the transformation you want.
import pandas as pd

data = (
    [1,    None, None],
    [None, 2,    None],
    [None, 3,    None],
    [None, None, 4],
    [5,    None, None],
    [6,    6,    None],
    [7,    None, None],
    [None, None, 8],
    [9,    None, 9],
    [10,   10,   10],
)

# Load in data (None becomes NaN in the numeric columns).
# DataFrame.append was removed in pandas 2.0, so build the frame in one call.
df = pd.DataFrame(list(data), columns=['A', 'B', 'C'])
print(df)

def calc_y(row):
    # Return the first non-NaN value, scanning the row left to right (A, B, C).
    for item in row:
        if pd.notna(item):
            return item
    return float('nan')  # every column in the row was NaN

df['Y'] = df.apply(calc_y, axis=1)
print(df)
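Note that calc_y scans each row in the DataFrame's column order, so its precedence is A, B, C only because the columns happen to be in that order. If that might not hold, a variant can take the order explicitly. The helper first_in_order and its order keyword are hypothetical names; DataFrame.apply passes extra keyword arguments through to the function.

```python
import numpy as np
import pandas as pd

# Columns deliberately out of order to show that precedence comes from `order`.
df = pd.DataFrame({
    "C": [np.nan, 8, 3],
    "A": [1, np.nan, 9],
    "B": [np.nan, np.nan, 7],
})

def first_in_order(row, order):
    # Hypothetical helper: walk the columns in the requested precedence order.
    for col in order:
        if pd.notna(row[col]):
            return row[col]
    return np.nan  # every column listed in `order` was missing

df["Y"] = df.apply(first_in_order, axis=1, order=["A", "B", "C"])
print(df)
```

In the last row, scanning in frame order would pick C's 3, while the explicit order picks A's 9.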