have a data set which looks like this:
Date Item A.unit B.Unit C.Unit D.Unit
10/11 A,D 5 0 0 12
11/11 A,B,C 10 10 5 0
12/11 A 20 0 0 0
i want the output column so that whenever there are more than one element in the list, it will calculate the difference of the unit, and when single element is present it will display zero. so output will be:
Date Item A.unit B.Unit C.Unit D.Unit output
10/11 A,D 5 0 0 12 5-12=-7
11/11 A,B,C 10 10 5 0 10-10-5=-5
12/11 A 20 0 0 0 0--since only one element is there
can anyone please tell me how to get the output column.
Solution working with no check Item
column - it use first non 0 value per Unit
columns and subtract by sum of values, also if only 1 value it set 0
:
#all columns without first and second
df1 = df.iloc[:, 2:].mask(lambda x: x==0)
#alternative
#all columns with Unit in column names
#df1 = df.filter(like='Unit').mask(lambda x: x==0)
first = df1.bfill(axis=1).iloc[:, 0]
df['output'] = np.where(df1.count(axis=1) == 1, 0, first - df1.sum(axis=1) + first)
print (df)
Date Item A.Unit B.Unit C.Unit D.Unit output
0 10/11 A,D 5 0 0 12 -7.0
1 11/11 A,B,C 10 10 5 0 -5.0
2 12/11 A 20 0 0 0 0.0
Solution with match by Item
column - explode Item
to rows, multiple by -1
and 0
if only one value and last aggregate sum
, first
and join
:
df = df.assign(Item = df['Item'].str.split(',')).explode('Item').reset_index(drop=True)
df['new'] = df.lookup(df.index, df['Item'] + '.Unit')
df.loc[df.duplicated(subset=['Date']), 'new'] *= -1
df.loc[~df.duplicated(subset=['Date'], keep=False), 'new'] = 0
d1 = dict.fromkeys(df.columns.difference(['Date','Item','new']), 'first')
fin = {**{'Item':','.join}, **d1, **{'new':'sum'}}
df = df.groupby('Date', as_index=False).agg(fin)
print (df)
Date Item A.Unit B.Unit C.Unit D.Unit new
0 10/11 A,D 5 0 0 12 -7
1 11/11 A,B,C 10 10 5 0 -5
2 12/11 A 20 0 0 0 0
Here is one solution. The first step is to create a function that does exactly what you want on one specific row:
from functool import reduce
def sum_function(x):
if len(x[x != 0]) == 1:
return 0
else:
return reduce(lambda a,b: a-b, x)
If there is only one element in the row that is not 0, then return 0. If there are more elements then subtract them all. And here is how you can apply that function to every row:
columns = ['A.unit', 'B.unit', 'C.unit', 'D.unit']
df.apply(lambda x: sum_function(x[columns]), axis=1)
The result is:
0 -7
1 -5
2 0
And you could add that as a new column:
df['output'] = df.apply(lambda x: sum_function(x[columns]), axis=1)
Try:
def calc(row):
out = row[np.argmax(np.array(row.tolist()) > 0)]
for c in row.values[np.argmax(np.array(row.tolist()) > 0)+1:]:
out -= c
if out == row.sum():
return 0
else:
return out
df['output'] = df.drop(['Date','Item'], axis=1).apply(calc, axis=1)
Output:
Date Item A.unit B.Unit C.Unit D.Unit output
0 10/11 A,D 5 0 0 12 -7
1 11/11 A,B,C 10 10 5 0 -5
2 12/11 A 20 0 0 0 0
Another solution using lambda, regex
is
unit_columns = list(df.columns[2:])
regex = re.compile(re.escape('.Unit'), re.IGNORECASE)
unit_columns_replaced = [regex.sub('', a) for a in unit_columns]
def output(row):
ItemN = row['Item'].split(",")
if len(ItemN) < 2:
return 0
idxs = np.where(np.in1d(unit_columns_replaced, ItemN))[0]
c_names = [unit_columns[idx] for idx in idxs]
f_columns = row.filter(items=c_names)
return 2 * f_columns[0] - f_columns.sum()
df['output'] = df.apply(lambda row: output(row), axis=1)
df
which gives output as
Date Item A.unit B.Unit C.Unit D.Unit output
0 10/11 A,D 5 0 0 12 -7
1 11/11 A,B,C 10 10 5 0 -5
2 12/11 A 20 0 0 0 0
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.