
Calculate the difference of list elements if more than one element in the list is not zero

I have a data set that looks like this:

Date       Item        A.unit       B.Unit    C.Unit      D.Unit
10/11      A,D          5            0         0          12
11/11      A,B,C       10           10         5           0
12/11      A           20            0         0           0

I want an output column such that whenever the Item list has more than one element, it contains the difference of the corresponding units (the first item's units minus the units of the others), and when only a single element is present it contains zero. So the output will be:

Date       Item        A.unit       B.Unit    C.Unit      D.Unit          output
10/11      A,D          5            0         0          12              5-12=-7
11/11      A,B,C       10           10         5           0              10-10-5=-5
12/11      A           20            0         0           0              0 (since only one element is there)

Can anyone please tell me how to get the output column?

Solution that does not check the Item column: it takes the first non-zero value across the Unit columns and subtracts the remaining values from it; if a row has only one non-zero value, the output is set to 0:

import numpy as np

#all columns without first and second, with zeros replaced by NaN
df1 = df.iloc[:, 2:].mask(lambda x: x==0)
#alternative
#all columns with Unit in column names
#df1 = df.filter(like='Unit').mask(lambda x: x==0)
#first non-zero value in each row
first = df1.bfill(axis=1).iloc[:, 0]
#first - (sum - first), or 0 if the row has only one non-zero value
df['output'] = np.where(df1.count(axis=1) == 1, 0, first - df1.sum(axis=1) + first)
print (df)
    Date   Item  A.Unit  B.Unit  C.Unit  D.Unit  output
0  10/11    A,D       5       0       0      12    -7.0
1  11/11  A,B,C      10      10       5       0    -5.0
2  12/11      A      20       0       0       0     0.0
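
To make the `first - sum + first` step easier to follow, here is a minimal self-contained sketch of the same idea. It rebuilds the sample frame with the column names normalized to `A.Unit` ... `D.Unit` (an assumption) and wraps the steps in a hypothetical helper, `first_minus_rest`, which is not part of pandas:

```python
import numpy as np
import pandas as pd

# Sample data from the question; column names normalized to '<letter>.Unit' (an assumption).
df = pd.DataFrame({
    'Date': ['10/11', '11/11', '12/11'],
    'Item': ['A,D', 'A,B,C', 'A'],
    'A.Unit': [5, 10, 20],
    'B.Unit': [0, 10, 0],
    'C.Unit': [0, 5, 0],
    'D.Unit': [12, 0, 0],
})

def first_minus_rest(frame):
    """Hypothetical helper: first non-zero unit minus the other non-zero units, per row."""
    units = frame.filter(like='Unit')
    nonzero = units.where(units != 0)          # zeros become NaN
    first = nonzero.bfill(axis=1).iloc[:, 0]   # first non-zero value in each row
    total = nonzero.sum(axis=1)                # sum skips NaN
    # first - (total - first) == 2 * first - total; rows with one non-zero value get 0
    result = np.where(nonzero.count(axis=1) == 1, 0, 2 * first - total)
    return pd.Series(result, index=frame.index)

df['output'] = first_minus_rest(df)
print(df['output'].tolist())  # [-7.0, -5.0, 0.0]
```

The result is float because masking the zeros introduces NaN. Under this sketch, a row with no non-zero values at all would produce NaN rather than 0.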

Solution that matches on the Item column: explode Item into rows, multiply every value after the first in each Date group by -1, set the value to 0 where a Date has only one item, then aggregate with sum, first and join:

df = df.assign(Item = df['Item'].str.split(',')).explode('Item').reset_index(drop=True)
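#DataFrame.lookup was removed in pandas 2.0, so pick each row's '<Item>.Unit' value directly
df['new'] = [df.at[i, item + '.Unit'] for i, item in zip(df.index, df['Item'])]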

df.loc[df.duplicated(subset=['Date']), 'new'] *=  -1
df.loc[~df.duplicated(subset=['Date'], keep=False), 'new'] =  0


d1 = dict.fromkeys(df.columns.difference(['Date','Item','new']), 'first')
fin = {**{'Item':','.join}, **d1, **{'new':'sum'}}
df = df.groupby('Date', as_index=False).agg(fin)

print (df)
    Date   Item  A.Unit  B.Unit  C.Unit  D.Unit  new
0  10/11    A,D       5       0       0      12   -7
1  11/11  A,B,C      10      10       5       0   -5
2  12/11      A      20       0       0       0    0
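
For readers who want to run the Item-based approach end to end, here is a self-contained sketch that avoids `DataFrame.lookup` (not available in recent pandas releases) and keeps the original frame untouched. The helper name `item_difference` and the normalized `<letter>.Unit` column names are assumptions made for illustration:

```python
import pandas as pd

# Sample data from the question; column names normalized to '<letter>.Unit' (an assumption).
df = pd.DataFrame({
    'Date': ['10/11', '11/11', '12/11'],
    'Item': ['A,D', 'A,B,C', 'A'],
    'A.Unit': [5, 10, 20],
    'B.Unit': [0, 10, 0],
    'C.Unit': [0, 5, 0],
    'D.Unit': [12, 0, 0],
})

def item_difference(frame):
    """Hypothetical helper: units of the first listed item minus units of the other listed items."""
    # one row per (original row, item); 'row' remembers the original index label
    long = (frame.assign(Item=frame['Item'].str.split(','))
                 .explode('Item')
                 .rename_axis('row')
                 .reset_index())
    # value of the '<Item>.Unit' column for each exploded row
    long['value'] = [frame.at[r, f'{item}.Unit']
                     for r, item in zip(long['row'], long['Item'])]
    grouped = long.groupby('row')
    pos = grouped.cumcount()                    # 0 for the first item of each row
    size = grouped['Item'].transform('size')    # number of items in each row
    # keep the first item's value, negate the others
    signed = long['value'].where(pos == 0, -long['value'])
    # rows with a single item contribute 0
    long['signed'] = signed.where(size > 1, 0)
    return long.groupby('row')['signed'].sum()

df['output'] = item_difference(df)
print(df['output'].tolist())  # [-7, -5, 0]
```

Unlike the version that ignores Item, this sketch follows the order in which items are listed, so an Item of `D,A` would give 12 - 5 rather than 5 - 12.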

Here is one solution. The first step is to create a function that does exactly what you want on one specific row:

from functools import reduce
def sum_function(x):
  if len(x[x != 0]) == 1:
    return 0
  else:
    return reduce(lambda a,b: a-b, x)

If only one element in the row is non-zero, the function returns 0. Otherwise it subtracts every following element from the first one. Here is how you can apply that function to every row:

# the names must match the DataFrame's columns exactly (note the lowercase 'unit' in 'A.unit')
columns = ['A.unit', 'B.Unit', 'C.Unit', 'D.Unit']
df.apply(lambda x: sum_function(x[columns]), axis=1)

The result is:

0   -7
1   -5
2    0

And you could add that as a new column:

df['output'] = df.apply(lambda x: sum_function(x[columns]), axis=1)
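
As a quick check of what `reduce` does here, the following sketch applies the same logic to plain lists, with no pandas involved. The name `row_difference` is hypothetical; it is a list-based stand-in for `sum_function` above:

```python
from functools import reduce

def row_difference(values):
    """Hypothetical list-based variant of sum_function."""
    nonzero = [v for v in values if v != 0]
    if len(nonzero) <= 1:
        return 0
    # left fold: ((v0 - v1) - v2) - ...
    return reduce(lambda a, b: a - b, values)

rows = [[5, 0, 0, 12], [10, 10, 5, 0], [20, 0, 0, 0]]
print([row_difference(r) for r in rows])  # [-7, -5, 0]
```

Because the fold always starts from the first column, a row such as `[0, 5, 12]` gives 0 - 5 - 12 = -17 rather than 5 - 12 = -7. Whether that matters depends on whether the first unit column can be zero in your data.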

Try:

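import numpy as np

def calc(row):
    # position of the first value greater than zero
    start = np.argmax(row.to_numpy() > 0)
    out = row.iloc[start]
    # subtract every value after it
    for c in row.iloc[start + 1:]:
        out -= c
    # nothing was subtracted, so only one element is non-zero
    if out == row.sum():
        return 0
    else:
        return out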

df['output'] = df.drop(['Date','Item'], axis=1).apply(calc, axis=1)

Output:

    Date   Item  A.unit  B.Unit  C.Unit  D.Unit  output
0  10/11    A,D       5       0       0      12      -7
1  11/11  A,B,C      10      10       5       0      -5
2  12/11      A      20       0       0       0       0

Another solution, using a lambda and a regex:

import re
import numpy as np

unit_columns = list(df.columns[2:])
# strip the '.Unit' suffix (case-insensitive) to get the item names
regex = re.compile(re.escape('.Unit'), re.IGNORECASE)
unit_columns_replaced = [regex.sub('', a) for a in unit_columns]

def output(row):
    ItemN = row['Item'].split(",")
    if len(ItemN) < 2:
        return 0
    # positions of the unit columns whose item appears in this row
    idxs = np.where(np.isin(unit_columns_replaced, ItemN))[0]
    c_names = [unit_columns[idx] for idx in idxs]
    f_columns = row.filter(items=c_names)
    # first - (sum - first)
    return 2 * f_columns.iloc[0] - f_columns.sum()


df['output'] = df.apply(output, axis=1)
df

which gives the output:

    Date   Item  A.unit  B.Unit  C.Unit  D.Unit  output
0  10/11    A,D       5       0       0      12      -7
1  11/11  A,B,C      10      10       5       0      -5
2  12/11      A      20       0       0       0       0
