
Aggregate by value and count, with a distinct array

Let's say I have this list of tuples:

[
('r', 'p', ['A', 'B']),
('r', 'f', ['A']),
('r', 'e', ['A']),
('r', 'p', ['A']),
('r', 'f', ['B']),
('r', 'p', ['B']),
('r', 'e', ['B']),
('r', 'c', ['A'])
]

I need to return a list of tuples grouped by the second value of each tuple, together with a count for each group. The third value, which is an array, should be merged across the group and reduced to its distinct elements.

So for the example above, the result will be:

[
('r', 'p', ['A', 'B'], 4),
('r', 'f', ['A', 'B'], 2),
('r', 'e', ['A', 'B'], 2),
('r', 'c', ['A'], 1)
]

In the result, the first value is a constant, the second is unique (it is the group-by key), the third is the merged array of distinct elements, and the fourth is the total number of array elements in the group, counted before duplicates are removed.

You could do this in pandas:

import pandas as pd

df = pd.DataFrame([
('r', 'p', ['A', 'B']),
('r', 'f', ['A']),
('r', 'e', ['A']),
('r', 'p', ['A']),
('r', 'f', ['B']),
('r', 'p', ['B']),
('r', 'e', ['B']),
('r', 'c', ['A'])
], columns=['first','second','arr'])

pd.merge(df.explode('arr').groupby(['first','second']).agg(set).reset_index(),
         df[['first','second']].value_counts().reset_index(),
         on=['first','second']).values.tolist()

Output

[
    ['r', 'c', {'A'}, 1],
    ['r', 'e', {'B', 'A'}, 2],
    ['r', 'f', {'B', 'A'}, 2],
    ['r', 'p', {'B', 'A'}, 3]
]
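Note that the count of 3 for 'p' above is the number of rows in that group, while the expected result in the question counts array elements (2 + 1 + 1 = 4). A minimal standard-library sketch of the difference, using the question's data; the names `rows` and `elements` are only illustrative:

```python
from collections import defaultdict

data = [
    ('r', 'p', ['A', 'B']),
    ('r', 'f', ['A']),
    ('r', 'e', ['A']),
    ('r', 'p', ['A']),
    ('r', 'f', ['B']),
    ('r', 'p', ['B']),
    ('r', 'e', ['B']),
    ('r', 'c', ['A']),
]

rows = defaultdict(int)      # number of tuples per (first, second) key
elements = defaultdict(int)  # number of array elements per key
for first, second, arr in data:
    rows[(first, second)] += 1
    elements[(first, second)] += len(arr)

print(rows[('r', 'p')])      # 3
print(elements[('r', 'p')])  # 4
```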

To address your edit you could do this:

(
  df.explode('arr')
    .value_counts()
    .reset_index(name='count')  # explicit name: the default is 0 or 'count' depending on the pandas version
    .groupby(['first','second'])
    .agg({'arr': set, 'count': 'sum'})
    .reset_index()
    .values
    .tolist()
)

Output

[
   ['r', 'c', {'A'}, 1],
   ['r', 'e', {'B', 'A'}, 2],
   ['r', 'f', {'B', 'A'}, 2],
   ['r', 'p', {'B', 'A'}, 4]
]
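The rows above hold sets inside lists, whereas the question asks for tuples containing lists. One possible post-processing step, sketched here on the literal output (the variable name `rows` is an assumption, standing in for the value returned by `.values.tolist()`):

```python
rows = [
    ['r', 'c', {'A'}, 1],
    ['r', 'e', {'B', 'A'}, 2],
    ['r', 'f', {'B', 'A'}, 2],
    ['r', 'p', {'B', 'A'}, 4],
]

# sorted() turns each set into a list with a deterministic order
result = [(first, second, sorted(arr), count) for first, second, arr, count in rows]

print(result)
# [('r', 'c', ['A'], 1), ('r', 'e', ['A', 'B'], 2), ('r', 'f', ['A', 'B'], 2), ('r', 'p', ['A', 'B'], 4)]
```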

Here's my attempt using itertools:

from itertools import groupby

data = [
('r', 'p', ['A', 'B']),
('r', 'f', ['A']),
('r', 'e', ['A']),
('r', 'p', ['A']),
('r', 'f', ['B']),
('r', 'p', ['B']),
('r', 'e', ['B']),
('r', 'c', ['A'])
]

# groupby needs sorted data
data.sort(key=lambda x: (x[0], x[1]))
result = []
for key,group in groupby(data, key=lambda x: (x[0], x[1])):
    # Make the AB list. Ex: s = ['A', 'B', 'A', 'B']
    s = [item for x in group for item in x[2]]
    # Put it all together. Ex: ('r', 'p', ['A', 'B'], 4)
    # sorted() keeps the distinct values in a stable order
    result.append((*key, sorted(set(s)), len(s)))

print(result)

I hope I've understood your question well:

data = [
    ("r", "p", ["A", "B"]),
    ("r", "f", ["A"]),
    ("r", "e", ["A"]),
    ("r", "p", ["A"]),
    ("r", "f", ["B"]),
    ("r", "p", ["B"]),
    ("r", "e", ["B"]),
    ("r", "c", ["A"]),
]

out = {}
for a, b, c in data:
    out.setdefault((a, b), []).append(c)

out = [
    (a, b, list(set(v for l in c for v in l)), sum(map(len, c)))
    for (a, b), c in out.items()
]

print(out)

Prints:

[
    ("r", "p", ["B", "A"], 4),
    ("r", "f", ["B", "A"], 2),
    ("r", "e", ["B", "A"], 2),
    ("r", "c", ["A"], 1),
]
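Because `set` does not keep insertion order, the distinct values come out as ["B", "A"] here. If first-seen order matters, a variant of the same idea could rely on dict keys instead, which keep insertion order. This is only a sketch; the names `groups` and `entry` are illustrative:

```python
data = [
    ("r", "p", ["A", "B"]),
    ("r", "f", ["A"]),
    ("r", "e", ["A"]),
    ("r", "p", ["A"]),
    ("r", "f", ["B"]),
    ("r", "p", ["B"]),
    ("r", "e", ["B"]),
    ("r", "c", ["A"]),
]

groups = {}
for first, second, arr in data:
    entry = groups.setdefault((first, second), {"seen": {}, "count": 0})
    entry["seen"].update(dict.fromkeys(arr))  # dict keys act as an ordered set
    entry["count"] += len(arr)                # count elements before de-duplication

out = [
    (first, second, list(entry["seen"]), entry["count"])
    for (first, second), entry in groups.items()
]

print(out)
# [('r', 'p', ['A', 'B'], 4), ('r', 'f', ['A', 'B'], 2), ('r', 'e', ['A', 'B'], 2), ('r', 'c', ['A'], 1)]
```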

convtools supports custom aggregations (I must confess, I'm the author), so here's the code:

from convtools import conversion as c

data = [
    ("r", "p", ["A", "B"]),
    ("r", "f", ["A"]),
    ("r", "e", ["A"]),
    ("r", "p", ["A"]),
    ("r", "f", ["B"]),
    ("r", "p", ["B"]),
    ("r", "e", ["B"]),
    ("r", "c", ["A"]),
]

converter = (
    c.group_by(c.item(1))
    .aggregate(
        (
            c.ReduceFuncs.First(c.item(0)),
            c.item(1),
            c.reduce(
                lambda x, y: x.union(y),
                c.item(2).as_type(set),
                initial=set,
                default=set,
            ).as_type(list),
            c.ReduceFuncs.Sum(c.item(2).len()),
        )
    )
    .gen_converter()  # generates ad-hoc python function; reuse if needed
)

The output is:

In [47]: converter(data)
Out[47]:
[('r', 'p', ['B', 'A'], 4),
 ('r', 'f', ['B', 'A'], 2),
 ('r', 'e', ['B', 'A'], 2),
 ('r', 'c', ['A'], 1)]

