I have a DataFrame df as follows:
id | lob | addr | addr2 |
---|---|---|---|
a1 | 001 | 1234 | 0 |
a1 | 001 | 1233 | 0 |
a3 | 003 | 1221 | 0 |
a4 | 009 | 1234 | 0 |
I want to generate n
(let's take 4) rows per id, with the other columns being null/na/nan
values. So, the above table is to be transformed to:
id | lob | addr | addr2 |
---|---|---|---|
a1 | 001 | 1234 | 0 |
a1 | 001 | 1233 | 0 |
a1 | 001 | na | na |
a1 | na | na | na |
a3 | 003 | 1221 | 0 |
a3 | na | na | na |
a3 | na | na | na |
a3 | na | na | na |
a4 | 009 | 1234 | 0 |
a4 | na | na | na |
a4 | na | na | na |
a4 | na | na | na |
How can I achieve this? At execution time I will have anywhere from 500-700 ids, and n will always be 70 (so each id should end up with 70 rows).
I considered a loop that creates a row, groups by id, checks whether the count is less than 70, and repeats, but that would end up doing a lot of unnecessary operations.
Here's a solution using Counter to count how many extra rows you need for each ID, and then just appending the new data:
from collections import Counter
import pandas as pd

id_count = Counter(df['id'])
# Create lists of each id repeated the number of times it is still needed:
n = 4
id_values = [[i] * (n - id_count[i]) for i in id_count.keys()]
# Flatten to a single list:
id_values = [i for s in id_values for i in s]
# Create a new DataFrame and append it to the existing data
# (DataFrame.append was removed in pandas 2.0, so use pd.concat):
new_data = pd.DataFrame({"id": id_values})
df = pd.concat([df, new_data]).sort_values(by="id")
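To check this approach end to end, here is a self-contained run on the question's sample table (the DataFrame below just recreates that table):

```python
from collections import Counter
import pandas as pd

df = pd.DataFrame({
    "id":    ["a1", "a1", "a3", "a4"],
    "lob":   ["001", "001", "003", "009"],
    "addr":  [1234, 1233, 1221, 1234],
    "addr2": [0, 0, 0, 0],
})

n = 4
id_count = Counter(df["id"])
# Each id gets (n - current count) filler rows containing only the id:
id_values = [i for i in id_count for _ in range(n - id_count[i])]
new_data = pd.DataFrame({"id": id_values})
out = pd.concat([df, new_data]).sort_values(by="id")
```

After this, every id has exactly 4 rows, and the filler rows hold NaN in lob, addr and addr2.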
You can enumerate the rows within each id, then try stack/unstack (or pivot):
(df.assign(enum=df.groupby('id').cumcount())  # number rows within each id
   .query('enum < 4')                          # keep at most 4 rows per id
   .set_index(['enum', 'id'])
   .unstack('id')
   .reindex(range(4))                          # force 4 slots per id
   .stack('id', dropna=False)                  # bring id back, keeping NaN rows
   .sort_index(level='id')
   .reset_index('id')
)
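Running that chain on the sample data (recreated below) pads every id out to 4 rows; note that stack(..., dropna=False) emits a deprecation warning on recent pandas versions:

```python
import pandas as pd

df = pd.DataFrame({
    "id":    ["a1", "a1", "a3", "a4"],
    "lob":   ["001", "001", "003", "009"],
    "addr":  [1234, 1233, 1221, 1234],
    "addr2": [0, 0, 0, 0],
})

out = (df.assign(enum=df.groupby('id').cumcount())
         .query('enum < 4')
         .set_index(['enum', 'id'])
         .unstack('id')
         .reindex(range(4))
         .stack('id', dropna=False)
         .sort_index(level='id')
         .reset_index('id'))
```

The result has 4 rows per id (12 total here), with the rows added by the reindex holding NaN in every non-id column.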
You can use the concat function in pandas to optimize the running time; the code can look something like this:
import numpy as np
import pandas as pd

def replication(n, table):
    # Blank out every column except 'id' in a copy of the table:
    empty_tab = table.copy()
    for col in empty_tab.columns:
        if col != 'id':
            empty_tab[col] = np.nan
    # Append n blank copies of the rows:
    for _ in range(n):
        table = pd.concat([table, empty_tab.copy()])
    return table
The second copy() is not really necessary in this case.
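A self-contained sketch of that function on toy data (the small DataFrame here is an assumption). Note that it appends n blank copies of every existing row, rather than topping each id up to exactly n rows:

```python
import numpy as np
import pandas as pd

def replication(n, table):
    # Blank out every column except 'id' in a copy of the table:
    empty_tab = table.copy()
    for col in empty_tab.columns:
        if col != 'id':
            empty_tab[col] = np.nan
    # Append n blank copies of the rows:
    for _ in range(n):
        table = pd.concat([table, empty_tab])
    return table

toy = pd.DataFrame({"id": ["a1", "a3"], "addr": [1234, 1221]})
out = replication(3, toy)
```

Each original row ends up with 3 blank companions, so 2 input rows become 8 output rows.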
I would do something like this:
n = [1, 2, 3]
df["_count"] = ",".join(map(str, n))
df["_count"] = df["_count"].str.split(",")
df = df.explode("_count", ignore_index=True).drop(columns="_count")
explode is a pretty nifty method that creates multiple output rows for a given column: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.explode.html
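For example, starting from the question's 4-row sample table (recreated here with just two columns), the split/explode trick turns every row into len(n) copies:

```python
import pandas as pd

df = pd.DataFrame({
    "id":   ["a1", "a1", "a3", "a4"],
    "addr": [1234, 1233, 1221, 1234],
})

n = [1, 2, 3]
# Every row gets the same "1,2,3" string, split into a list of 3 items,
# so explode replicates each row 3 times:
df["_count"] = ",".join(map(str, n))
df["_count"] = df["_count"].str.split(",")
df = df.explode("_count", ignore_index=True).drop(columns="_count")
```

Note this duplicates the existing values; to match the question's target table, the non-id columns would still need to be set to NaN on the added rows.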