
Repeat a dataframe row if a condition is met, and change the value of one column

I have a dataframe:

import pandas as pd
df = pd.DataFrame(
{
    "Qty": [1,2,2,4,5,4,3],
    "Date": ['2020-12-16', '2020-12-17', '2020-12-18', '2020-12-19', '2020-12-20', '2020-12-21', '2020-12-22'],
    "Item": ['22-A', 'R-22-A', '33-CDE', 'R-33-CDE', '55-A', '22-AB', '55-AB'],
    "Price": [1.1, 2.2, 2.2, 4.4, 5.5, 4.4, 3.3]
})

I'm trying to duplicate each row whose Item suffix has 2 or more characters, and then change the Item value in each copy. For example, the row containing '22-AB' should become two rows: in the first the Item will be '22-A', and in the second it will be '22-B'. This should be done only if the item number (without the suffix) is in a 'clean' list.

Here is the pseudocode for what I'm trying to achieve:
Clean list of items = ['11', '22', '33']
For each row, check if the item number in df["Item"] is in the clean list.
    if no:
        skip the row and leave it as it is
    if yes:
        check if len(suffix) >= 2
        if no:
            skip the row and leave it as it is
        if yes:
            separate the item number (11, 22, or 33) and the suffix
            for char in suffix:
                newitem = item + '-' + char
                duplicate the row, replacing the old item with newitem
                if the number started with R-, prepend the R- again
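A direct translation of the pseudocode into plain Python, working on one Item string at a time, could look like the sketch below. The helper name expand_item and the pattern (R-)?(\d+)-(\w+) are assumptions based on the sample Item values, not something from the original question. Following the stated rules, '55-AB' is returned unchanged because 55 is not in the clean list.

```python
import re

# The clean list from the question.
CLEAN_ITEMS = ['11', '22', '33']

# Assumed Item format: optional "R-" prefix, the item number, a dash, the suffix letters.
ITEM_RE = re.compile(r'(R-)?(\d+)-(\w+)')


def expand_item(item, clean_items=CLEAN_ITEMS):
    """Hypothetical helper: return the Item values one original Item should become."""
    m = ITEM_RE.fullmatch(item)
    if m is None:
        return [item]  # unexpected format: leave the row as it is
    prefix, number, suffix = m.group(1) or '', m.group(2), m.group(3)
    if number not in clean_items or len(suffix) < 2:
        return [item]  # not in the clean list, or a one-character suffix
    # One new Item per suffix character, re-adding "R-" when it was present.
    return [f'{prefix}{number}-{char}' for char in suffix]


print(expand_item('R-33-CDE'))
print(expand_item('55-AB'))
```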

The desired output:

df2 = pd.DataFrame(
{
    "Qty": [1,2,2,2,2,4,4,4,5,4,4,3,3],
    "Date": ['2020-12-16', '2020-12-17', '2020-12-18', '2020-12-18', '2020-12-18', '2020-12-19', '2020-12-19', '2020-12-19', '2020-12-20', '2020-12-21', '2020-12-21', '2020-12-22', '2020-12-22'],
    "Item": ['22-A', 'R-22-A', '33-C', '33-D', '33-E', 'R-33-C', 'R-33-D', 'R-33-E', '55-A', '22-A', '22-B', '55-A', '55-B'],
    "Price": [1.1, 2.2, 2.2, 2.2, 2.2, 4.4, 4.4, 4.4, 5.5, 4.4, 4.4, 3.3, 3.3]
})

What I have come up with so far:

import re

mains = ['11', '22', '33']
iptrn = re.compile(r'\d{2}')            # two-digit item number
optrn = re.compile(r'(?<=[0-9]-).*')    # suffix that follows "<digit>-"
for i in df["Item"]:
    item = iptrn.search(i).group(0)
    option = optrn.search(i).group(0)
    if item in mains:
        # One combination per suffix character
        for o in option:
            combo = item + "-" + o
            print(combo)

I can't figure out the last step of actually duplicating the row. I've tried this: df = df.loc[df.index.repeat(1)].assign(Item=combo, num=len(option)-1).reset_index(drop=True), but it doesn't replace the Item correctly.
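The df.loc[df.index.repeat(1)] attempt keeps every row exactly once, and assign(Item=combo) writes the single string held in combo (the last value computed by the loop) into every row. One way to make the repeat idea work is to compute the full list of replacement Items per row first, repeat each row by the length of its list, and then assign the flattened lists. This is a sketch; the helper new_items and its regex are hypothetical and assume Items look like [R-]<number>-<suffix>.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Qty": [1, 2, 2, 4, 5, 4, 3],
    "Date": ['2020-12-16', '2020-12-17', '2020-12-18', '2020-12-19',
             '2020-12-20', '2020-12-21', '2020-12-22'],
    "Item": ['22-A', 'R-22-A', '33-CDE', 'R-33-CDE', '55-A', '22-AB', '55-AB'],
    "Price": [1.1, 2.2, 2.2, 4.4, 5.5, 4.4, 3.3],
})

mains = ['11', '22', '33']
item_re = re.compile(r'(R-)?(\d+)-(\w+)')  # assumed Item format


def new_items(item):
    # Hypothetical helper: the list of Item values this row should become.
    m = item_re.fullmatch(item)
    if m is None or m.group(2) not in mains or len(m.group(3)) < 2:
        return [item]  # keep the row unchanged
    prefix = m.group(1) or ''
    return [f'{prefix}{m.group(2)}-{c}' for c in m.group(3)]


# One list of replacement Items per original row.
items = df['Item'].map(new_items)

# Repeat each row once per replacement Item (instead of a constant repeat(1)).
out = df.loc[df.index.repeat(items.map(len).to_numpy())]

# Overwrite Item with the flattened lists, then renumber the index.
out = out.assign(Item=[i for lst in items for i in lst]).reset_index(drop=True)
print(out)
```

Because the repeated index and the flattened list both walk the rows in their original order, each copy lines up with its replacement Item.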

You can use pandas operations to do the work here.

The first step is to separate the two parts of the item code with pandas string methods (here, str.extract with expand=True).

>>> item_code = df['Item'].str.extract(r'(?P<ic1>R?-?\d+)-+(?P<ic2>\w+)', expand=True)
>>> item_code
    ic1  ic2
0    22    A
1  R-22    A
2    33  CDE
3  R-33  CDE
4    55    A
5    22   AB
6    55   AB

You can add these columns directly to df; the snippet above only shows the output of the extract operation.

>>> df = df.join(df['Item'].str.extract(r'(?P<ic1>R?-?\d+)-+(?P<ic2>\w+)', expand=True))
>>> df
   Qty        Date      Item  Price   ic1  ic2
0    1  2020-12-16      22-A    1.1    22    A
1    2  2020-12-17    R-22-A    2.2  R-22    A
2    2  2020-12-18    33-CDE    2.2    33  CDE
3    4  2020-12-19  R-33-CDE    4.4  R-33  CDE
4    5  2020-12-20      55-A    5.5    55    A
5    4  2020-12-21     22-AB    4.4    22   AB
6    3  2020-12-22     55-AB    3.3    55   AB

Next, I would build up a Python data structure and convert it to a dataframe at the end, rather than trying to insert rows or change existing rows.

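data = []
for row in df.itertuples(index=False):
    # One output row per suffix character (a one-letter suffix yields one row)
    for character in row.ic2:
        data.append({
            'Date': row.Date,
            'Qty': row.Qty,
            'Price': row.Price,
            'Item': f'{row.ic1}-{character}'  # ic1 already carries any R- prefix
        })

# Build the dataframe once, at the end
newdf = pd.DataFrame(data)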

The new dataframe looks like this:

>>> newdf
          Date  Qty  Price    Item
0   2020-12-16    1    1.1    22-A
1   2020-12-17    2    2.2  R-22-A
2   2020-12-18    2    2.2    33-C
3   2020-12-18    2    2.2    33-D
4   2020-12-18    2    2.2    33-E
5   2020-12-19    4    4.4  R-33-C
6   2020-12-19    4    4.4  R-33-D
7   2020-12-19    4    4.4  R-33-E
8   2020-12-20    5    5.5    55-A
9   2020-12-21    4    4.4    22-A
10  2020-12-21    4    4.4    22-B
11  2020-12-22    3    3.3    55-A
12  2020-12-22    3    3.3    55-B
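The loop above expands every multi-character suffix, including the 55 items, and does not check the clean list from the question. If that rule should apply, a variant of the same loop can test the item number first. A sketch, assuming every Item matches the extract pattern (otherwise ic1 and ic2 would be NaN and the string calls would fail):

```python
import pandas as pd

df = pd.DataFrame({
    "Qty": [1, 2, 2, 4, 5, 4, 3],
    "Date": ['2020-12-16', '2020-12-17', '2020-12-18', '2020-12-19',
             '2020-12-20', '2020-12-21', '2020-12-22'],
    "Item": ['22-A', 'R-22-A', '33-CDE', 'R-33-CDE', '55-A', '22-AB', '55-AB'],
    "Price": [1.1, 2.2, 2.2, 4.4, 5.5, 4.4, 3.3],
})

# Split Item into prefix + number (ic1) and suffix (ic2), as in the answer.
df = df.join(df['Item'].str.extract(r'(?P<ic1>R?-?\d+)-+(?P<ic2>\w+)', expand=True))

mains = ['11', '22', '33']  # the clean list from the question

data = []
for row in df.itertuples(index=False):
    number = row.ic1.replace('R-', '')  # item number without the R- prefix
    if number in mains and len(row.ic2) >= 2:
        # Clean item with a multi-character suffix: one row per character.
        for character in row.ic2:
            data.append({'Date': row.Date, 'Qty': row.Qty, 'Price': row.Price,
                         'Item': f'{row.ic1}-{character}'})
    else:
        # Everything else is kept as it is.
        data.append({'Date': row.Date, 'Qty': row.Qty, 'Price': row.Price,
                     'Item': row.Item})

newdf = pd.DataFrame(data)
print(newdf)
```

With this variant, '55-AB' stays a single row instead of becoming '55-A' and '55-B'.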
