
Removing ranges of characters in pandas dataframe index

I have a list of text items in a dataframe's index. Some of them end in integers, and some contain info in parentheses, e.g. "(extra info)". The rest of the items are just plain text. I want to remove the integers from the items that have them, and the parentheses together with the info inside them, while keeping the text they are attached to.

             Cost   Item Purchased  Name
Store1       22.5   Sponge          Chris
Shop         2.5    Kitty Litter    Kevyn
House (aax)  2      Spoon           Filip

I would like the output to be

           Cost Item Purchased  Name
Store      22.5 Sponge          Chris
Shop       2.5  Kitty Litter    Kevyn
House      2    Spoon           Filip

Set up the dataframe. It would be useful in future if you put this in the question.

import pandas as pd

df = pd.DataFrame(
    {
        "cost": [22.5, 2.5, 2],
        "item purchased": ["Sponge", "kitty litter", "spoon"],
        "name": ["Chris", "Kevyn", "Filip"],
    },
    index=["Store1", "Shop", "House (aax)"],
)


# Move the index into a regular column (named "index" by default).
df = df.reset_index()

# Split each label at "(" and keep the part before it, then strip the trailing space.
df['index'] = df['index'].str.split("(").str[0].str.strip()

# Set the cleaned column back as the index.
df = df.set_index('index')

print(df)

        cost    item purchased  name
index           
Store1  22.5    Sponge          Chris
Shop    2.5     kitty litter    Kevyn
House   2.0     spoon           Filip
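
Note that the split only removes the bracketed part: "Store1" keeps its trailing digit, whereas the question asks for "Store". A minimal sketch that handles both cases in one pass with a regular expression is given here. It assumes that any digits to be removed sit at the very end of the label and that brackets are not nested; the variable name `pattern` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "cost": [22.5, 2.5, 2],
        "item purchased": ["Sponge", "kitty litter", "spoon"],
        "name": ["Chris", "Kevyn", "Filip"],
    },
    index=["Store1", "Shop", "House (aax)"],
)

# One pattern with two alternatives:
#   \s*\(.*?\)  matches a bracketed group plus any whitespace just before it
#   \d+$        matches a run of digits at the very end of the label
pattern = r"\s*\(.*?\)|\d+$"

# Index.str works directly on the index, so no reset_index/set_index round trip is needed.
df.index = df.index.str.replace(pattern, "", regex=True).str.strip()

print(df.index.tolist())
# ['Store', 'Shop', 'House']
```

If a label could have digits followed by a bracketed part (for example "Store1 (x)"), the digits would no longer be at the end when the pattern is applied, so two separate replace calls (brackets first, then digits) would probably be safer.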

