
How to replace certain values in a pandas column in Python

Code used:

def fn(x):
    for i in x:
        x=x.replace('Wood','Wooden')
        return x


test['Coming:'] = test['Column:'].apply(fn)

Sample output:

Column:      Coming:      Needed:
Wood         Wooden       Wooden
Wooden       Woodenen     Wooden

I want Wooden and similar words, such as Woodings and woods, to stay intact. Also, Column: could hold a longer string, e.g. "Wood is there on the ground", and the needed output is then "Wooden is there on the ground".

You can use the pandas replace function. Define in a dictionary what you want to replace, and substitute the words in your new column. Note that without regex=True this only replaces cells whose entire value equals a key:

import pandas as pd
#test data
df = pd.DataFrame(["Wood", "Wooden", "Woody Woodpecker", "wood", "wool", "wool suit"], columns = ["old"])
#dictionary for substitutions
subst_dict = {"Wood": "Wooden", "wool": "soft"}
df["new"] = df["old"].replace(subst_dict)
#output
                old               new
0              Wood            Wooden
1            Wooden            Wooden
2  Woody Woodpecker  Woody Woodpecker
3              wood              wood
4              wool              soft
5         wool suit         wool suit

Though for more complex substitutions utilising regex, it might be a good idea to write a function and use your apply() approach.
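As a rough sketch of that apply() approach, the snippet below wraps a compiled regex in a small function. The helper name replace_wood and the sample rows are made up for illustration; only the column names come from the question.

```python
import re
import pandas as pd

# Hypothetical helper (not from the original answer): replace the whole word
# "Wood" and leave longer words such as "Wooden" untouched.
WOOD_WORD = re.compile(r"\bWood\b")

def replace_wood(text):
    # \b marks a word boundary, so "Wooden" does not match
    return WOOD_WORD.sub("Wooden", text)

# Sample rows assumed for illustration
test = pd.DataFrame({"Column:": ["Wood", "Wooden", "Wood is there on the ground"]})
test["Coming:"] = test["Column:"].apply(replace_wood)
print(test["Coming:"].tolist())
# ['Wooden', 'Wooden', 'Wooden is there on the ground']
```

A function like this is mostly worth it when the substitution logic grows beyond what a dictionary can express.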

Update after changing the requirements:
If you want to match only whole words in phrases, you can use regex:

import pandas as pd
#test data
df = pd.DataFrame(["Wood", "Wooden", "Woody Woodpecker", "wood", "wool", "wool suit", "Wood is delicious", "A beautiful wool suit"], columns = ["old"])
#dictionary for substitutions
subst_dict = {"Wood": "Wooden", "wool": "soft"}
#create dictionary of regex expressions
temp_dict = {r'(\b){}(\b)'.format(k) : v for k, v in subst_dict.items()}
#and substitute
df["new"] = df["old"].replace(temp_dict, regex = True)
#output
                     old                    new
0                   Wood                 Wooden
1                 Wooden                 Wooden
2       Woody Woodpecker       Woody Woodpecker
3                   wood                   wood
4                   wool                   soft
5              wool suit              soft suit
6      Wood is delicious    Wooden is delicious
7  A beautiful wool suit  A beautiful soft suit
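One caveat: the keys are inserted into the pattern unescaped, so a key containing regex metacharacters (such as a dot) would match more than intended. A possible variation, assuming the keys are meant as literal text, is to pass them through re.escape first. The "v1.0" key and the sample rows here are invented for the example.

```python
import re
import pandas as pd

# Sample data and the "v1.0" key are assumptions for illustration
df = pd.DataFrame(["Wood v1.0", "Wooden v100"], columns=["old"])
subst_dict = {"Wood": "Wooden", "v1.0": "v2.0"}

# Escape each key so that "." matches a literal dot, then add word boundaries
temp_dict = {r"\b{}\b".format(re.escape(k)): v for k, v in subst_dict.items()}
df["new"] = df["old"].replace(temp_dict, regex=True)
print(df["new"].tolist())
```

Without re.escape, the pattern for "v1.0" would also match "v100", because an unescaped dot matches any character.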

Here is one way to replace every substring that appears as a key in a dictionary. Note that the order may matter if any of the dictionary's values and keys collide:

import pandas as pd

s = pd.Series(['Wood', 'Wooden', 'Woody Woodpecker', 'wood', 'wood', 'wool suit'])

d = {'Wood': 'Wooden', 'wool': 'soft'}

for k, v in d.items():
    # regex=False: treat each key as a literal substring
    s = s.str.replace(k, v, regex=False)

# 0                  Wooden
# 1                Woodenen
# 2    Woodeny Woodenpecker
# 3                    wood
# 4                    wood
# 5               soft suit
# dtype: object
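If the order dependence is a concern, one possible workaround (a sketch, not part of the original answer) is to do all substitutions in a single regex pass, so that text produced by one replacement is never scanned again. The extra "Wooden": "Timber" entry is a hypothetical collision added to show the effect.

```python
import re
import pandas as pd

s = pd.Series(["Wood", "Wooden", "Woody Woodpecker", "wood", "wool suit"])

# "Wooden" -> "Timber" is a made-up entry whose key collides with another value
d = {"Wood": "Wooden", "Wooden": "Timber", "wool": "soft"}

# Put longer keys first so "Wooden" is tried before its prefix "Wood"
pattern = "|".join(re.escape(k) for k in sorted(d, key=len, reverse=True))

# The callable looks up each match in the dictionary; a callable repl needs regex=True
result = s.str.replace(pattern, lambda m: d[m.group(0)], regex=True)
print(result.tolist())
# ['Wooden', 'Timber', 'Woodeny Woodenpecker', 'wood', 'soft suit']
```

This still replaces substrings rather than whole words, so "Woody" becomes "Woodeny" just as in the loop version.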

