
How to change column names based on the first three characters of the column name

I would like to change the column names based on the first three characters of the column name using a dictionary.

This is the code I have currently:

new_names = {"aud":"alc_aud","whe":"clu_whe", "per":"pre_per",
                "pol":"cou_pol","spec":"coc_spec","dark":"daw_dark"}

for x,y in new_names.items():
    if df.columns.str.startswith(x):
       df.columns = df.columns.str.replace(x,y)

I get the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
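
The error comes from the `if` line: `df.columns.str.startswith(x)` returns one boolean per column rather than a single True/False, so Python cannot decide which branch to take. A minimal sketch of this, assuming a small made-up frame (the column names are only for illustration):

```python
import pandas as pd

# Hypothetical frame, used only to show what startswith returns.
df = pd.DataFrame({"aud1": [1], "spe2": [2], "C": [3]})

# One boolean per column, not a single truth value.
mask = df.columns.str.startswith("aud")
print(mask)

# Reducing the mask to a single value makes the `if` valid.
if mask.any():
    print("at least one column starts with 'aud'")
```

Adding `.any()` only removes the error. `str.replace(x, y)` would still substitute the text wherever it occurs in a column name, not just at the start.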

Use:

import pandas as pd

df = pd.DataFrame({'aud1':list('abcdef'),
                   'spe2':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'F':list('aaabbb')})

print (df)
  aud1   spe2  C  F
0    a      4  7  a
1    b      5  8  a
2    c      4  9  a
3    d      5  4  b
4    e      5  2  b
5    f      4  3  b

new_names = {"aud":"alc_aud","whe":"clu_whe", "per":"pre_per",
                "pol":"cou_pol","spec":"coc_spec","dark":"daw_dark"}

First truncate each dictionary key to its first three characters:

new_names = {k[:3] :v for k, v in new_names.items()}

print (new_names)
{'aud': 'alc_aud', 'whe': 'clu_whe', 'per': 'pre_per', 
     'pol': 'cou_pol', 'spe': 'coc_spec', 'dar': 'daw_dark'}

Then take the first three characters of each column name with str[:3] and replace them using the dictionary:

df.columns = df.columns.to_series().str[:3].replace(new_names)
print (df)
  alc_aud  coc_spec  C  F
0       a         4  7  a
1       b         5  8  a
2       c         4  9  a
3       d         5  4  b
4       e         5  2  b
5       f         4  3  b

Another solution uses dict.get in a list comprehension; if no key matches, the original column name is returned:

df.columns = [new_names.get(x[:3], x) for x in df.columns]
print (df)
  alc_aud  coc_spec  C  F
0       a         4  7  a
1       b         5  8  a
2       c         4  9  a
3       d         5  4  b
4       e         5  2  b
5       f         4  3  b
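
The two approaches above behave differently for columns that match nothing in the dictionary. A small sketch, assuming a hypothetical column named `country` with no matching key:

```python
import pandas as pd

new_names = {"aud": "alc_aud", "spe": "coc_spec"}

# Hypothetical column names; "country" has no matching key.
cols = pd.Index(["aud1", "spe2", "country"])

# Approach 1: every name is cut to three characters before replace,
# so an unmatched name stays truncated.
via_replace = cols.to_series().str[:3].replace(new_names).tolist()

# Approach 2: dict.get falls back to the full original name.
via_get = [new_names.get(c[:3], c) for c in cols]

print(via_replace)
print(via_get)
```

Here the first list ends in `cou`, while the second keeps `country`. With single-letter columns such as `C` and `F` the difference does not show.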

EDIT: A solution that works with keys of any length:

df = pd.DataFrame({'aud1':list('abcdef'),
                   'specd2':[4,5,4,5,5,4],
                   'podfds':[7,8,9,4,2,3],
                   'aaper':list('aaabbb')})

print (df)
  aud1  specd2  podfds aaper
0    a       4       7     a
1    b       5       8     a
2    c       4       9     a
3    d       5       4     b
4    e       5       2     b
5    f       4       3     b

new_names = {"aud":"alc_aud","whe":"clu_whe", "per":"pre_per",
                "po":"cou_pol","spec":"coc_spec","dark":"daw_dark"}

First extract the prefix of each column name that matches one of the dictionary keys, then map it through the dictionary, and finally fill the unmatched names back in with fillna:

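# Build an alternation of the keys, each anchored to the start of the name
pat = '|'.join([r'^{}'.format(x) for x in new_names])
s = df.columns.to_series()
# Extract the matching prefix, map it through the dict, keep unmatched names
df.columns = s.str.extract('('+ pat + ')', expand=False).map(new_names).fillna(s)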
print (df)
  alc_aud  coc_spec  cou_pol aaper
0       a         4        7     a
1       b         5        8     a
2       c         4        9     a
3       d         5        4     b
4       e         5        2     b
5       f         4        3     b
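
An alternative to the regular expression, sketched here on the assumption that the longest matching key should win when keys overlap (for example `po` and `pol`). The helper name `rename_by_prefix` is made up for this example; it is passed to `DataFrame.rename`, which accepts a function for `columns`:

```python
import pandas as pd

new_names = {"aud": "alc_aud", "whe": "clu_whe", "per": "pre_per",
             "po": "cou_pol", "spec": "coc_spec", "dark": "daw_dark"}

def rename_by_prefix(name, mapping):
    """Return the mapped name for the longest key that `name` starts with."""
    for prefix in sorted(mapping, key=len, reverse=True):
        if name.startswith(prefix):
            return mapping[prefix]
    return name  # no key matched: keep the original name

df = pd.DataFrame({"aud1": list("abcdef"),
                   "specd2": [4, 5, 4, 5, 5, 4],
                   "podfds": [7, 8, 9, 4, 2, 3],
                   "aaper": list("aaabbb")})

df = df.rename(columns=lambda c: rename_by_prefix(c, new_names))
print(df.columns.tolist())
```

Because this uses plain `str.startswith`, keys containing regex metacharacters need no escaping.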
