I would like to change the column names based on the first three characters of the column name using a dictionary.
This is the code I have currently:
new_names = {"aud":"alc_aud","whe":"clu_whe", "per":"pre_per",
"pol":"cou_pol","spec":"coc_spec","dark":"daw_dark"}
for x, y in new_names.items():
    if df.columns.str.startswith(x):
        df.columns = df.columns.str.replace(x, y)
I get the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
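The error comes from `if df.columns.str.startswith(x)`: the call returns one boolean per column, and `if` needs a single True or False. As a minimal sketch (the small `df` here is made up for illustration, and a matching column gets the whole new name rather than a substring replacement), each column name can be tested on its own:

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({"aud1": [1, 2], "whe2": [3, 4], "C": [5, 6]})
new_names = {"aud": "alc_aud", "whe": "clu_whe"}

# One boolean per column, which `if` cannot reduce to a single value
mask = df.columns.str.startswith("aud")
print(mask)

# Test each column name separately with plain str.startswith
renamed = []
for col in df.columns:
    for prefix, new in new_names.items():
        if col.startswith(prefix):
            col = new  # replace the whole name with the mapped value
            break
    renamed.append(col)
df.columns = renamed
print(list(df.columns))
```

Columns with no matching prefix (here `C`) keep their original name.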
Use:
import pandas as pd

df = pd.DataFrame({'aud1': list('abcdef'),
                   'spe2': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'F': list('aaabbb')})
print (df)
   aud1  spe2  C  F
0     a     4  7  a
1     b     5  8  a
2     c     4  9  a
3     d     5  4  b
4     e     5  2  b
5     f     4  3  b
new_names = {"aud":"alc_aud","whe":"clu_whe", "per":"pre_per",
"pol":"cou_pol","spec":"coc_spec","dark":"daw_dark"}
First, truncate the dictionary keys to their first three characters:
new_names = {k[:3] :v for k, v in new_names.items()}
print (new_names)
{'aud': 'alc_aud', 'whe': 'clu_whe', 'per': 'pre_per',
'pol': 'cou_pol', 'spe': 'coc_spec', 'dar': 'daw_dark'}
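Then take the first three characters of each column name with str[:3] and replace them using the dict (note that a column name longer than three characters with no matching key is left truncated):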
df.columns = df.columns.to_series().str[:3].replace(new_names)
print (df)
   alc_aud  coc_spec  C  F
0        a         4  7  a
1        b         5  8  a
2        c         4  9  a
3        d         5  4  b
4        e         5  2  b
5        f         4  3  b
Another solution uses dict.get in a list comprehension; if no key matches, the original column name is returned:
df.columns = [new_names.get(x[:3], x) for x in df.columns]
print (df)
   alc_aud  coc_spec  C  F
0        a         4  7  a
1        b         5  8  a
2        c         4  9  a
3        d         5  4  b
4        e         5  2  b
5        f         4  3  b
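The same lookup can also be passed to `DataFrame.rename` as a function, which returns a new DataFrame instead of assigning to `df.columns`. A minimal sketch, assuming the keys are already truncated to three characters (the shortened `df` and the name `out` are only for illustration):

```python
import pandas as pd

# Hypothetical shortened sample, only for illustration
df = pd.DataFrame({"aud1": list("abc"), "spe2": [4, 5, 4], "C": [7, 8, 9]})
new_names = {"aud": "alc_aud", "spe": "coc_spec"}

# rename calls the function once per column label;
# unmatched labels fall back to the original name via dict.get
out = df.rename(columns=lambda c: new_names.get(c[:3], c))
print(list(out.columns))
```

This leaves `df` unchanged, which can be convenient inside a method chain.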
EDIT: A solution that works with keys of any length:
df = pd.DataFrame({'aud1':list('abcdef'),
'specd2':[4,5,4,5,5,4],
'podfds':[7,8,9,4,2,3],
'aaper':list('aaabbb')})
print (df)
   aud1  specd2  podfds  aaper
0     a       4       7      a
1     b       5       8      a
2     c       4       9      a
3     d       5       4      b
4     e       5       2      b
5     f       4       3      b
new_names = {"aud":"alc_aud","whe":"clu_whe", "per":"pre_per",
"po":"cou_pol","spec":"coc_spec","dark":"daw_dark"}
First extract the prefix of each column name that matches one of the dict keys, then map it through the dict, and finally fill the unmatched values with the original names using fillna:
pat = '|'.join([r'^{}'.format(x) for x in new_names])
s = df.columns.to_series()
df.columns = s.str.extract('('+ pat + ')', expand=False).map(new_names).fillna(s)
print (df)
   alc_aud  coc_spec  cou_pol  aaper
0        a         4        7      a
1        b         5        8      a
2        c         4        9      a
3        d         5        4      b
4        e         5        2      b
5        f         4        3      b
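Two details of the regex approach may matter with other key sets: a regex alternation takes the first alternative that matches, so with overlapping keys such as "po" and "pol" the order decides the result, and keys containing regex metacharacters would need escaping. A minimal sketch of a variant in plain Python that handles both (the helper `rename_by_prefix` is a hypothetical name, not part of pandas):

```python
import re

def rename_by_prefix(columns, mapping):
    """Return new column names, replacing any name that starts with a key."""
    # Try longest keys first, so "pol" is preferred over "po"
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape guards against keys containing regex metacharacters
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    result = []
    for col in columns:
        m = pattern.match(col)  # match() only matches at the start of the string
        result.append(mapping[m.group(0)] if m else col)
    return result

new_names = {"aud": "alc_aud", "per": "pre_per", "po": "cou_pol", "spec": "coc_spec"}
print(rename_by_prefix(["aud1", "specd2", "podfds", "aaper"], new_names))
# ['alc_aud', 'coc_spec', 'cou_pol', 'aaper']
```

It can be applied with `df.columns = rename_by_prefix(df.columns, new_names)`; the sketch assumes the mapping is not empty.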