Regex problems with special characters (e.g. #, /) in a chord dictionary (Python)
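I'm writing a chord dictionary, and for that I need to sort the different chord types into smaller groups.
However, I'm having trouble with chords that contain # (e.g. C#, C#m) and with slash chords such as D7/F# and A/B, which I also want to assign to the other groups.
I believe the problem is some regex parameter, and I admit I'm not very familiar with regex.
Here is the code I've written: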
import pandas as pd

# One single-column DataFrame per chord type, each listing all 17 roots (sharps and flats)
triadeMaior = pd.DataFrame({'triadeMaior': ['C','C#','Db','D','D#','Eb','E','F','F#','Gb','G','G#','Ab','A','A#','Bb','B']})
triadeMenor = pd.DataFrame({'triadeMenor': ['Cm','C#m','Dbm','Dm','D#m','Ebm','Em','Fm','F#m','Gbm','Gm','G#m','Abm','Am','A#m','Bbm','Bm']})
triadeDiminuta = pd.DataFrame({'triadeDiminuta': ['Cdim','C#dim','Dbdim','Ddim','D#dim','Ebdim','Edim','Fdim','F#dim','Gbdim','Gdim','G#dim','Abdim','Adim','A#dim','Bbdim','Bdim']})
triadeAumentada = pd.DataFrame({'triadeAumentada': ['Caug','C#aug','Dbaug','Daug','D#aug','Ebaug','Eaug','Faug','F#aug','Gbaug','Gaug','G#aug','Abaug','Aaug','A#aug','Bbaug','Baug']})
setima = pd.DataFrame({'setima': ['C7','C#7','Db7','D7','D#7','Eb7','E7','F7','F#7','Gb7','G7','G#7','Ab7','A7','A#7','Bb7','B7']})
setimaMenor = pd.DataFrame({'setimaMenor': ['Cm7','C#m7','Dbm7','Dm7','D#m7','Ebm7','Em7','Fm7','F#m7','Gbm7','Gm7','G#m7','Abm7','Am7','A#m7','Bbm7','Bm7']})
setimaMaior = pd.DataFrame({'setimaMaior': ['Cmaj7','C#maj7','Dbmaj7','Dmaj7','D#maj7','Ebmaj7','Emaj7','Fmaj7','F#maj7','Gbmaj7','Gmaj7','G#maj7','Abmaj7','Amaj7','A#maj7','Bbmaj7','Bmaj7']})
setimaMenorQuinta = pd.DataFrame({'setimaMenorQuinta': ['Cm7b5','C#m7b5','Dbm7b5','Dm7b5','D#m7b5','Ebm7b5','Em7b5','Fm7b5','F#m7b5','Gbm7b5','Gm7b5','G#m7b5','Abm7b5','Am7b5','A#m7b5','Bbm7b5','Bm7b5']})
sexta = pd.DataFrame({'sexta': ['C6','C#6','Db6','D6','D#6','Eb6','E6','F6','F#6','Gb6','G6','G#6','Ab6','A6','A#6','Bb6','B6']})
sextaMenor = pd.DataFrame({'sextaMenor': ['Cm6','C#m6','Dbm6','Dm6','D#m6','Ebm6','Em6','Fm6','F#m6','Gbm6','Gm6','G#m6','Abm6','Am6','A#m6','Bbm6','Bm6']})
triadeMaior_pat = fr"\b({'|'.join(triadeMaior['triadeMaior'])})\b"
triadeMenor_pat = fr"\b({'|'.join(triadeMenor['triadeMenor'])})\b"
triadeDiminuta_pat = fr"\b({'|'.join(triadeDiminuta['triadeDiminuta'])})\b"
triadeAumentada_pat = fr"\b({'|'.join(triadeAumentada['triadeAumentada'])})\b"
setima_pat = fr"\b({'|'.join(setima['setima'])})\b"
setimaMenor_pat = fr"\b({'|'.join(setimaMenor['setimaMenor'])})\b"
setimaMaior_pat = fr"\b({'|'.join(setimaMaior['setimaMaior'])})\b"
setimaMenorQuinta_pat = fr"\b({'|'.join(setimaMenorQuinta['setimaMenorQuinta'])})\b"
sexta_pat = fr"\b({'|'.join(sexta['sexta'])})\b"
sextaMenor_pat = fr"\b({'|'.join(sextaMenor['sextaMenor'])})\b"
df['chordType'] = df['chords'].replace({triadeMaior_pat: 'triadeMaj',
triadeMenor_pat: 'triadeMen',
triadeDiminuta_pat: 'triadeDim',
triadeAumentada_pat: 'triadeAug',
setima_pat: 'setima',
setimaMenor_pat: 'setimaMen',
setimaMaior_pat: 'setimaMaj',
setimaMenorQuinta_pat : 'setimaMenQui',
sexta_pat:'sexta',
sextaMenor_pat: 'sextaMen',
r'\b(?!triadeMaj|triadeMen|triadeDim|triadeAug|setima|setimaMen|setimaMenQui|setimaMaj|sexta|sextaMen\b)\w+': 'outros'},
regex=True)
Here are some examples of the results:
chords | chordType |
---|---|
C#, E7, Abm, Amaj7, E, Abm, C#m, E | triadeMaj#, setima, triadeMen, setimaMaj, triadeMaj, triadeMen, triadeMaj#outros, triadeMaj |
E, A7, G6, D/F#, F6, E, Em, D7/F#, Fmaj7, E, A7, G6, D7/F#, F6, Em, D, Dm7, E | triadeMaj, setima, sexta, triadeMaj/triadeMaj#, sexta, triadeMaj, triadeMen, setima/triadeMaj#, setimaMaj, triadeMaj, setima, sexta, setima/triadeMaj#, sexta, triadeMen, triadeMaj, setimaMen, triadeMaj |
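As you can see, for chords containing # or /, the current code treats each chord as two parts instead of one.
Does anyone know how to fix this? Also, as I mentioned, I don't have much regex experience, so I don't know whether the code could be made shorter, more robust and cleaner.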
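The split comes from how `\b` works. `\b` is a word boundary: it matches only between a word character (letter, digit or underscore) and a non-word character, or at the start or end of the string next to a word character. `#` and `/` are non-word characters, so in `C#, E` there is no boundary between `#` and the following comma and `\bC#\b` cannot match, while `\bC\b` does match the `C` in front of `#`. Listing `C` before `C#` makes it worse, because the regex engine tries alternatives from left to right and accepts the first one that fits. A minimal check with Python's `re` module, using cut-down patterns (hypothetical, for illustration only) with the same shape as the question's:

```python
import re

text = "C#, C#m, E"

# Same shape as the question's patterns: \b on both sides, short names first.
short_first = r"\b(C|C#|C#m)\b"
# Longest names first, but still bounded by \b.
long_first = r"\b(C#m|C#|C)\b"

print(re.findall(short_first, text))  # ['C', 'C']
print(re.findall(long_first, text))   # ['C', 'C#m']
```

Even with the longest names first, the lone `C#` still loses its `#`, because the trailing `\b` fails after `#`; `C#m` only works because it ends in a letter.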
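One way to keep a regex but make it robust, sketched under the assumption that the chords in a cell are separated by commas: split the cell into tokens first, then match each whole token with `re.fullmatch` against a single pattern that captures the root, the chord quality and an optional bass note after `/`. Because `fullmatch` anchors the whole token, no `\b` is needed, and `#` and `/` are handled explicitly. The suffixes and type names follow the question; the names `QUALITY_TO_TYPE`, `CHORD_RE`, `classify` and `classify_cell` are hypothetical, and ignoring the bass note of a slash chord is an assumption about how such chords should be grouped.

```python
import re

# Chord-quality suffix -> type name, following the groups in the question.
QUALITY_TO_TYPE = {
    "": "triadeMaj",
    "m": "triadeMen",
    "dim": "triadeDim",
    "aug": "triadeAug",
    "7": "setima",
    "m7": "setimaMen",
    "maj7": "setimaMaj",
    "m7b5": "setimaMenQui",
    "6": "sexta",
    "m6": "sextaMen",
}

# Longest suffixes first, so that if the pattern is ever used with re.search
# instead of re.fullmatch, "Cm7b5" is still read as m7b5 and not as a bare "C".
_QUALITIES = "|".join(
    re.escape(q) for q in sorted(QUALITY_TO_TYPE, key=len, reverse=True)
)

# Root (letter plus optional # or b), quality, optional "/bass" note.
CHORD_RE = re.compile(
    rf"(?P<root>[A-G][#b]?)(?P<quality>{_QUALITIES})(?:/(?P<bass>[A-G][#b]?))?"
)


def classify(chord: str) -> str:
    """Return the type of one chord; the bass note after '/' is ignored."""
    match = CHORD_RE.fullmatch(chord.strip())
    if match is None:
        return "outros"
    return QUALITY_TO_TYPE[match.group("quality")]


def classify_cell(cell: str) -> str:
    """Classify every chord in a comma-separated cell such as 'C#, D7/F#'."""
    return ", ".join(classify(chord) for chord in cell.split(","))


print(classify_cell("C#, E7, Abm, Amaj7, C#m"))
print(classify_cell("D/F#, D7/F#, A/B, Cadd9"))
```

With pandas, the function can be applied per cell, e.g. `df['chordType'] = df['chords'].apply(classify_cell)`. Parsing the whole chord with one pattern also removes the need for the catch-all lookahead: anything the pattern does not recognise falls through to `outros`.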
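This is probably cleaner without regex: split each cell into individual chords and look each one up in a dictionary.
This example only uses a small part of your data, but you can fill in the chord_types dictionary with all the mappings.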
import pandas as pd

chord_types = {'C': 'triadeMaj', 'C#': 'triadeMaj', 'C7': 'setima'}  # Add as required
df = pd.DataFrame(['C, C7', 'C, C#'], columns=('chords',))  # Toy example

def map_fn(cs):
    # Look up each chord; anything missing from chord_types becomes 'outros'
    return ', '.join(chord_types.get(c, 'outros') for c in cs)

# Strip spaces, split each cell on commas, then map every chord to its type
df['chordType'] = df['chords'].str.replace(' ', '').str.split(',').apply(map_fn)
print(df)
Output:
chords chordType
0 C, C7 triadeMaj, setima
1 C, C# triadeMaj, triadeMaj
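Filling in chord_types by hand means typing 170 entries (17 roots times the 10 chord types in the question). A sketch that builds the dictionary from the question's list of roots and one suffix per type instead, assuming slash chords should be grouped by the chord before the `/` (so `D7/F#` counts as `setima`); the names `ROOTS`, `SUFFIXES` and `chord_type` are hypothetical:

```python
from itertools import product

# Roots as listed in the question (sharps and flats).
ROOTS = ["C", "C#", "Db", "D", "D#", "Eb", "E", "F", "F#",
         "Gb", "G", "G#", "Ab", "A", "A#", "Bb", "B"]

# Suffix -> chord type, following the question's groups.
SUFFIXES = {
    "": "triadeMaj", "m": "triadeMen", "dim": "triadeDim", "aug": "triadeAug",
    "7": "setima", "m7": "setimaMen", "maj7": "setimaMaj", "m7b5": "setimaMenQui",
    "6": "sexta", "m6": "sextaMen",
}

# Every root combined with every suffix, e.g. "C#m7" -> "setimaMen".
chord_types = {root + suffix: kind
               for root, (suffix, kind) in product(ROOTS, SUFFIXES.items())}


def chord_type(chord: str) -> str:
    # Assumption: a slash chord is classified by the chord before the '/',
    # so "D7/F#" is looked up as "D7"; unknown chords become 'outros'.
    upper, _, _bass = chord.strip().partition("/")
    return chord_types.get(upper, "outros")


def map_fn(cs):
    return ", ".join(chord_type(c) for c in cs)


print(map_fn(["E", "A7", "G6", "D/F#", "D7/F#", "Fmaj7", "Dm7"]))
```

This map_fn takes the same list of chords per cell as the one in the answer, so it can be used in the same `.str.split(',').apply(map_fn)` pipeline.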