How to split a column into alphabetic values and numeric values from a column in a Pandas dataframe?
I have a dataframe:
Name Section
1 James P3
2 Sam 2.5C
3 Billy T35
4 Sarah A85
5 Felix 5I
How can I split the numeric values into a separate column named Section_Number, and the alphabetic values into Section_Letter?
Desired result:
Name Section Section_Number Section_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5I 5 I
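Use str.replace together with str.extract, matching [A-Z]+ for strings whose letters are all uppercase:
# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as a literal string by default
df['Section_Number'] = df['Section'].str.replace('[A-Z]+', '', regex=True)
# expand=False returns a Series rather than a one-column DataFrame
df['Section_Letter'] = df['Section'].str.extract('([A-Z]+)', expand=False)
print(df)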
Name Section Section_Number Section_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5I 5 I
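To select lowercase letters as well:
# regex=True is needed on pandas 2.0+, where str.replace treats the pattern as a literal string by default
df['Section_Number'] = df['Section'].str.replace('[A-Za-z]+', '', regex=True)
df['Section_Letter'] = df['Section'].str.extract('([A-Za-z]+)', expand=False)
print(df)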
Name Section Section_Number Section_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5I 5 I
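Note that Section_Number above still holds strings. As a minimal, self-contained sketch (the DataFrame is rebuilt from the question's sample data, and the pd.to_numeric step is my addition, not part of the answer), the same approach with a numeric conversion might look like this:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "Name": ["James", "Sam", "Billy", "Sarah", "Felix"],
        "Section": ["P3", "2.5C", "T35", "A85", "5I"],
    },
    index=range(1, 6),
)

# Remove every run of letters; what remains is the numeric part (still text)
numbers = df["Section"].str.replace(r"[A-Za-z]+", "", regex=True)

# Capture the first run of letters; expand=False gives a Series
letters = df["Section"].str.extract(r"([A-Za-z]+)", expand=False)

# Assumption: every remaining string is a valid number, so to_numeric will not raise
df["Section_Number"] = pd.to_numeric(numbers)
df["Section_Letter"] = letters
print(df)
```

If some rows might contain no number at all, passing errors="coerce" to pd.to_numeric would turn those into NaN instead of raising.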
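This will no doubt be slower, but to offer an alternative for completeness, you can use str.extractall
to get named groups that match the pattern, consolidate the matches, and join the result back to your DataFrame:
new = df.join(
    df.Section.str.extractall(r'(?i)(?P<Section_Letter>[A-Z]+)|(?P<Section_Number>[\d.]+)')
    .groupby(level=0).first()
)
Result: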
Name Section Section_Letter Section_Number
1 James P3 P 3
2 Sam 2.5C C 2.5
3 Billy T35 T 35
4 Sarah A85 A 85
5 Felix 5I I 5
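To see why the groupby(level=0).first() step is needed, it may help to look at what str.extractall returns on its own. This is only an illustrative sketch on two of the sample rows; the variable names pattern, matches and combined are mine:

```python
import pandas as pd

df = pd.DataFrame({"Section": ["P3", "2.5C"]}, index=[1, 2])

pattern = r"(?i)(?P<Section_Letter>[A-Z]+)|(?P<Section_Number>[\d.]+)"

# extractall returns one row per match, indexed by (original label, match number).
# Because the pattern is an alternation, each match fills only one of the two
# named groups and leaves the other as a missing value.
matches = df["Section"].str.extractall(pattern)
print(matches)

# first() keeps the first non-missing value per column within each original
# label, which collapses the two partial rows into a single complete row.
combined = matches.groupby(level=0).first()
print(combined)
```

The collapsed frame shares the original index, which is what lets df.join line the new columns up with the right rows.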
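If, as in your example, each value contains exactly one letter, you can sort the characters and then slice:
def get_vals(x):
    # non-letters sort first (key False), letters last (key True)
    return ''.join(sorted(x, key=str.isalpha))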
# apply ordering
vals = df['Section'].apply(get_vals)
# split numbers from letter
df['num'] = vals.str[:-1].astype(float)
df['letter'] = vals.str[-1]
print(df)
Name Section num letter
1 James P3 3.0 P
2 Sam 2.5C 2.5 C
3 Billy T35 35.0 T
4 Sarah A85 85.0 A
5 Felix 5I 5.0 I
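The sort works because sorted() is stable and False orders before True, so the digits and the decimal point keep their relative order while the letter moves to the end. A small sketch without pandas (letters_last is a hypothetical name for the same helper):

```python
def letters_last(s):
    # str.isalpha is False for digits and '.', True for letters.
    # False sorts before True, and sorted() is stable, so the numeric
    # characters stay in their original order and letters move to the end.
    return "".join(sorted(s, key=str.isalpha))


for raw in ["P3", "2.5C", "T35"]:
    moved = letters_last(raw)
    # The last character is the letter, everything before it is the number
    print(raw, "->", moved, "| number:", moved[:-1], "| letter:", moved[-1])
```

With more than one letter per value (say "AB12"), the [-1] slice would pick up only the last letter, so this approach really does depend on the one-letter assumption.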
We can use itertools.groupby
to group consecutive alphabetic and non-alphabetic characters:
from itertools import groupby
[sorted([''.join(x) for _, x in groupby(s, key=str.isalpha)]) for s in df.Section]
[['3', 'P'], ['2.5', 'C'], ['35', 'T'], ['85', 'A'], ['5', 'I']]
We can then turn that into new columns:
from itertools import groupby
N, L = zip(
    *[sorted([''.join(x) for _, x in groupby(s, key=str.isalpha)]) for s in df.Section]
)
df.assign(Selection_Number=N, Selection_Letter=L)
Name Section Selection_Number Selection_Letter
1 James P3 3 P
2 Sam 2.5C 2.5 C
3 Billy T35 35 T
4 Sarah A85 85 A
5 Felix 5I 5 I
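For anyone unsure what groupby is doing here, this sketch prints the runs together with their keys (split_runs is a hypothetical helper, not part of the answer above):

```python
from itertools import groupby


def split_runs(s):
    # groupby walks the string once and starts a new group whenever the
    # result of str.isalpha changes, so each group is one consecutive run
    # of letters (key True) or of non-letters (key False).
    return [(is_alpha, "".join(run)) for is_alpha, run in groupby(s, key=str.isalpha)]


print(split_runs("2.5C"))
print(split_runs("T35"))
```

The sorted() call in the answer then puts the numeric run first regardless of where it appeared, because digit characters compare lower than letters. That relies on each value having exactly one numeric run and one letter run; otherwise the zip into two columns would not line up.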