Split and parse only specific characters from string using python
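I am trying to split and parse characters from a column and write the parsed data into separate columns.
First the unwanted data has to be removed from the string; then the remainder has to be split and saved into separate columns, with the prefix 19 added in front of each year.
Input: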
Col1
U_a65839_Jan87Apr88
U_b98652_Feb88Apr88
V_C56478_mar89Apr89
Q_d15634_Apr90Apr91
S_e15336_may91Apr93
NaN
Output:
Col2 Col3
Jan1987 Apr1988
Feb1988 Apr1988
mar1989 Apr1989
Apr1990 Apr1991
may1991 Apr1993
NaN NaN
Code snippet, what I have tried so far:
df = pd.read_excel(open(r'C:\Users\Desktop\data1.xlsx', 'rb'), sheet_name='sheet1')
df['Col1'] = df['Col1'].partition("_")[2]
Please suggest how to do this.
Assuming you have
import numpy as np
import pandas as pd
col_1 = [
'U_a65839_Jan87Apr88',
'U_b98652_Feb88Apr88',
'V_C56478_mar89Apr89',
'Q_d15634_Apr90Apr91',
'S_e15336_May91Apr93',
np.nan
]
df = pd.DataFrame({
'Col1': col_1
})
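you can
# remove unwanted data: drop everything up to and including the last underscore
df['Col1'] = df.Col1.str.replace(
r'.*_', '', regex=True
)
# split first part (three-letter month + two-digit year)
df['Col2'] = df.Col1.str[:5]
# split second part
df['Col3'] = df.Col1.str[5:]
# add 19 in front of the two-digit year (raw strings avoid invalid-escape warnings)
df['Col2'] = df.Col2.replace({r'(\d\d)': r'19\1'}, regex=True)
df['Col3'] = df.Col3.replace({r'(\d\d)': r'19\1'}, regex=True)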
which gives
         Col1     Col2     Col3
0  Jan87Apr88  Jan1987  Apr1988
1  Feb88Apr88  Feb1988  Apr1988
2  mar89Apr89  mar1989  Apr1989
3  Apr90Apr91  Apr1990  Apr1991
4  May91Apr93  May1991  Apr1993
5         NaN      NaN      NaN
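As a possible alternative to the slicing steps above, the same result can be sketched in one pass with `Series.str.extract`. This assumes every non-missing value ends with two "three letters + two digits" groups, as in the sample data; the `pattern` and `parts` names are only illustrative. Unlike the steps above, this sketch leaves `Col1` unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': [
        'U_a65839_Jan87Apr88',
        'U_b98652_Feb88Apr88',
        'V_C56478_mar89Apr89',
        'Q_d15634_Apr90Apr91',
        'S_e15336_May91Apr93',
        np.nan,
    ]
})

# Assumed layout: the string ends with <3 letters><2 digits><3 letters><2 digits>.
pattern = r'([A-Za-z]{3})(\d{2})([A-Za-z]{3})(\d{2})$'

# One column per capture group (0..3); rows that do not match,
# including NaN, come back as NaN in every column.
parts = df['Col1'].str.extract(pattern)

# Rebuild each value as month + '19' + two-digit year; NaN propagates.
df['Col2'] = parts[0] + '19' + parts[1]
df['Col3'] = parts[2] + '19' + parts[3]

print(df)
```

Because the pattern is anchored at the end of the string, the prefix before the last underscore never has to be stripped separately, and a row with an unexpected layout ends up as NaN rather than a wrongly sliced value.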