Split and parse only specific characters from string using python

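I am trying to split and parse characters from a column and write the parsed data into separate columns.

First, the unwanted data needs to be removed from the string. Then the remainder needs to be split and saved into different columns, with the prefix 19 added before each year.

Input: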

Col1
U_a65839_Jan87Apr88
U_b98652_Feb88Apr88
V_C56478_mar89Apr89
Q_d15634_Apr90Apr91
S_e15336_may91Apr93
NaN

Output

Col2       Col3
Jan1987   Apr1988
Feb1988   Apr1988
mar1989   Apr1989
Apr1990   Apr1991
may1991   Apr1993
  NaN        NaN

Code snippet, what I have tried so far:

df = pd.read_excel(open(r'C:\Users\Desktop\data1.xlsx', 'rb'), sheet_name='sheet1')

df['Col1'] = df['Col1'].partition("_")[2]

Please suggest how to do this.

Suppose you have

import numpy as np
import pandas as pd

col_1 = [
    'U_a65839_Jan87Apr88',
    'U_b98652_Feb88Apr88',
    'V_C56478_mar89Apr89',
    'Q_d15634_Apr90Apr91',
    'S_e15336_May91Apr93',
    np.nan
]
df = pd.DataFrame({
    'Col1': col_1
})

Then you can do

# remove unwanted data
df['Col1'] = df.Col1.str.replace(
    '.*_', '', regex=True
)
# split first part
df['Col2'] = df.Col1.str[:5]
# split second part
df['Col3'] = df.Col1.str[5:]
# prefix each two-digit year with 19 (raw strings avoid invalid-escape warnings)
df['Col2'] = df.Col2.replace({r'(\d\d)': r'19\1'}, regex=True)
df['Col3'] = df.Col3.replace({r'(\d\d)': r'19\1'}, regex=True)

This gives

         Col1     Col2     Col3
0  Jan87Apr88  Jan1987  Apr1988
1  Feb88Apr88  Feb1988  Apr1988
2  mar89Apr89  mar1989  Apr1989
3  Apr90Apr91  Apr1990  Apr1991
4  May91Apr93  May1991  Apr1993
5         NaN      NaN      NaN
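
The slicing above relies on every date part being exactly five characters long (three letters plus two digits). As an alternative sketch, not part of the original answer, the same result can probably be reached in one pass with Series.str.extract. The regex pattern and the intermediate name parts are assumptions made for illustration, and the century is assumed to be 19 for every row, as stated in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': [
        'U_a65839_Jan87Apr88',
        'S_e15336_May91Apr93',
        np.nan,
    ]
})

# Capture two (month, two-digit year) pairs at the end of the string.
# Assumption: months are always three letters and years two digits.
pattern = r'([A-Za-z]{3})(\d{2})([A-Za-z]{3})(\d{2})$'
parts = df['Col1'].str.extract(pattern)  # columns 0..3, NaN where no match

# Rebuild each date with the assumed century prefix; NaN rows stay NaN.
df['Col2'] = parts[0] + '19' + parts[1]
df['Col3'] = parts[2] + '19' + parts[3]

print(df)
```

Unlike the version above, this leaves Col1 untouched, and rows that do not match the pattern end up as NaN in Col2 and Col3 rather than holding partial strings.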
