How to split a column value in a dataframe into multiple columns
I need to split a dataframe column into multiple columns so that each cell contains only two digits. The current dataframe looks like this:
Name | Number   | Code     |
............................
Tom  | 78797071 | 0        |
Nick |          | 89797071 |
Juli |          | 57797074 |
June | 39797571 | 0        |
Junw |          | 23000000 |
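If the code contains 8 digits, split it into pairs of two digits, one pair per column. If 00 appears in any DIV column, the row should be marked "incomplete".
The new dataframe should look like this: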
Name | Number   | Code     | DIV | DIV2 | DIV3 | DIV4 | Incomplete |
.....................................................................
Tom  | 78797071 | 0        | 0   | 0    | 0    | 0    | incomplete |
Nick |          | 89797071 | 89  | 79   | 70   | 71   | complete   |
Juli |          | 57797074 | 57  | 79   | 70   | 74   | complete   |
June | 39797571 | 0        | 0   | 0    | 0    | 0    | complete   |
Junw |          | 23000000 | 23  | 00   | 00   | 00   | incomplete |
Try this quick fix.
import pandas as pd
import re

# Data preprocessing (the Number column is omitted here)
data = {'Name': ['Tom', 'Nick', 'Juli', 'June', 'Junw'],
        'Code': ['0', '89797071', '57797074', '0', '23000000']}
df = pd.DataFrame(data)
print(df)

# Exactly eight digits, captured as four two-digit groups
pattern = r'(\d{2})(\d{2})(\d{2})(\d{2})'

split_data = []
for code in df['Code']:
    match = re.fullmatch(pattern, code)
    if match:
        parts = list(match.groups())
        parts.append('incomplete' if '00' in parts else 'complete')
    else:
        # Anything that is not an 8-digit code (such as '0') is marked incomplete
        parts = ['0', '0', '0', '0', 'incomplete']
    # Append exactly one row per code so the rows stay aligned with df
    split_data.append(parts)

# Build the new columns and attach them to the original dataframe
col_name = ['DIV1', 'DIV2', 'DIV3', 'DIV4', 'Incomplete']
df[col_name] = pd.DataFrame(split_data, columns=col_name, index=df.index)
print(df)
Output
   Name      Code
0   Tom         0
1  Nick  89797071
2  Juli  57797074
3  June         0
4  Junw  23000000
   Name      Code DIV1 DIV2 DIV3 DIV4  Incomplete
0   Tom         0    0    0    0    0  incomplete
1  Nick  89797071   89   79   70   71    complete
2  Juli  57797074   57   79   70   74    complete
3  June         0    0    0    0    0  incomplete
4  Junw  23000000   23   00   00   00  incomplete
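The same four-group pattern can also be applied without an explicit loop. The following is only a sketch of that idea using `Series.str.extract`, not the answer's own method. The names `parts`, `matched`, `has_00` and `result` are illustrative, and it assumes, as in the sample data, that any code that is not exactly eight digits should get '0' in every DIV column and count as incomplete.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Nick", "Juli", "June", "Junw"],
    "Code": ["0", "89797071", "57797074", "0", "23000000"],
})

# Four two-digit capture groups, anchored so only exact 8-digit codes match
pattern = r"^(\d{2})(\d{2})(\d{2})(\d{2})$"
div_cols = ["DIV1", "DIV2", "DIV3", "DIV4"]

# str.extract returns one column per capture group, with NaN where the pattern fails
parts = df["Code"].str.extract(pattern)
parts.columns = div_cols

# Assumption: codes that do not match become '0' everywhere and count as incomplete
matched = parts.notna().all(axis=1)
has_00 = parts.eq("00").any(axis=1)
parts = parts.fillna("0")

result = df.join(parts)
result["Incomplete"] = "incomplete"
result.loc[matched & ~has_00, "Incomplete"] = "complete"
print(result)
```

Because `str.extract` keeps the original index, each row of `parts` lines up with its source row even when a code does not match.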
You can do this with the string functions zfill and findall, as shown below.
import numpy as np
import pandas as pd

## np.str was removed in NumPy 1.24; the built-in str does the same job
df.Code = df.Code.astype(str)
## zfill pads the string with leading zeros to length 8, findall extracts each pair of digits
## explode turns each list element into its own row (explode requires pandas 0.25 or later)
## reshape arranges the flat result into 4 columns
arr = df.Code.str.zfill(8).str.findall(r"(\d\d)").explode().values.reshape(-1, 4)
## create a new dataframe from arr with the given column names
df2 = pd.DataFrame(arr, columns=[f"Div{i+1}" for i in range(arr.shape[1])])
## set the "Incomplete" column to "incomplete" if any column of the row contains "00"
df2["Incomplete"] = np.where(np.any(arr == "00", axis=1), "incomplete", "complete")
pd.concat([df, df2], axis=1)
Result
   Name    Number      Code Div1 Div2 Div3 Div4  Incomplete
0   Tom  78797071         0   00   00   00   00  incomplete
1  Nick            89797071   89   79   70   71    complete
2  Juli            57797074   57   79   70   74    complete
3  June  39797571         0   00   00   00   00  incomplete
4  Junw            23000000   23   00   00   00  incomplete
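To see what the zfill and findall steps do to a single value, here is a small plain-Python sketch of the same idea. The helper name `split_pairs` and its `width` parameter are hypothetical and chosen only for illustration.

```python
import re

def split_pairs(code, width=8):
    """Pad `code` with leading zeros to `width` characters and return its two-digit chunks."""
    padded = str(code).zfill(width)
    return re.findall(r"\d\d", padded)

print(split_pairs("0"))         # ['00', '00', '00', '00']
print(split_pairs("23000000"))  # ['23', '00', '00', '00']
print(split_pairs("89797071"))  # ['89', '79', '70', '71']
```

Note that zfill pads on the left, so a shorter code such as "797071" becomes "00797071" and would be flagged as incomplete. The reshape(-1, 4) step also relies on every padded code producing exactly four pairs.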
You can use str.findall("..") to split the values, then join the resulting lists onto the original df. Use apply to get the complete/incomplete status.
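import pandas as pd
df = pd.DataFrame({"Name": ["Tom", "Nick", "Juli", "June", "Junw"],
                   "Number": [78797071, 0, 0, 39797571, 0],
                   "Code": [0, 89797071, 57797074, 0, 23000000]})
# Split each code into two-character chunks; a one-character code such as "0" yields no chunks
pairs = pd.DataFrame(df["Code"].astype(str).str.findall("..").values.tolist()).add_prefix("DIV")
# Join the chunks onto the original frame and fill the missing ones with "00"
df = df.join(pairs).fillna("00")
# A row is incomplete when any of its DIV columns contains "00"
df["Incomplete"] = df.iloc[:, 3:7].apply(lambda row: "incomplete" if row.str.contains("00").any() else "complete", axis=1)
print(df)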
Output
   Name    Number      Code DIV0 DIV1 DIV2 DIV3  Incomplete
0   Tom  78797071         0   00   00   00   00  incomplete
1  Nick         0  89797071   89   79   70   71    complete
2  Juli         0  57797074   57   79   70   74    complete
3  June  39797571         0   00   00   00   00  incomplete
4  Junw         0  23000000   23   00   00   00  incomplete
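As a possible variation, the row-wise apply can be replaced by a vectorized comparison. The sketch below uses a shortened, illustrative copy of the data. The names `chunks`, `pairs`, `has_00` and `out` are assumptions for the example and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Nick", "Junw"],
                   "Code": [0, 89797071, 23000000]})

# "." matches any character, so ".." takes consecutive two-character chunks;
# a one-character string such as "0" yields an empty list
chunks = df["Code"].astype(str).str.findall("..")
print(chunks.tolist())

# Rows with no chunks come out as missing values, which are filled with "00"
pairs = pd.DataFrame(chunks.tolist(), index=df.index).add_prefix("DIV").fillna("00")

# Vectorized flag: True where any DIV column equals "00"
has_00 = pairs.eq("00").any(axis=1)
out = df.join(pairs)
out["Incomplete"] = has_00.map({True: "incomplete", False: "complete"})
print(out)
```

Selecting the DIV columns through `pairs` rather than by position (`iloc[:, 3:7]`) keeps the check working if the original frame has a different number of leading columns.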