簡體   English   中英

如何將 dataframe 中的列值拆分為多列

[英]How to splitting column value in dataframe into multiple columns

我需要將 dataframe 列拆分為多個列,以確保每個單元格中只包含兩個值。 當前的 dataframe 看起來像:

          Name     |  Number |  Code |
         ..............................
         Tom      | 78797071|       0
         Nick     |         | 89797071
         Juli     |         | 57797074
         June     | 39797571|       0
         Junw     |         | 23000000|

如果代碼包含 8 位數字,則將每列中的每兩位數字拆分,如果00出現在任何DIV中,則應將其標記為“不完整”

新的 dataframe 應如下所示:

     Name     |  Number |  Code |  DIV|DIV2|DIV3|DIV4|Incomplete  |
     ........................................................................
     Tom      | 78797071|       0 | 0 |   0|  0 |   0 |incomplete |
     Nick     |         | 89797071| 89| 79 | 70 | 71  |complete   |
     Juli     |         | 57797074| 57| 79 | 70 | 74  |complete   |
     June     | 39797571|       0 |  0|   0|  0 |   0 |complete   |
     Junw     |         | 23000000| 23|  00| 00 | 00  |incomplete |

試試這個快速修復。

import pandas as pd
import re

#data-preprocessing
data = {'Name': ['Tom','Nick','Juli','June','Junw'],'Code': ['0', '89797071', '57797074', '0', '23000000']}

#I omitted Number key in data

df = pd.DataFrame(data)

print(df)

#find patterns

pattern = r'(\d{2})(\d{2})(\d{2})(\d{2})'
zero_pattern = r'0{1,}'

split_data = []

for _ in df['Code'].items():

  to_find = _[1]

  splitted = re.findall(pattern, to_find)
  if splitted:
    temp = list(splitted[0])
    if '00' in temp:
      temp.append('incomplete')
    else:
      temp.append('complete')
    split_data.append(temp)

  zeromatch = re.match(zero_pattern, to_find)
  if zeromatch:
    split_data.append(['0','0','0','0','incomplete'])

#make right dataframe

col_name = ['DIV1','DIV2','DIV3','DIV4','Incomplete']

df2 = pd.DataFrame(split_data, columns=col_name)  

df[col_name]= df2

print(df)

Output

   Name      Code
0   Tom         0
1  Nick  89797071
2  Juli  57797074
3  June         0
4  Junw  23000000
   Name      Code DIV1 DIV2 DIV3 DIV4  Incomplete
0   Tom         0    0    0    0    0  incomplete
1  Nick  89797071   89   79   70   71    complete
2  Juli  57797074   57   79   70   74    complete
3  June         0    0    0    0    0  incomplete
4  Junw  23000000   23   00   00   00  incomplete

您可以使用字符串函數 zfill 和 findall 來完成,如下所示


df.Code = df.Code.astype(np.str)

## zfill will pad string with 0 to make its lenght 8, findall will find each pair of digit
## explode will split list into rows (explode works with pandas 0.25 and above)
## reshape to make it 4 columns
arr = df.Code.str.zfill(8).str.findall(r"(\d\d)").explode().values.reshape(-1, 4)

## create new dataframe from arr with given column names
df2 = pd.DataFrame(arr, columns=[f"Div{i+1}" for i in range(arr.shape[1])])

## set "Incomplete" colum to incomplete if any column of row contains "00"
df2["Incomplete"] = np.where(np.any(arr == "00", axis=1), "incomplete", "complete")

pd.concat([df,df2], axis=1)


結果

        Name    Number  Code    Div1    Div2    Div3    Div4    Incomplete
0   Tom 78797071    0   00  00  00  00  incomplete
1   Nick        89797071    89  79  70  71  complete
2   Juli        57797074    57  79  70  74  complete
3   June    39797571    0   00  00  00  00  incomplete
4   Junw        23000000    23  00  00  00  incomplete

您可以使用str.findall("..")拆分值,然后join原始 df 上的列表。 使用apply獲取完整/不完整狀態。

import pandas as pd

df = pd.DataFrame({"Name":["Tom","Nick","Juli","June","Junw"],
                   "Number":[78797071, 0, 0, 39797571, 0],
                   "Code":[0, 89797071, 57797074, 0, 23000000]})

df = df.join(pd.DataFrame(df["Code"].astype(str).str.findall("..").values.tolist()).add_prefix('DIV')).fillna("00")
df["Incomplete"] = df.iloc[:,3:7].apply(lambda row: "incomplete" if row.str.contains('00').any() else "complete", axis=1)

print (df)

#
   Name    Number      Code DIV0 DIV1 DIV2 DIV3  Incomplete
0   Tom  78797071         0   00   00   00   00  incomplete
1  Nick         0  89797071   89   79   70   71    complete
2  Juli         0  57797074   57   79   70   74    complete
3  June  39797571         0   00   00   00   00  incomplete
4  Junw         0  23000000   23   00   00   00  incomplete

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM