
Simplify splitting a DataFrame into several DataFrames

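I have several DataFrames with different numbers of rows (df0, df1, df2). I want to split any DataFrame that has more than 30 rows into several DataFrames of only 30 rows each. For example, my DataFrame df0 has 156 rows, so I split it into several DataFrames like this: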

if len(df0) > 30:
    # Slice ends are exclusive, so each chunk starts where the previous one ended
    df0_A = df0[0:30]
    df0_B = df0[30:60]
    df0_C = df0[60:90]
    df0_D = df0[90:120]
    df0_E = df0[120:150]
    df0_F = df0[150:180]
else:
    df0 = df0

The problem with this code is that I then have to repeat the follow-up code many times over, like this:

df0= pd.DataFrame(df0)
df0_A = pd.DataFrame(df0_A)
df0_B = pd.DataFrame(df0_B)
df0_C = pd.DataFrame(df0_C)
df0_D = pd.DataFrame(df0_D)
df0_E = pd.DataFrame(df0_E)
df0_F = pd.DataFrame(df0_F)

df0= df0.to_string(header=False,
                              index=False,
                              index_names=False).split('\n')
df0_A = df0_A.to_string(header=False,
                              index=False,
                              index_names=False).split('\n')
df0_B = df0_B.to_string(header=False,
                              index=False,
                              index_names=False).split('\n')
df0_C = df0_C.to_string(header=False,
                              index=False,
                              index_names=False).split('\n')
df0_D = df0_D.to_string(header=False,
                              index=False,
                              index_names=False).split('\n')
df0_E = df0_E.to_string(header=False,
                              index=False,
                              index_names=False).split('\n')
df0_F = df0_F.to_string(header=False,
                              index=False,
                              index_names=False).split('\n')

df0= [','.join(ele.split()) for ele in df0]
df0_A = [','.join(ele.split()) for ele in df0_A]
df0_B = [','.join(ele.split()) for ele in df0_B]
df0_C = [','.join(ele.split()) for ele in df0_C]
df0_D = [','.join(ele.split()) for ele in df0_D]
df0_E = [','.join(ele.split()) for ele in df0_E]
df0_F = [','.join(ele.split()) for ele in df0_F]

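Now imagine I have ten DataFrames and need to split each of them into five. Then I would have to write the same code 50 times! I am very new to Python. Can anyone help me simplify this code, perhaps with a simple for loop? Thanks.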

Assuming you have a column that identifies the groups:

import numpy as np
import pandas as pd

def split_df(idf, idcol, nsize):
  g = idf.groupby(idcol)
  # Compute the size for each value of identification column
  size = g.size()

  dflist = []
  for _id,_idcount in size.items():  # Series.iteritems() was removed in pandas 2.0
    if _idcount > nsize:
      # print(_id, ' = ', _idcount)
      idx     = idf[ idf[idcol].eq(_id) ].index
      # print(idx)
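      # split the index into near-equal chunks of at most `nsize` rows
      # e.g. [1,2,3,4,5] with nsize = 2 will split into ([1,2], [3,4], [5])
      # ceil gives the chunk count; round(n/nsize + 0.5) can yield one chunk
      # too many when n is an exact multiple of nsize
      ilist   = np.array_split(idx, int(np.ceil(idx.shape[0] / nsize)))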
      dflist += ilist

  return [idf.loc[idx].copy(deep=True) for idx in dflist]

df = pd.DataFrame(data=np.hstack((np.random.choice(np.arange(1,3), 10).reshape(10, -1), np.random.rand(10,3))), columns=['id', 'a', 'b', 'c'])

df = df.astype({'id': np.int64})

split_df(df, 'id', 2)
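For comparison, here is a minimal sketch of the same idea done with `groupby` and `cumcount` alone. It is not part of the answer above: the name `split_by_id` and the `demo` frame are made up for illustration, and unlike `split_df` it also returns groups that have `nsize` rows or fewer (as a single chunk).

```python
import numpy as np
import pandas as pd

def split_by_id(frame, idcol, nsize):
    """Split each id group into consecutive chunks of at most nsize rows."""
    # Position of every row inside its own id group: 0, 1, 2, ...
    position = frame.groupby(idcol).cumcount()
    # Rows 0..nsize-1 land in chunk 0, the next nsize rows in chunk 1, and so on
    chunk_no = position // nsize
    # Group on (id, chunk number) and collect the resulting sub-frames
    return [chunk for _, chunk in frame.groupby([frame[idcol], chunk_no])]

# Hypothetical data: id 1 has five rows, id 2 has three rows
demo = pd.DataFrame({"id": [1, 1, 1, 1, 1, 2, 2, 2], "a": np.arange(8)})
chunks = split_by_id(demo, "id", 2)
sizes = [len(c) for c in chunks]
print(sizes)
```

Here the chunks are filled front to back (full chunks first, remainder last), whereas `np.array_split` balances the chunk sizes within each group.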

This is a great question. You can use this (here, data is the DataFrame):

# Create subsets of size 30 for the DataFrame
subsets = list(range(0, len(data), 30))

# Create start cutoffs for subsets of the DataFrame
start_cutoff = subsets

# Create end cutoffs for subsets of the DataFrame
end_cutoff = subsets[1:] +  [len(data)]

# Zip the start cutoffs and end cutoffs into a List of Cutoffs
cutoffs = list(zip(start_cutoff, end_cutoff))

# List containing the split DataFrames
list_dfs = [data.iloc[cutoff[0]: cutoff[-1]] for cutoff in cutoffs]

# convert list to string DFs
string_dfs = [df.to_string(header=False, index=False, index_names=False).split('\n') for df in list_dfs]

# One inner list of comma-separated row strings per split DataFrame
final_df_list = [[','.join(ele.split()) for ele in string_df] for string_df in string_dfs]

You can now access the DataFrames like this:

 print(final_df_list[0])
 print(final_df_list[1])

You could probably automate this a bit more, but this should be enough!
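One way to take the automation a step further, so the same steps run over any number of DataFrames, is to wrap them in two small functions. This is only a sketch under assumptions: `split_frame`, `rows_as_csv`, and the `frames` dict are hypothetical names, and `to_csv` is swapped in for the `to_string` plus `split` approach, so values containing spaces are not broken apart and numbers are not padded to a fixed width.

```python
import numpy as np
import pandas as pd

def split_frame(frame, size=30):
    """Return consecutive slices of `frame`, each with at most `size` rows."""
    return [frame.iloc[start:start + size] for start in range(0, len(frame), size)]

def rows_as_csv(frame):
    """Render each row as a comma-separated string, without header or index."""
    return frame.to_csv(header=False, index=False).splitlines()

# Hypothetical inputs standing in for df0, df1, ...
frames = {
    "df0": pd.DataFrame({"x": np.arange(156), "y": np.arange(156) * 2}),
    "df1": pd.DataFrame({"x": np.arange(20), "y": np.arange(20) * 2}),
}

# For every DataFrame: a list of chunks, each chunk a list of row strings
result = {
    name: [rows_as_csv(chunk) for chunk in split_frame(frame)]
    for name, frame in frames.items()
}

for name, chunks in result.items():
    print(name, [len(chunk) for chunk in chunks])
```

A DataFrame with 30 rows or fewer simply comes back as a single chunk, so no separate `if len(...) > 30` branch is needed.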

import numpy as np
import pandas as pd

df0 = pd.DataFrame({'Test' : np.random.randint(100000,999999,size=180)})
# Defined up front so the printing loop below also works when df0 has 30 rows or fewer
df_dict = {}
if len(df0) > 30:
    x=0
    y=30
    for df_letter in ['A','B','C','D','E','F']:
        df_name = f'df0_{df_letter}'
        # Render the 30-row slice as text, one list element per row
        df_dict[df_name] = df0[x:y].to_string(header=False, index=False, index_names=False).split('\n')
        df_dict[df_name] = [','.join(ele.split()) for ele in df_dict[df_name]]
        x += 30
        y += 30
for df in df_dict:
    print(df)
    print('--------------------------------------------------------------------')
    print(f'length: {len(df_dict[df])}')
    print('--------------------------------------------------------------------')
    print(df_dict[df])
    print('--------------------------------------------------------------------')
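The letter list above is hard-coded for six chunks. As a rough sketch, the names could instead be derived from the row count; `named_chunks` is a hypothetical helper, not something from the answer, and it assumes no DataFrame needs more than 26 chunks.

```python
import string

import numpy as np
import pandas as pd

def named_chunks(frame, prefix, size=30):
    """Map names like '<prefix>_A', '<prefix>_B', ... to slices of at most `size` rows."""
    n_chunks = -(-len(frame) // size)  # ceiling division
    if n_chunks > len(string.ascii_uppercase):
        raise ValueError("more chunks than available letters")
    return {
        f"{prefix}_{letter}": frame.iloc[i * size:(i + 1) * size]
        for i, letter in zip(range(n_chunks), string.ascii_uppercase)
    }

# Hypothetical 156-row DataFrame, as in the question
df0 = pd.DataFrame({"Test": np.arange(156)})
chunks = named_chunks(df0, "df0")

for name, chunk in chunks.items():
    print(name, len(chunk))
```

Calling it once per DataFrame in a loop avoids writing a separate block for df0, df1, df2, and so on.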

