
pandas python: np.array_split(df, x) throws an error: 'DataFrame' object has no attribute 'size'

When I run the code below:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10,1))  
df = np.array_split(df, 4)

FYI, df here is an instance of

<class 'pandas.core.frame.DataFrame'>

So why do I get the following error:

AttributeError: 'DataFrame' object has no attribute 'size'

I'm using the latest pandas version (0.15) on Windows 7, with Anaconda and Eclipse.

Thanks

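Try passing a single column instead: np.array_split(df[0], 4). Here df[0] is the column labelled 0, which is a Series, so the result is a list of Series rather than a list of DataFrames.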

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10,1))
# df[0] is a Series; np.array_split returns a list of 4 Series
parts = np.array_split(df[0], 4)
print(parts)

Result:

[0    1.210245
1    0.311729
2    0.044975
Name: 0, dtype: float64, 
3   -1.202211
4    0.579064
5   -1.615657
Name: 0, dtype: float64, 
6    1.491537
7    0.498112
Name: 0, dtype: float64, 
8    1.372771
9    0.147200
Name: 0, dtype: float64]
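Because df[0] yields Series, the approach above only helps when you care about a single column. A minimal sketch for keeping whole DataFrames (a workaround, not part of the original answers): let NumPy split an array of row positions, then select each chunk with iloc. NumPy only ever sees a plain integer array, so it never needs any attribute of the DataFrame. The helper name split_into_parts and the two-column sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def split_into_parts(df, k):
    """Split df into k row-chunks whose lengths differ by at most one (hypothetical helper)."""
    # np.array_split on the positional index 0..len(df)-1 produces the same chunk
    # lengths it would for the rows themselves: the first len(df) % k chunks get one extra row.
    positions = np.array_split(np.arange(len(df)), k)
    # iloc keeps each chunk a DataFrame with all columns and the original index labels.
    return [df.iloc[idx] for idx in positions]


df = pd.DataFrame(np.random.randn(10, 2), columns=["a", "b"])
parts = split_into_parts(df, 4)
print([len(p) for p in parts])  # [3, 3, 2, 2]
```

This fixes the number of chunks (k) and lets their sizes vary by one row, matching the split shown in the result above.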

Alternatively, if you want DataFrame chunks with a fixed number of rows n, you can define a function like this one:

def split_dataframe(df, n):
    """
    Helper function that splits a DataFrame into a list of DataFrames of (at most) n rows each

    :param df: pd.DataFrame
    :param n: int
    :return: list of pd.DataFrame
    """
    n = int(n)
    # Start a new chunk every n rows; the last chunk holds whatever rows are left over.
    # range() stops before len(df), so no empty trailing chunk is produced.
    return [df.iloc[i:i + n] for i in range(0, len(df), n)]

This gives the following result:

df = pd.DataFrame(np.random.randn(10, 1))
dfs = split_dataframe(df, 4)
print("Num of DataFrames=%d, each of size %s" % (len(dfs), [len(i) for i in dfs]))
# Output: Num of DataFrames=3, each of size [4, 4, 2]
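A shorter variant for fixed-size chunks, offered as an assumption rather than as part of the original answer, is to group rows by their position integer-divided by n. DataFrame.groupby accepts an array of the same length as the DataFrame as the grouping key, so each group holds n consecutive rows. The name chunk_by_size is hypothetical.

```python
import numpy as np
import pandas as pd


def chunk_by_size(df, n):
    """Return a list of DataFrames with at most n rows each (hypothetical helper)."""
    # Positions 0..n-1 map to key 0, n..2n-1 to key 1, and so on.
    keys = np.arange(len(df)) // n
    # groupby yields (key, sub-DataFrame) pairs in sorted key order; keep only the frames.
    return [chunk for _, chunk in df.groupby(keys)]


df = pd.DataFrame(np.random.randn(10, 1))
dfs = chunk_by_size(df, 4)
print([len(d) for d in dfs])  # [4, 4, 2]
```

Both helpers produce the same chunks; the explicit iloc loop is easier to read, while groupby spares you from computing slice boundaries by hand.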
