When I use the code below:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10,1))
df = np.array_split(df, 4)
FYI - df here is a
<class 'pandas.core.frame.DataFrame'>
so why do I get the following error:
AttributeError: 'DataFrame' object has no attribute 'size'
I'm using the latest pandas version (0.15) on Windows 7, with Anaconda and Eclipse.
Thanks
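Try using np.array_split(df[0], 4) instead. This passes the single column as a Series, so each piece of the result is a Series rather than a DataFrame.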
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10,1))
df = np.array_split(df[0], 4)
print(df)
Result:
[0 1.210245
1 0.311729
2 0.044975
Name: 0, dtype: float64,
3 -1.202211
4 0.579064
5 -1.615657
Name: 0, dtype: float64,
6 1.491537
7 0.498112
Name: 0, dtype: float64,
8 1.372771
9 0.147200
Name: 0, dtype: float64]
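If you need the pieces to stay DataFrames, one option is to split the row positions instead of the frame itself and then select rows with iloc. This is only a minimal sketch: split_into_n_parts is a hypothetical helper name, not part of pandas or NumPy, and it assumes you want n nearly equal parts in the original row order.

```python
import numpy as np
import pandas as pd

def split_into_n_parts(df, n):
    """Split df into n nearly equal DataFrames (hypothetical helper)."""
    # Split the integer row positions 0..len(df)-1 rather than the DataFrame,
    # so np.array_split only ever sees a plain NumPy array.
    positions = np.array_split(np.arange(len(df)), n)
    # Select each group of positions; every piece is still a DataFrame.
    return [df.iloc[idx] for idx in positions]

df = pd.DataFrame(np.random.randn(10, 1))
parts = split_into_n_parts(df, 4)
print([len(p) for p in parts])  # [3, 3, 2, 2]
```

The part sizes match the Series output above (3, 3, 2, 2), and pd.concat(parts) gives back the original rows in order.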
You can define a function like this one:
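def split_dataframe(df, n):
    """
    Helper function that splits a DataFrame into a list of DataFrames of at most n rows each
    :param df: pd.DataFrame
    :param n: int, maximum number of rows per chunk
    :return: list of pd.DataFrame
    """
    n = int(n)
    df_size = len(df)
    # Integer division (//) keeps the range() arguments ints on Python 3 as well
    batches = range(0, (df_size // n + 1) * n, n)
    # Skip the start offset equal to df_size, which would give an empty slice
    return [df.iloc[i:i + n] for i in batches if i != df_size]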
Which will give the following result:
df = pd.DataFrame(np.random.randn(10,1))
dfs = split_dataframe(df, 4)
print("Num of DataFrames=%d, each of size %s" % (len(dfs), [len(i) for i in dfs]))
>>>> Num of DataFrames=3, each of size [4, 4, 2]
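If the frame is large and you only need one chunk at a time, the same fixed-size slicing can be written as a generator. Again just a sketch under assumptions: iter_chunks is a made-up name, and it assumes n is a positive chunk size.

```python
import numpy as np
import pandas as pd

def iter_chunks(df, n):
    """Yield consecutive row slices of df with at most n rows each (hypothetical helper)."""
    n = int(n)
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Stepping by n up to len(df) never produces an empty trailing slice.
    for start in range(0, len(df), n):
        yield df.iloc[start:start + n]

df = pd.DataFrame(np.random.randn(10, 1))
sizes = [len(chunk) for chunk in iter_chunks(df, 4)]
print(sizes)  # [4, 4, 2]
```

Because range(0, len(df), n) stops before len(df), no extra check for an empty last chunk is needed.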