I need a function that, given a data frame and a number num, constructs a data frame with num rows such that each row is filled as follows:
- for columns with string values, we sample a value from that column of the original table;
- for columns with floats or ints, we use the column's mean value.
Here is my code:
def rows_aggr(df, num):
    dataframe = None
    for i in range(0, num):
        row = None
        for cname in df.columns.values:
            column = df[cname]
            dfcol = Series.to_frame(column)
            if column.dtype != np.number:
                item = dfcol.sample(n=1)
            else:
                item = dfcol.mean(axis=1)
            if row is None:
                row = item
            else:
                row = pd.concat([row, item], axis=1)
        if dataframe is None:
            dataframe = row
        else:
            dataframe = pd.concat([dataframe, row], axis=0)
    return dataframe
For some reason the resulting rows contain NaN values and there are more than num of them, so this code does not seem to work right. If you know a better way of accomplishing what I need, I would be happy to hear it.
For example, for
df = pd.DataFrame({'col1': list('abcdef'), 'col2': range(6)})
and num=3 we would get something like
c, 2.5
f, 2.5
b, 2.5
assuming c, f and b were randomly picked.
Thank you!
One error seems to be that the condition column.dtype != np.number does not work as a check for non-numeric columns. Then there is a problem with index alignment: when you do pd.concat([row, item], axis=1), item carries the index label of the row it was sampled from, which is not always the same as the label in row, and this adds extra rows filled with NaN. Here is another way to do it.
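Before moving on to the alternative, the alignment problem described above can be reproduced in isolation. This is only a small sketch: the row positions are fixed with iloc instead of sample(n=1) so that the result is repeatable, and the names a, b, misaligned, aligned and row_means are made up for the illustration. The last lines also show, as a side observation, that mean(axis=1) averages across columns within each row, so it returns one value per original row rather than a single column mean.

```python
import pandas as pd

df = pd.DataFrame({'col1': list('abcdef'),
                   'col2': list('ijklmn'),
                   'col3': range(6)})

# A sampled one-row frame keeps the index label of the row it came from.
# Fixed positions stand in for .sample(n=1) here.
a = df[['col1']].iloc[[1]]   # index label 1
b = df[['col2']].iloc[[4]]   # index label 4

# axis=1 concatenation aligns on index labels, so labels 1 and 4
# each get their own row and the gaps are filled with NaN.
misaligned = pd.concat([a, b], axis=1)
print(misaligned)

# Dropping the original labels first gives the single intended row.
aligned = pd.concat([a.reset_index(drop=True),
                     b.reset_index(drop=True)], axis=1)
print(aligned)

# mean(axis=1) works row by row: six values, not one.
row_means = df[['col3']].mean(axis=1)
print(len(row_means))
```

Resetting the index before each concat would be one way to patch the original loop, but the approach below avoids the repeated concatenation altogether.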
SETUP
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': list('abcdef'), 'col2': list('ijklmn'),
                   'col3': range(6), 'col4': np.arange(10, 16) / 1.5})
print(df)
  col1 col2  col3       col4
0    a    i     0   6.666667
1    b    j     1   7.333333
2    c    k     2   8.000000
3    d    l     3   8.666667
4    e    m     4   9.333333
5    f    n     5  10.000000
You can use select_dtypes to find the columns that are not numeric, and create the dataframe with a dictionary comprehension like:
def rows_aggr(df, num):
    list_col_notnumeric = df.select_dtypes(exclude=[np.number]).columns
    return pd.DataFrame({col: df[col].sample(num).values
                              if col in list_col_notnumeric
                              else df[col].mean()
                         for col in df.columns})

print(rows_aggr(df, 3))
  col1 col2  col3      col4
0    d    i   2.5  8.333333
1    a    n   2.5  8.333333
2    c    j   2.5  8.333333
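Two edge cases may be worth keeping in mind with the function above: sample(num) without replacement raises an error when num is larger than the number of rows, and a frame that has only numeric columns would produce a dictionary of scalars, which pd.DataFrame rejects unless an index is given. The following is a hedged sketch of one possible variant, not part of the original answer. The name rows_aggr_seeded and its seed parameter are hypothetical; it samples with replacement, repeats the mean explicitly, and uses pandas.api.types.is_numeric_dtype for the type check (note that this check also treats boolean columns as numeric).

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def rows_aggr_seeded(df, num, seed=None):
    """Hypothetical variant: sample strings with replacement, repeat numeric means."""
    # One shared RandomState so each column draws different positions
    # while the whole result stays reproducible for a given seed.
    rng = np.random.RandomState(seed)
    data = {}
    for col in df.columns:
        if is_numeric_dtype(df[col]):
            # Repeat the column mean num times (also works when every column is numeric).
            data[col] = np.full(num, df[col].mean())
        else:
            # replace=True allows num to exceed the number of rows.
            data[col] = df[col].sample(num, replace=True, random_state=rng).to_numpy()
    return pd.DataFrame(data)


df = pd.DataFrame({'col1': list('abcdef'), 'col2': range(6)})
out = rows_aggr_seeded(df, 10, seed=0)
print(out)
```

Whether sampling with replacement is acceptable depends on the use case; with replace=False the behaviour matches the answer's version, at the cost of requiring num to be no larger than len(df).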