How to apply a function that returns a DataFrame to each row of another DataFrame
I have the following function, which takes a row and returns a DataFrame, ld_coefs_df_long. I am not sure whether the return syntax is correct:
import os

import pandas as pd
from scipy.io import wavfile

# basepath and calc_ld_coefs are defined elsewhere (not shown in the question)
def apply_calc_coef(row):
    device_id = row['deviceId']
    sound_filename = row['filename']
    sound_fullpath = os.path.join(basepath, device_id, sound_filename)
    offset = row['offset']
    telemetry_id = row['telemetry_id']
    fs, data = wavfile.read(sound_fullpath)
    ld_coefs_df = calc_ld_coefs(data, fs, offset=offset, win_len=2000, ndeg=12)
    ld_coefs_df_long = pd.melt(ld_coefs_df, id_vars=['time'], var_name='series_type')
    ld_coefs_df_long['telemetry_id'] = telemetry_id
    # rename returns a new DataFrame, so the result has to be assigned back
    ld_coefs_df_long = ld_coefs_df_long.rename(columns={"time": "t_seconds"})
    return ld_coefs_df_long  # is a DataFrame
I need to apply this function to each row of a DataFrame called ds_telemetry_pd
and stack the results (one DataFrame per row) together into a new DataFrame, using the following syntax:
ds_telemetry_pd.apply(lambda x: apply_calc_coef(x))
Clearly I am not heading in the right direction: either my function, or the way I apply it, or both, is wrong.
Edit: when I use ds_telemetry_pd.apply(apply_calc_coef, axis=1)
I get the following error:
`cannot copy sequence with size 3 to array axis with dimension 14400`
ValueError Traceback (most recent call last)
<command-3171623043129929> in <module>()
----> 1 ds_telemetry_pd.apply(apply_calc_coef, axis=1)
/databricks/python/lib/python3.5/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
4150 if reduce is None:
4151 reduce = True
-> 4152 return self._apply_standard(f, axis, reduce=reduce)
4153 else:
4154 return self._apply_broadcast(f, axis)
/databricks/python/lib/python3.5/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
4263 index = None
4264
-> 4265 result = self._constructor(data=results, index=index)
4266 result.columns = res_index
4267
/databricks/python/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
264 dtype=dtype, copy=copy)
265 elif isinstance(data, dict):
--> 266 mgr = self._init_dict(data, index, columns, dtype=dtype)
267 elif isinstance(data, ma.MaskedArray):
268 import numpy.ma.mrecords as mrecords
/databricks/python/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
400 arrays = [data[k] for k in keys]
401
--> 402 return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
403
404 def _init_ndarray(self, values, index, columns, dtype=None, copy=False):
/databricks/python/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
5401
5402 # don't force copy because getting jammed in an ndarray anyway
-> 5403 arrays = _homogenize(arrays, index, dtype)
5404
5405 # from BlockManager perspective
/databricks/python/lib/python3.5/site-packages/pandas/core/frame.py in _homogenize(data, index, dtype)
5712 v = lib.fast_multiget(v, oindex.values, default=NA)
5713 v = _sanitize_array(v, index, dtype=dtype, copy=False,
-> 5714 raise_cast_failure=False)
5715
5716 homogenized.append(v)
/databricks/python/lib/python3.5/site-packages/pandas/core/series.py in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
2950 raise Exception('Data must be 1-dimensional')
2951 else:
-> 2952 subarr = _asarray_tuplesafe(data, dtype=dtype)
2953
2954 # This is to prevent mixed-type Series getting all casted to
/databricks/python/lib/python3.5/site-packages/pandas/core/common.py in _asarray_tuplesafe(values, dtype)
390 except ValueError:
391 # we have a list-of-list
--> 392 result[:] = [tuple(x) for x in values]
393
394 return result
ValueError: cannot copy sequence with size 3 to array axis with dimension 14400
Your result will be a Series of DataFrames. To convert them into a single DataFrame, you can do the following:
pd.concat(ds_telemetry_pd.apply(apply_calc_coef, axis=1).tolist(), ignore_index=True)
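The traceback in the question shows apply raising while it tries to combine the per-row results, so depending on the pandas version it may be safer to build the list of DataFrames explicitly and pass it to pd.concat. The following is only a minimal sketch of that variant: expand_row is a hypothetical stand-in for apply_calc_coef (it reads no audio files), and ds is made-up sample data rather than the real ds_telemetry_pd.

```python
import pandas as pd


def expand_row(row):
    # Hypothetical stand-in for apply_calc_coef: builds a small wide table
    # for one row, then melts it to long format as the question's function does.
    wide = pd.DataFrame({
        "time": [0.0, 0.5],
        "c0": [row["offset"], row["offset"] + 1],
        "c1": [10, 20],
    })
    long_df = pd.melt(wide, id_vars=["time"], var_name="series_type")
    long_df["telemetry_id"] = row["telemetry_id"]
    return long_df.rename(columns={"time": "t_seconds"})


# Made-up sample input standing in for ds_telemetry_pd
ds = pd.DataFrame({"telemetry_id": [101, 102], "offset": [1, 5]})

# One DataFrame per input row, collected in a plain Python list
frames = [expand_row(row) for _, row in ds.iterrows()]

# Stack them vertically and renumber the index from 0
stacked = pd.concat(frames, ignore_index=True)
print(stacked)
```

Because the frames are gathered in an ordinary list, pandas never has to infer a shape for the combined apply result. The trade-off is that iterrows is a plain Python loop, which is usually acceptable when the per-row work (reading a file, computing coefficients) dominates the runtime.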