I load some data into a pandas dataframe using pyodbc. Each cell contains a list of single-element lists of floats. With the dataframe named df, both type(df.iloc[0][0]) and type(df.iloc[0][0][0]) return list, while type(df.iloc[0][0][0][0]) returns float. I need to flatten those lists so that each cell ends up holding a plain list of numbers instead of a list of lists.
For illustration, here is what is saved in df.iloc[0][0]:
[[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [32.09984], [0.0], [0.0], [0.0], [0.0], [0.0], [0.40704], [0.40704], [32.09984], [32.061440000000005], [32.048640000000006], [32.01024], [0.49152000000000007], [0.0], [0.00256], [0.00512], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]]
Any ideas? Thanks
If you don't mind working with a numpy array, you can just do:
import numpy as np

df.iloc[0][0] = np.array(df.iloc[0][0]).flatten()
or, here is the code to do it for a whole column:
df["Column"] = df["Column"].apply(lambda x : np.array(x).flatten())
and if your data needs to be a list later on:
df["Column"] = df["Column"].apply(lambda x : list(np.array(x).flatten()))
For all columns:
for col in df.columns:
    if col not in ["ColumnThatShouldNotBeTransformed1", "ColumnThatShouldNotBeTransformed2"]:
        df[col] = df[col].apply(lambda x: np.array(x).flatten())
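As a self-contained sketch of the per-column approach (the sample data and the column name "Column" are made up for illustration; .tolist() is used so the result holds plain Python floats):

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe whose cells hold lists of single-element lists
df = pd.DataFrame({"Column": [[[0.0], [32.09984], [0.40704]],
                              [[0.00256], [0.00512], [0.0]]]})

# Flatten each cell's list of lists into a flat list of floats
df["Column"] = df["Column"].apply(lambda x: np.array(x).flatten().tolist())

print(df["Column"].iloc[0])  # [0.0, 32.09984, 0.40704]
```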
If all sublists contain a single element:
>>> x = [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [32.09984], [0.0], [0.0], [0.0], [0.0], [0.0], [0.40704], [0.40704], [32.09984], [32.061440000000005], [32.048640000000006], [32.01024], [0.49152000000000007], [0.0], [0.00256], [0.00512], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]]
>>> list(map(lambda a:a[0],x))
Otherwise:
>>> from itertools import chain
>>> list(chain(*x))
Replace x with df.iloc[0][0].
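Both one-liners produce the same flat list; here is a quick check on a shortened sample (the data is a made-up subset of the question's list):

```python
from itertools import chain

x = [[0.0], [32.09984], [0.40704], [0.00512]]

flat_map = list(map(lambda a: a[0], x))   # works when every sublist has exactly one element
flat_chain = list(chain(*x))              # works for sublists of any length

print(flat_map)    # [0.0, 32.09984, 0.40704, 0.00512]
print(flat_chain)  # [0.0, 32.09984, 0.40704, 0.00512]
```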
From pandas 0.25 onwards you could try explode():
mcve:
import pandas as pd

ltest = [[0], [314], [42]]
df = pd.DataFrame([[ltest, ltest, ltest], [ltest, ltest, ltest]], columns=['A', 'B', 'C'])
#                     A                   B                   C
# 0  [[0], [314], [42]]  [[0], [314], [42]]  [[0], [314], [42]]
# 1  [[0], [314], [42]]  [[0], [314], [42]]  [[0], [314], [42]]
approach:
create new dataframe:
df_new = pd.DataFrame()
for c in df.columns:
df_new[c] = df[c].explode().str.get(0)
# A B C
# 0 0 0 0
# 0 314 314 314
# 0 42 42 42
# 1 0 0 0
# 1 314 314 314
# 1 42 42 42
the index is not unique anymore -> create a multiindex:
ct = df_new.groupby(df_new.index).cumcount()
# 0 0
# 0 1
# 0 2
# 1 0
# 1 1
# 1 2
# dtype: int64
df_new.index = pd.MultiIndex.from_arrays([df_new.index, ct])
result:
# A B C
# 0 0 0 0 0
# 1 314 314 314
# 2 42 42 42
# 1 0 0 0 0
# 1 314 314 314
# 2 42 42 42
or, with the former sublist positions as column levels, as a matter of taste:
df_new.unstack()
# A B C
# 0 1 2 0 1 2 0 1 2
# 0 0 314 42 0 314 42 0 314 42
# 1 0 314 42 0 314 42 0 314 42
Now you can index these data quite convenient as usual with Pandas, just note that the multiindex needs a tuple:
df_new.loc[(0, 1)]
# A 314
# B 314
# C 314
# Name: (0, 1), dtype: int64
df_new.loc[(0, 2), 'B']
# 42
df_new.loc[(0, slice(None)), 'B']
# 0  0      0
#    1    314
#    2     42
# Name: B, dtype: int64
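The steps above, put together as one runnable script on the same toy data as the mcve:

```python
import pandas as pd

ltest = [[0], [314], [42]]
df = pd.DataFrame([[ltest, ltest, ltest], [ltest, ltest, ltest]],
                  columns=['A', 'B', 'C'])

# Explode each column so every sublist gets its own row,
# then unwrap the single-element sublists with .str.get(0)
df_new = pd.DataFrame()
for c in df.columns:
    df_new[c] = df[c].explode().str.get(0)

# The exploded index repeats (0, 0, 0, 1, 1, 1), so rebuild a unique
# MultiIndex: original row number plus position within the list
ct = df_new.groupby(df_new.index).cumcount()
df_new.index = pd.MultiIndex.from_arrays([df_new.index, ct])

print(df_new.loc[(0, 1), 'B'])  # 314
```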