
Python: Flatten a list of lists of single floats stored in a dataframe

I load some data into a pandas DataFrame using pyodbc. Each cell contains a list of single-element lists of floats. Given the DataFrame is named df , both type(df.iloc[0][0]) and type(df.iloc[0][0][0]) give list as output, while type(df.iloc[0][0][0][0]) gives float . I need to flatten those lists down so that in the end each cell holds just a flat list of numbers instead of a list of lists.

Just for visualization here is what is saved in df.iloc[0][0] :

[[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [32.09984], [0.0], [0.0], [0.0], [0.0], [0.0], [0.40704], [0.40704], [32.09984], [32.061440000000005], [32.048640000000006], [32.01024], [0.49152000000000007], [0.0], [0.00256], [0.00512], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]]

Any ideas? Thanks

If you don't mind working with a NumPy array, you can just do:

import numpy as np

df.iloc[0][0] = np.array(df.iloc[0][0]).flatten()

or here is the code to do it for a whole column:

df["Column"] = df["Column"].apply(lambda x : np.array(x).flatten())

and if your data needs to be a list later on:

df["Column"] = df["Column"].apply(lambda x : list(np.array(x).flatten()))

For all columns:

for col in df.columns:
    if col not in ["ColumnThatShouldNotBeTransformed1", "ColumnThatShouldNotBeTransformed2"]:
        df[col] = df[col].apply(lambda x : np.array(x).flatten())
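A minimal runnable sketch of the column-wise approach above, assuming a hypothetical DataFrame with a single column "Values" whose cells hold single-element sublists like the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical example data mirroring the question's shape:
# one cell containing a list of single-element lists.
df = pd.DataFrame({"Values": [[[0.0], [0.40704], [32.09984]]]})

# Flatten each cell's nested list into a flat Python list.
df["Values"] = df["Values"].apply(lambda x: list(np.array(x).flatten()))

print(df["Values"].iloc[0])  # [0.0, 0.40704, 32.09984]
```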

If all sublists contain a single element:

>>> x = [[0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [32.09984], [0.0], [0.0], [0.0], [0.0], [0.0], [0.40704], [0.40704], [32.09984], [32.061440000000005], [32.048640000000006], [32.01024], [0.49152000000000007], [0.0], [0.00256], [0.00512], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]]

>>> list(map(lambda a:a[0],x))

Otherwise:

>>> from itertools import chain

>>> list(chain(*x))

Replace x with df.iloc[0][0] .
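Both routes can be checked side by side on a short sample (assumption: x mirrors the single-element sublists from the question; only chain also handles longer sublists):

```python
from itertools import chain

x = [[0.0], [0.40704], [32.09984]]

# Works only when every sublist has exactly one element.
flat_map = list(map(lambda a: a[0], x))

# Works for sublists of any length.
flat_chain = list(chain(*x))

print(flat_map)    # [0.0, 0.40704, 32.09984]
print(flat_chain)  # [0.0, 0.40704, 32.09984]
```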

From pandas 0.25 on, you could try explode() .

MCVE (minimal reproducible example):

# ltest = [[0], [314], [42]]
# df = pd.DataFrame([[ltest, ltest, ltest], [ltest, ltest, ltest]], columns=['A', 'B', 'C'])

#                     A                   B                   C
# 0  [[0], [314], [42]]  [[0], [314], [42]]  [[0], [314], [42]]
# 1  [[0], [314], [42]]  [[0], [314], [42]]  [[0], [314], [42]]

Approach: create a new DataFrame:

df_new = pd.DataFrame()
for c in df.columns:
    df_new[c] = df[c].explode().str.get(0)

#      A    B    C
# 0    0    0    0
# 0  314  314  314
# 0   42   42   42
# 1    0    0    0
# 1  314  314  314
# 1   42   42   42

The index is not unique anymore, so create a MultiIndex:

ct = df_new.groupby(df_new.index).cumcount()

# 0    0
# 0    1
# 0    2
# 1    0
# 1    1
# 1    2
# dtype: int64

df_new.index = pd.MultiIndex.from_arrays([df_new.index, ct])

result:

#        A    B    C
# 0 0    0    0    0
#   1  314  314  314
#   2   42   42   42
# 1 0    0    0    0
#   1  314  314  314
#   2   42   42   42

or, as a matter of taste, with the former sublist elements as columns:

df_new.unstack()

#    A           B           C         
#    0    1   2  0    1   2  0    1   2
# 0  0  314  42  0  314  42  0  314  42
# 1  0  314  42  0  314  42  0  314  42

Now you can index these data quite conveniently as usual with pandas; just note that the MultiIndex needs a tuple:

df_new.loc[0, 1]

# A    314
# B    314
# C    314
# Name: (0, 1), dtype: int64


df_new.loc[(0, 2), 'B']

# 42


df_new.loc[(0, slice(None)), 'B']

# 0  0      0
#    1    314
#    2     42
# Name: 1, dtype: int64
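The whole explode pipeline above, put together as one runnable sketch (pandas >= 0.25; ltest and the column names follow the MCVE):

```python
import pandas as pd

# MCVE data: every cell holds the same list of single-element lists.
ltest = [[0], [314], [42]]
df = pd.DataFrame([[ltest, ltest, ltest], [ltest, ltest, ltest]],
                  columns=['A', 'B', 'C'])

# Explode each column, then unwrap the inner 1-element lists.
df_new = pd.DataFrame()
for c in df.columns:
    df_new[c] = df[c].explode().str.get(0)

# The index is no longer unique, so build a MultiIndex from a cumcount.
ct = df_new.groupby(df_new.index).cumcount()
df_new.index = pd.MultiIndex.from_arrays([df_new.index, ct])

print(df_new.loc[(0, 1), 'B'])  # 314
```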
