Creating DataFrame with list of dictionaries with np.array values

I have a list of dictionaries with values that are returned as numpy arrays (and which are often empty).

data=[{'width': array([])},
      {'width': array([])},
      {'width': array([])},
      {'width': array([])},
      {'width': array([])},
      {'width': array([ 0.64848222])},
      {'width': array([ 0.62241745])},
      {'width': array([ 0.76892571])},
      {'width': array([ 0.69913647])},
      {'width': array([ 0.7506934])},
      {'width': array([ 0.69087949])},
      {'width': array([ 0.65302866])},
      {'width': array([ 0.67267989])},
      {'width': array([ 0.63862089])}]

I would like to create a DataFrame where the values are floats rather than numpy arrays. I'd also like the empty arrays to be converted to NaN values.

I have tried using df=pd.DataFrame(data, dtype=float), which returns a DataFrame whose values are still np.arrays:

               width
0                 []
1                 []
2                 []
3                 []
4                 []
5   [0.648482224582]
6   [0.622417447245]
7   [0.768925710479]
8   [0.699136467373]
9    [0.75069339816]
10  [0.690879488242]
11  [0.653028655088]
12  [0.672679885077]
13  [0.638620890633]

I've also tried recasting the df's values after creating it, using df.values.astype(float), but I get the following error: ValueError: setting an array element with a sequence.

The final output I am trying to get for the DataFrame looks like:

               width
0                NaN
1                NaN
2                NaN
3                NaN
4                NaN
5     0.648482224582
6     0.622417447245
7     0.768925710479
8     0.699136467373
9      0.75069339816
10    0.690879488242
11    0.653028655088
12    0.672679885077
13    0.638620890633

After you've constructed the DataFrame from data, the only extra thing you need to do is:

df.width = df.width.str[0]

This works because the .str accessor can index into non-string sequences too, so .str[0] gets the first element of each array. Empty arrays don't have a first element, so NaN is returned for those rows.
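To make the behaviour concrete, here is a minimal sketch on a two-row stand-in for the question's data. The names sample, df_sample, first and widths are made up for illustration, and the explicit astype(float) is a precaution in case the accessor hands back an object column on some pandas versions.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the question's data: one empty array, one with a value.
sample = [{'width': np.array([])}, {'width': np.array([0.64848222])}]
df_sample = pd.DataFrame(sample)

# .str[0] indexes into each element; the empty array has no element 0, so it becomes NaN.
first = df_sample['width'].str[0]

# Cast explicitly so the column is float64 regardless of what the accessor returned.
widths = first.astype(float)
print(widths)
```

The first row should print as NaN and the second as the stored float.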

You end up with a column of float64 values:

       width
0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
5   0.648482
6   0.622417
7   0.768926
8   0.699136
9   0.750693
10  0.690879
11  0.653029
12  0.672680
13  0.638621

Note: if you want to display more decimal places, you'll need to adjust the float display precision using pd.set_option (the 'display.precision' option).
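As a rough illustration of the precision note, the sketch below renders a one-row frame with the default setting and again with 12 decimal places. It uses pd.option_context so the change is temporary; pd.set_option('display.precision', 12) would apply it globally. The names df_demo, default_text and wide_text are only for this example.

```python
import pandas as pd

# One value from the question, kept at full precision.
df_demo = pd.DataFrame({'width': [0.648482224582]})

# Rendered with the default display precision (6 decimal places).
default_text = df_demo.to_string()

# Rendered with 12 decimal places, only inside this block.
with pd.option_context('display.precision', 12):
    wide_text = df_demo.to_string()

print(default_text)
print(wide_text)
```

Only the display changes; the stored float64 values are the same in both cases.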

Alternatively, you can process the list before you construct the DataFrame:

pd.DataFrame([x.get('width') for x in data], columns=['width'])

You can use a list comprehension to extract the value from the array in each dictionary. d['width'][0] extracts the first value from the array. The condition d['width'].shape[0] is 0, and therefore falsy, when the array is empty, in which case None is inserted and shows up as NaN in the DataFrame.

>>> pd.DataFrame([d['width'][0] if d['width'].shape[0] else None for d in data], 
                 columns=['width'])
       width
0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
5   0.648482
6   0.622417
7   0.768926
8   0.699136
9   0.750693
10  0.690879
11  0.653029
12  0.672680
13  0.638621

Try this after getting the dataframe you posted:

def convert(x):
    # Empty arrays have no first element, so map them to NaN
    if len(x) == 0:
        return np.nan
    else:
        return x[0]

# Assumes numpy is imported as np
df['width'] = df['width'].apply(convert)
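For anyone who wants to run this approach end to end, here is a self-contained sketch. It rebuilds a three-row stand-in for the question's data with np.array, since the array(...) text in the question is printed output rather than runnable code. The helper name first_or_nan is hypothetical, and the sketch assumes every array holds at most one value.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's data: one empty array and two values copied from it.
data = [{'width': np.array([])},
        {'width': np.array([0.64848222])},
        {'width': np.array([0.62241745])}]

def first_or_nan(values):
    """Return the first element as a float, or NaN when `values` is empty."""
    return float(values[0]) if len(values) else np.nan

df = pd.DataFrame(data)                        # 'width' starts out holding arrays
df['width'] = df['width'].apply(first_or_nan)  # now plain floats with NaN for empties
print(df.dtypes)
```

Because every returned value is a Python float or NaN, the resulting column should come out as float64.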

