Extract non-empty values from columns of a DataFrame in Python

Question

This is a follow-up to this question: Extract non-empty values from the regex array output in python

I have a DataFrame whose columns "col" and "col1" hold values of type numpy.ndarray. It looks like this:

       col                         col1
   [[5, , , ,]]             [qwe,ret,der,po]
   [[, 4, , ,][, , 5, ]]       [fgk,hfrt]
        []                           []
   [[, , , 9]]                  [test]

I want my output to be:

      col  col1
       5  qwe,ret,der,po
       5  fgk,hfrt
       0  NOT FOUND 
       9  test

Note that for column "col", the second row of the output holds the maximum of the two entries. I tried the solution from the question linked above, but it raises ValueError: "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()".

Thanks

Edit: Dictionary form of my DataFrame, showing column "col":

  {'col': {0: array([['5', '', '', '', '', '']],
  dtype='|S1'), 1: array([], dtype=float64), 2: array([], dtype=float64), 3: array([], dtype=float64), 4: array([], dtype=float64), 5: array([['8', '', '', '', '', '']],
  dtype='|S1'), 6: array([], dtype=float64), 7: array([], dtype=float64), 8: array([], dtype=float64), 9: array([], dtype=float64), 10: array([], dtype=float64), 11: array([['', '8', '', '', '', '']],
  dtype='|S1'), 12: array([], dtype=float64), 13: array([], dtype=float64), 14: array([], dtype=float64), 15: array([['7', '', '', '', '', '']],
  dtype='|S1'), 16: array([], dtype=float64)}}

Answer 1

Try the following:

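import pandas as pd


def parse_nested_max(xss):
    # Inner max: largest int in one subarray, ignoring '' entries (0 if none).
    # Outer max: largest of those values (0 if there are no subarrays).
    return max(
        (max((int(x) for x in xs if x), default=0) for xs in xss),
        default=0
    )


# df is the DataFrame from the question.
df['col'] = df.col.apply(parse_nested_max)
# ','.join of an empty sequence is '', which is falsy, so `or` supplies the fallback.
df['col1'] = df.col1.apply(lambda s: ','.join(s) or 'NOT FOUND')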

This assumes that each value in the first column is a 2-dimensional array of strings, and each value in the second column is a 1-dimensional array of strings.

For the first column, do the following:

For each subarray, drop the '' elements and convert the rest to int.
For each subarray, compute the max, with the convention that max([]) == 0.
Finally, this gives a list of integers, so simply take the max; use default=0 to account for the possibility of an empty array, as in the third row of your df.
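To make the three steps concrete, here is a minimal sketch that spells out the intermediate values for one cell. It is only an illustration: the variable names are made up, and a plain nested list stands in for the 2-dimensional numpy array, on the assumption that iterating over it yields the same rows and elements.

```python
# One cell of "col", modelled on the second row of the question's table.
# Assumption: a plain nested list stands in for the 2-dim numpy array.
cell = [["", "4", "", ""], ["", "", "5", ""]]

# Step 1: per subarray, drop the '' entries and convert the rest to int.
ints_per_row = [[int(x) for x in xs if x] for xs in cell]

# Step 2: per subarray, take the max, using 0 when nothing is left.
row_maxes = [max(row, default=0) for row in ints_per_row]

# Step 3: take the max over the whole cell, using 0 for an empty cell.
cell_max = max(row_maxes, default=0)

print(ints_per_row)  # [[4], [5]]
print(row_maxes)     # [4, 5]
print(cell_max)      # 5
```

The nested generator expression in parse_nested_max performs the same steps without building the intermediate lists.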

For the second column, exploit the fact that bool(','.join([])) == False: joining an empty sequence yields '', so the or expression falls back to 'NOT FOUND'.
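The same idea can be shown on its own, outside the DataFrame. This is a small sketch: join_or_default is a hypothetical helper name, and it assumes the elements are str values (a bytes array would need decoding first).

```python
def join_or_default(items, default="NOT FOUND"):
    # ','.join of an empty sequence is '', which is falsy,
    # so the `or` falls through to the default value.
    return ",".join(items) or default


print(join_or_default(["fgk", "hfrt"]))  # fgk,hfrt
print(join_or_default([]))               # NOT FOUND
```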

Finally, a tip: you will get better feedback if your dataframe is easy to recreate. Try using df.to_dict() and embedding the output in your source where you define df.
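As a rough illustration of that tip, the sketch below dumps a small frame with df.to_dict() and rebuilds it from the resulting dictionary. The sample values are made up for the example; only the column names come from the question.

```python
import pandas as pd

# Made-up sample frame; only the column names come from the question.
df = pd.DataFrame({"col": [5, 0, 9], "col1": ["qwe,ret", "NOT FOUND", "test"]})

# Paste this dictionary into a question so others can recreate the frame.
as_dict = df.to_dict()
print(as_dict)

# Anyone can then rebuild the same frame from the pasted dictionary.
rebuilt = pd.DataFrame(as_dict)
print(rebuilt.equals(df))
```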
