Extract non-empty values from columns of a dataframe in Python
This is a follow-up to this question: Extract non-empty values from the regex array output in python
I have a DataFrame with columns "col" and "col1" whose values are of type numpy.ndarray. It looks like this:
col                     col1
[[5, , , ,]]            [qwe,ret,der,po]
[[, 4, , ,][, , 5, ]]   [fgk,hfrt]
[]                      []
[[, , , 9]]             [test]
I want my output to be:
col   col1
5     qwe,ret,der,po
5     fgk,hfrt
0     NOT FOUND
9     test
Note that in column "col", the second row's output is the maximum of its two entries. I tried the solution provided in the question linked above, but it raises ValueError: "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()".
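That error comes from NumPy's truthiness rule: an array with more than one element has no single boolean value, so any implicit bool() on it (if arr:, not arr, arr or ...) raises this ValueError. The linked solution is not reproduced here, so this is only a minimal illustration, assuming it tested the arrays' truthiness directly. The helper is_nonempty is hypothetical and checks .size instead, and the array is a rough transcription of the second row of "col".

```python
import numpy as np

def is_nonempty(arr):
    # Hypothetical helper: check emptiness through .size rather than bool(arr),
    # which is ambiguous for arrays holding more than one element.
    return np.asarray(arr).size > 0

# Rough transcription of the second row of "col": two inner rows of strings.
two_rows = np.array([['', '4', '', ''], ['', '', '5', '']])

try:
    if two_rows:  # implicit bool() on an 8-element array
        pass
except ValueError as exc:
    print('ValueError:', exc)

print(is_nonempty(two_rows))      # True
print(is_nonempty(np.array([])))  # False
```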
Thanks
Edit: the dictionary form of my DF, showing column "col":
{'col': {0: array([['5', '', '', '', '', '']], dtype='|S1'),
         1: array([], dtype=float64),
         2: array([], dtype=float64),
         3: array([], dtype=float64),
         4: array([], dtype=float64),
         5: array([['8', '', '', '', '', '']], dtype='|S1'),
         6: array([], dtype=float64),
         7: array([], dtype=float64),
         8: array([], dtype=float64),
         9: array([], dtype=float64),
         10: array([], dtype=float64),
         11: array([['', '8', '', '', '', '']], dtype='|S1'),
         12: array([], dtype=float64),
         13: array([], dtype=float64),
         14: array([], dtype=float64),
         15: array([['7', '', '', '', '', '']], dtype='|S1'),
         16: array([], dtype=float64)}}
Try the following:
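import pandas as pd

def parse_nested_max(xss):
    # Largest integer across all inner arrays: '' entries are skipped,
    # and default=0 makes an empty array (or empty row) count as 0.
    return max(
        (max((int(x) for x in xs if x), default=0) for xs in xss),
        default=0
    )

# df is the DataFrame from the question
df['col'] = df.col.apply(parse_nested_max)
df['col1'] = df.col1.apply(lambda s: ','.join(s) or 'NOT FOUND')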
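A self-contained run of the answer on the sample from the question's first table may help check it. The DataFrame construction, the four-wide inner arrays and the string dtypes are assumptions transcribed from that table, not the asker's real data.

```python
import numpy as np
import pandas as pd

def parse_nested_max(xss):
    # Same helper as above: max over all inner arrays, skipping '' entries,
    # with default=0 covering empty inputs.
    return max(
        (max((int(x) for x in xs if x), default=0) for xs in xss),
        default=0,
    )

# Sample data transcribed (approximately) from the question's input table.
df = pd.DataFrame({
    'col': [
        np.array([['5', '', '', '']]),
        np.array([['', '4', '', ''], ['', '', '5', '']]),
        np.array([]),
        np.array([['', '', '', '9']]),
    ],
    'col1': [
        np.array(['qwe', 'ret', 'der', 'po']),
        np.array(['fgk', 'hfrt']),
        np.array([]),
        np.array(['test']),
    ],
})

df['col'] = df['col'].apply(parse_nested_max)
df['col1'] = df['col1'].apply(lambda s: ','.join(s) or 'NOT FOUND')
print(df)
```

The resulting columns should match the expected output in the question: 5, 5, 0, 9 and qwe,ret,der,po / fgk,hfrt / NOT FOUND / test.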
This assumes that the first column holds 2-D arrays of strings and the second holds 1-D arrays of strings.
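The dictionary in the question's edit does not quite match that assumption: the empty rows are 1-D float64 arrays, and the non-empty ones hold 1-byte strings (dtype '|S1'). The sketch below rebuilds four of those entries and suggests the function still copes under Python 3: iterating an empty array yields no rows, so the outer default=0 applies, and int() accepts bytes such as b'8'.

```python
import numpy as np

def parse_nested_max(xss):
    # Same helper as in the answer.
    return max(
        (max((int(x) for x in xs if x), default=0) for xs in xss),
        default=0,
    )

# Four entries rebuilt from the question's dictionary. '|S1' stores 1-byte
# strings, so the elements come back as bytes (b'5', b'', ...).
entries = [
    np.array([['5', '', '', '', '', '']], dtype='|S1'),
    np.array([], dtype=np.float64),
    np.array([['', '8', '', '', '', '']], dtype='|S1'),
    np.array([['7', '', '', '', '', '']], dtype='|S1'),
]

print([parse_nested_max(e) for e in entries])  # [5, 0, 8, 7]
```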
For the first column, skip the '' elements and convert the rest to int, then take the max of each inner array and of those results, with the convention that max([]) == 0. The default=0 argument accounts for the possibility of emptiness, as in the third row of your df.
For the second column, exploit the fact that bool(','.join([])) == False, so the or returns 'NOT FOUND' when the array is empty.
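Both conventions can be checked in isolation. The helper names max_or_zero and join_or_not_found below are hypothetical and simply wrap the two expressions used in the answer.

```python
def max_or_zero(values):
    # Hypothetical helper mirroring the answer: max() of an empty iterable
    # raises ValueError, so default=0 supplies the fallback.
    return max(values, default=0)

def join_or_not_found(items):
    # ','.join([]) is '', which is falsy, so `or` returns the fallback;
    # any non-empty join result is returned unchanged.
    return ','.join(items) or 'NOT FOUND'

try:
    max([])
except ValueError as exc:
    print('max([]) raises:', exc)

print(max_or_zero([]))                                  # 0
print(max_or_zero(int(x) for x in ['', '4', ''] if x))  # 4
print(join_or_not_found([]))                            # NOT FOUND
print(join_or_not_found(['fgk', 'hfrt']))               # fgk,hfrt
```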
Finally, a tip: you will get better feedback if your DataFrame is easy to recreate. Try using df.to_dict() and embedding the output in your source when you define df.
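As a sketch of that tip: the printed dictionary refers to array and float64, so importing those two names from numpy (an assumption about how you would paste it) lets an abridged copy of the question's output be used verbatim to rebuild the frame.

```python
import pandas as pd
from numpy import array, float64  # names used by the reprs in the printed dict

# Abridged copy of the question's dictionary (first three entries).
data = {'col': {0: array([['5', '', '', '', '', '']], dtype='|S1'),
                1: array([], dtype=float64),
                2: array([], dtype=float64)}}

df = pd.DataFrame(data)
print([a.shape for a in df['col']])  # [(1, 6), (0,), (0,)]
```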