Extract non-empty values from the regex array output in Python

I have a column whose values are of type numpy.ndarray. It looks like this:

         col
    ['','','5','']
    ['','8']
    ['6','','']
    ['7']
    []
    ['5']

I want the output to look like this:

         col
          5
          8
          6
          7
          0
          5

How can I do this in Python? Any help is highly appreciated.

To convert the data to numeric values you could use:

import numpy as np
import pandas as pd
data = list(map(np.array, [ ['','','5',''], ['','8'], ['6','',''], ['7'], [], ['5']]))
df = pd.DataFrame({'col': data})
df['col'] = pd.to_numeric(df['col'].str.join('')).fillna(0).astype(int)
print(df)

yields

   col
0    5
1    8
2    6
3    7
4    0
5    5

To convert the data to strings use:

df['col'] = df['col'].str.join('').replace('', '0')

The result looks the same, but the dtype of the column is object since the values are strings.
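To make the difference between the two results concrete, here is a minimal sketch. It assumes plain Python lists in place of the numpy arrays, and the names `rows`, `as_str` and `as_int` are made up for illustration. The printed columns look alike, but the string column concatenates where the integer column adds.

```python
import pandas as pd

# Hypothetical sample data mirroring the question (lists instead of arrays).
rows = [['', '', '5', ''], ['', '8'], ['6', '', ''], ['7'], [], ['5']]

# String version: join each row, then replace the empty result with '0'.
as_str = pd.Series(rows).str.join('').replace('', '0')

# Numeric version: parse the strings into integers.
as_int = pd.to_numeric(as_str).astype('int64')

# The same "+" behaves differently on the two columns.
print(as_str.iloc[0] + as_str.iloc[1])  # string concatenation
print(as_int.iloc[0] + as_int.iloc[1])  # integer addition
```

If you plan to do arithmetic or comparisons on the column, the numeric version is probably the one you want.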


If some rows contain more than one number and you wish to pick the largest, then you'll have to loop through each item in each row, convert each string to a numeric value (treating empty strings as 0) and take the max:

import numpy as np
import pandas as pd
data = list(map(np.array, [ ['','','5','6'], ['','8'], ['6','',''], ['7'], [], ['5']]))
df = pd.DataFrame({'col': data})
df['col'] = [max([int(xi) if xi else 0 for xi in x] or [0]) for x in df['col']]
print(df)

yields

   col
0    6   # <-- note  ['','','5','6'] was converted to 6
1    8
2    6
3    7
4    0
5    5
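As an alternative to the explicit loop, the same "largest number per row" result can probably be obtained without a Python-level loop. This is only a sketch and not part of the original answer: it assumes a pandas version that has `Series.explode` (0.25 or later), uses plain lists for the sample data, and the names `flat` and `nums` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the first row holds two numbers.
rows = [['', '', '5', '6'], ['', '8'], ['6', '', ''], ['7'], [], ['5']]
df = pd.DataFrame({'col': rows})

# One element per line; the original row index is repeated,
# and an empty row becomes a single NaN.
flat = df['col'].explode()

# Empty strings cannot be parsed, so errors='coerce' turns them into NaN.
nums = pd.to_numeric(flat, errors='coerce')

# Largest value per original row; rows with no number fall back to 0.
df['col'] = nums.groupby(level=0).max().fillna(0).astype(int)
print(df['col'].tolist())
```

The list comprehension is arguably easier to read for small data. The exploded version mainly pays off when the column is large.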

For versions of pandas prior to 0.17, you could use df.convert_objects instead (this method was later deprecated and has been removed from current pandas):

import numpy as np
import pandas as pd
data = list(map(np.array, [ ['','','5',''], ['','8'], ['6','',''], ['7'], [], ['5']]))
df = pd.DataFrame({'col': data})
df['col'] = df['col'].str.join('').replace('', '0')
df = df.convert_objects(convert_numeric=True)

I'll leave you with this:

>>> l = ['', '5', '', '']
>>> l = [x for x in l if not len(x) == 0]
>>> l
['5']

You can do the same thing using lambda and filter:

>>> l
['', '1', '']
>>> l = list(filter(lambda x: not len(x) == 0, l))  # list() is needed on Python 3, where filter returns an iterator
>>> l
['1']

The next step would be iterating through the rows of the array and implementing one of these two ideas.
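One possible way to combine the two steps, as a sketch only: the helper name `first_non_empty` and its `default` parameter are made up for illustration, and the rows are plain lists here rather than numpy arrays.

```python
def first_non_empty(row, default='0'):
    """Return the first non-empty string in row, or default if there is none."""
    kept = [x for x in row if x]  # same filtering idea as the list comprehension
    return kept[0] if kept else default

# Hypothetical sample data mirroring the question.
rows = [['', '', '5', ''], ['', '8'], ['6', '', ''], ['7'], [], ['5']]

# Iterate through the rows and convert each kept value to an integer.
result = [int(first_non_empty(row)) for row in rows]
print(result)  # [5, 8, 6, 7, 0, 5]
```

This keeps only the first non-empty value of each row, so a row holding several numbers would need a different rule.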

Someone shows how this is done here: Iterating over Numpy matrix rows to apply a function each?

Edit: this may get down-voted, but I deliberately did not give the final code.

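    import numpy as np

    # Object array whose elements are the ragged rows from the question
    x = np.array([['', '', '5', ''], ['', '8'], ['6', '', ''], ['7'], [], ['5']],
                 dtype=object)

    In [20]: for a in x:
       ....:     if len(a) == 0:
       ....:         print(0)      # empty row: print the default
       ....:     else:
       ....:         for b in a:
       ....:             if b:     # skip empty strings
       ....:                 print(b)
       ....: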
5
8
6
7
0
5
