I have a numpy 2D array like so:
[['a', '(junk, b)', '(junk, c)'],
['d', '(junk, e)', '(junk, f)'],
['g', '(junk, h)', '(junk, i)']]
As you can see, some of the values are wrapped in parentheses together with extra text. I'd like to remove that extra text so that my new array is:
[['a', 'b', 'c'],
['d', 'e', 'f'],
['g', 'h', 'i']]
I have a regex that captures the data I want in a match group, but is there a clean way within numpy to apply a regex to the values at certain positions and return a new array with the unwanted text removed?
If the values are only wrapped in parentheses (for example '(b)'), you can use a nested list comprehension to strip them with the str.strip() method:
>>> np.array([[x.strip('()') for x in i] for i in l])
array([['a', 'b', 'c'],
['d', 'e', 'f'],
['g', 'h', 'i']],
dtype='|S1')
Based on your edit, if the values also contain extra words, you can use a regex to match the single characters:
>>> import re
>>> import numpy as np
>>> l = [['a', '(junk, b)', '(junk, c)'],
...      ['d', '(junk, e)', '(junk, f)'],
...      ['g', '(junk, h)', '(junk, i)']]
>>>
>>> np.array([[re.search(r'\b[a-z]\b',x).group() for x in i] for i in l])
array([['a', 'b', 'c'],
['d', 'e', 'f'],
['g', 'h', 'i']],
dtype='|S1')
>>>
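If you would rather stay closer to numpy and touch only certain positions, one possible sketch is to wrap the regex in np.vectorize and assign the result back into a slice. The pattern and the helper name extract below are assumptions made for illustration (the regex from the question is not shown), and np.vectorize is a convenience wrapper around a Python-level loop, not a speed-up.

```python
import re
import numpy as np

arr = np.array([['a', '(junk, b)', '(junk, c)'],
                ['d', '(junk, e)', '(junk, f)'],
                ['g', '(junk, h)', '(junk, i)']])

# Assumed pattern: capture whatever follows the last comma inside the parentheses.
pattern = re.compile(r'\(.*,\s*(.*)\)')

def extract(value):
    """Return the captured group, or the value unchanged if it does not match."""
    m = pattern.fullmatch(value)
    return m.group(1) if m else value

cleaned = arr.copy()
# Apply the regex only to columns 1 and onward; column 0 is left untouched.
cleaned[:, 1:] = np.vectorize(extract)(arr[:, 1:])
print(cleaned)
```

Because the slice selects the positions, the same idea works with a boolean mask or any other index in place of [:, 1:].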