简体   繁体   中英

Python numpy: Convert string in to numpy array

I have following String that I have put together:

v1fColor = '2,4,14,5,0,0,0,0,0,0,0,0,0,0,12,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,6,0,0,0,0,1,0,0,0,0,0,0,0,0,0,20,9,0,0,0,2,2,0,0,0,0,0,0,0,0,0,13,6,0,0,0,1,0,0,0,0,0,0,0,0,0,0,10,8,0,0,0,1,2,0,0,0,0,0,0,0,0,0,17,17,0,0,0,3,6,0,0,0,0,0,0,0,0,0,7,5,0,0,0,2,0,0,0,0,0,0,0,0,0,0,4,3,0,0,0,1,1,0,0,0,0,0,0,0,0,0,6,6,0,0,0,2,3'

I am treating it as a vector: Long story short its a forecolor of an image histogram:

I have the following lambda function to calculate cosine similarity of two images, So I tried to convert this is to numpy.array but I failed:

Here is my lambda function

import numpy as NP
import numpy.linalg as LA
cx = lambda a, b : round(NP.inner(a, b)/(LA.norm(a)*LA.norm(b)), 3)

So I tried the following to convert this string as a numpy array:

v1fColor = NP.array([float(v1fColor)], dtype=NP.uint8)

But I ended up getting following error:

    v1fColor = NP.array([float(v1fColor)], dtype=NP.uint8)
ValueError: invalid literal for float(): 2,4,14,5,0,0,0,0,0,0,0,0,0,0,12,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,6,0,0,0,0,1,0,0,0,0,0,0,0,0,0,20,9,0,0,0,2,2,0,0,0,0,0,0,0,0,0,13,6,0,0,0,1,0,0,0,0,0,0,0,0,0,0,10,8,0,0,0,1,2,0,0,0,0,0,0,0,0,0,17,17,

您必须先用逗号分隔字符串:

NP.array(v1fColor.split(","), dtype=NP.uint8)

You can do this:

lst = v1fColor.split(',')  #create a list of strings, splitting on the commas.
v1fColor = NP.array( lst, dtype=NP.uint8 ) #numpy converts the strings.  Nifty!

or more concisely:

v1fColor = NP.array( v1fColor.split(','), dtype=NP.uint8 )

Note that it is a little more customary to do:

import numpy as np

compared to import numpy as NP

EDIT

Just today I learned about the function numpy.fromstring which could also be used to solve this problem:

NP.fromstring( "1,2,3" , sep="," , dtype=NP.uint8 )

You can do this without using python string methods -- try numpy.fromstring :

>>> numpy.fromstring(v1fColor, dtype='uint8', sep=',')
array([ 2,  4, 14,  5,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 12,  4,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 15,  6,  0,  0,
        0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0, 20,  9,  0,  0,  0,
        2,  2,  0,  0,  0,  0,  0,  0,  0,  0,  0, 13,  6,  0,  0,  0,  1,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 10,  8,  0,  0,  0,  1,  2,
        0,  0,  0,  0,  0,  0,  0,  0,  0, 17, 17,  0,  0,  0,  3,  6,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  7,  5,  0,  0,  0,  2,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  4,  3,  0,  0,  0,  1,  1,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  6,  6,  0,  0,  0,  2,  3], dtype=uint8)

I am writing this answer so if for any future references: I am not sure what is the correct solution in this case but I think What @David Robinson initially publish was the correct answer due to one reason: Cosine Similarity values can not be greater than one and when I use NP.array(v1fColor.split(","), dtype=NP.uint8) option I get strage values which are above 1.0 for cosine similarity between two vectors.

So I wrote a simple sample code to try out:

import numpy as np
import numpy.linalg as LA

def testFunction():
    value1 = '2,3,0,80,125,15,5,0,0,0,0,0,0,0,0,0,0,0,0,0,2,4,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0'
    value2 = '2,137,0,4,96,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0'
    cx = lambda a, b : round(np.inner(a, b)/(LA.norm(a)*LA.norm(b)), 3)
    #v1fColor = np.array(map(int,value1.split(',')))
    #v2fColor =  np.array(map(int,value2.split(',')))
    v1fColor = np.array( value1.split(','), dtype=np.uint8 )
    v2fColor = np.array( value2.split(','), dtype=np.uint8 )
    print v1fColor
    print v2fColor
    cosineValue = cx(v1fColor, v2fColor)
    print cosineValue

if __name__ == '__main__':
    testFunction()

if you run this code you should get the following output: 在此输入图像描述

Not lets un commented two lines that and run the code with the David's Initial Solution:

v1fColor = np.array(map(int,value1.split(',')))
v2fColor =  np.array(map(int,value2.split(','))) 

Keep in mind as you see above Cosine Similarity Value came up above 1.0 but when we use the map function and use do the int casting we get the following value which is the correct value:

在此输入图像描述

Luckily I was plotting the values that I was initially getting and some of the cosine values came above 1.0 and I took the outputs of these vectors and manually typed it in python console, and send it via my lambda function and got the correct answer so I was very confuse. Then I wrote the test script to see whats going on and glad I caught this issue. I am not a python expert to exactly tell what is going on in two methods to give two different answers. But I leave that to either @David Robinson or @mgilson.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM