
Convert a dictionary of strings to a dictionary of numpy arrays

I have a dictionary structured similarly to the one below:

test_dict = {1: 'I run fast', 2: 'She runs', 3: 'How are you?'}

What I'm trying to do is convert all the strings to 4x4 numpy arrays, where each word is in its own row and each letter occupies one cell of the array. Rows for words that don't fill the entire row should be padded with blanks, and sentences shorter than 4 words should get whole rows of blanks. I also need to be able to tie each array back to its ID, so the result needs to be in some format that allows referencing each array by its ID later on.

I don't know of any pre-built functions that can handle something like this, but I would be happy to be wrong. For now I've been trying to write a loop to handle it. The code below is obviously incomplete, because I'm stuck at the point of creating an array in the structure I would like.

import numpy as np

for k in test_dict.keys():
    sentence = test_dict[k]  # dicts have no getvalues(); index by key
    sentence_ascii = [ord(c) for c in sentence]
    sentence_array = np.array(sentence_ascii)
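For reference, the ord() idea above can be completed into one 4x4 grid per ID. This is only a sketch under a few assumptions: blanks are single spaces (code point 32), and sentences with more than four words or words longer than four characters are rejected rather than handled. `to_code_grid` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

test_dict = {1: 'I run fast', 2: 'She runs', 3: 'How are you?'}


def to_code_grid(sentence, size=4):
    # Hypothetical helper: one word per row, one character code per cell.
    words = sentence.split()
    if len(words) > size or any(len(w) > size for w in words):
        raise ValueError(f"{sentence!r} does not fit in a {size}x{size} grid")
    # Start from a grid of spaces, so every blank cell holds ord(' ') == 32.
    grid = np.full((size, size), ord(' '), dtype=np.int32)
    for row, word in enumerate(words):
        grid[row, :len(word)] = [ord(c) for c in word]
    return grid


# Keep the original IDs as keys so each grid can be looked up later.
code_grids = {k: to_code_grid(v) for k, v in test_dict.items()}
print(code_grids[2])
```

Keeping the result in a dict keyed by the original IDs means `code_grids[2]` is already the "reference by ID" the question asks for.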

Is this what you mean?

import numpy as np

# one row per word, each word padded with spaces to 4 characters
{
    key: np.array([list(word.ljust(4)) for word in val.split()])
    for key, val in test_dict.items()
}

Output:

{1: array([['I', ' ', ' ', ' '],
           ['r', 'u', 'n', ' '],
           ['f', 'a', 's', 't']], dtype='<U1'),
 2: array([['S', 'h', 'e', ' '],
           ['r', 'u', 'n', 's']], dtype='<U1'),
 3: array([['H', 'o', 'w', ' '],
           ['a', 'r', 'e', ' '],
           ['y', 'o', 'u', '?']], dtype='<U1')}
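Note that this comprehension produces one row per word, so the arrays above are 3x4 and 2x4 rather than 4x4. A minimal variation, assuming a blank row should be four spaces, pads the word list with empty strings before calling `ljust`. `words_padded` is a hypothetical helper, and sentences with more than four words are not truncated (they would still yield more than four rows).

```python
import numpy as np

test_dict = {1: 'I run fast', 2: 'She runs', 3: 'How are you?'}


def words_padded(sentence, rows=4):
    # Hypothetical helper: append empty "words" so a sentence yields `rows` rows.
    # ''.ljust(4) becomes a row of four spaces. Longer sentences are left as-is.
    words = sentence.split()
    return words + [''] * (rows - len(words))


char_grids = {
    key: np.array([list(word.ljust(4)) for word in words_padded(val)])
    for key, val in test_dict.items()
}
print(char_grids[2])
```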

This ensures you get blank (all-zero) rows for sentences shorter than four words.

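import numpy as np
# np.zeros (not np.empty) guarantees unused cells hold '' (code point 0)
new_dict = {k: np.zeros((4, 4), dtype='U1') for k in test_dict}
for k, v in test_dict.items():
    # one row per word, short words padded with ''
    new_dict[k][:len(v.split())] = np.array([list(s) + [''] * (4 - len(s)) for s in v.split()])
    new_dict[k] = new_dict[k].view(np.int32)  # UCS-4 chars -> int32 code points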
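Because the int32 arrays are just a reinterpretation of the 4-byte unicode characters, the view can be reversed to check what a grid contains. This sketch rebuilds the grids essentially as in the code above (using np.zeros so the unused rows are guaranteed to be 0) and then decodes them; `decode_rows` is a hypothetical helper, not part of NumPy.

```python
import numpy as np

test_dict = {1: 'I run fast', 2: 'She runs', 3: 'How are you?'}

# Build the int32 grids: unused cells start as '' (code point 0).
new_dict = {k: np.zeros((4, 4), dtype='U1') for k in test_dict}
for k, v in test_dict.items():
    words = v.split()
    new_dict[k][:len(words)] = np.array([list(s) + [''] * (4 - len(s)) for s in words])
    new_dict[k] = new_dict[k].view(np.int32)


def decode_rows(grid):
    # Hypothetical helper: .view('U1') turns each int32 code point back into a
    # one-character string (code point 0 reads back as ''), then each row is joined.
    return [''.join(row) for row in grid.view('U1')]


print(decode_rows(new_dict[3]))
```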

You can use this to retrieve your arrays by the field 'ID':

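import numpy as np
dt = [('ID', '<i4'), ('sentences', object)]  # one record per sentence
new_dict = np.empty(len(test_dict), dtype=dt)
for i, (k, v) in enumerate(test_dict.items()):
    words = v.split(' ')
    # pad each word to 4 chars with spaces, view as int32 code points, add zero rows up to 4
    grid = np.array([list("{:<4}".format(w)) for w in words]).view(np.int32)
    new_dict[i] = (k, np.pad(grid, [(0, 4 - len(words)), (0, 0)]))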

Example output:

print(new_dict[new_dict['ID']==2]['sentences'])

[array([[ 83, 104, 101,  32],
   [114, 117, 110, 115],
   [  0,   0,   0,   0],
   [  0,   0,   0,   0]], dtype=int32)]
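Since every grid has the same 4x4 shape, one possible alternative to a structured array with an object field is a single (n, 4, 4) int32 array plus a parallel array of IDs. This is a sketch under that assumption: `encode` is a hypothetical helper that repeats the padding logic of the structured-array code above, and the code relies on dicts preserving insertion order (Python 3.7+) to keep `ids` and `grids` aligned.

```python
import numpy as np

test_dict = {1: 'I run fast', 2: 'She runs', 3: 'How are you?'}


def encode(sentence):
    # Hypothetical helper: spaces pad each word to 4 characters,
    # all-zero rows fill sentences shorter than 4 words.
    words = sentence.split(' ')
    grid = np.array([list("{:<4}".format(w)) for w in words]).view(np.int32)
    return np.pad(grid, [(0, 4 - len(words)), (0, 0)])


ids = np.array(list(test_dict), dtype=np.int32)            # shape (n,)
grids = np.stack([encode(v) for v in test_dict.values()])  # shape (n, 4, 4)

# Boolean-mask lookup, mirroring new_dict[new_dict['ID'] == 2], but the
# result is a plain int32 array rather than an object array of arrays.
print(grids[ids == 2][0])
```

The trade-off is that a plain numeric array supports vectorized operations across all sentences at once, whereas the object field stores each grid as a separate Python object.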
