Check list values and assign them to a dictionary key in Python
I have a list of words as below.
mylist = ['cat', 'yellow', 'car', 'red', 'green', 'jeep', 'rat','lorry']
I also have a list of lists, one inner list per essay in the dataset, holding an indicator for each word in 'mylist' as in the example below (i.e., if a 'mylist' word appears in the essay I set it to 1, otherwise 0).
[[0,1,0,0,0,1,0,1], [1,0,0,0,0,1,0,0]]
In other words,
[0,1,0,0,0,1,0,1] says that this only has values 'yellow', 'jeep', 'lorry'
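To make the encoding concrete, here is a minimal sketch that decodes an indicator row back into words. The helper name `decode_row` is hypothetical and only used for illustration.

```python
mylist = ['cat', 'yellow', 'car', 'red', 'green', 'jeep', 'rat', 'lorry']

def decode_row(row, words):
    """Return the words whose indicator is 1 (hypothetical helper)."""
    return [word for word, flag in zip(words, row) if flag == 1]

print(decode_row([0, 1, 0, 0, 0, 1, 0, 1], mylist))  # ['yellow', 'jeep', 'lorry']
```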
Now I have a dictionary of categories as below.
mydictionary = {'colour': ['red', 'yellow', 'green'], 'animal': ['rat','cat'],
'vehicle': ['car', 'jeep']}
Now, using the keys of 'mydictionary', I want to transform the list of lists as follows (that is, if one or more of a key's 'mylist' values is 1, I mark the key as 1, else 0).
[[1,0,1], [0,1,1]]
In other words,
[1,0,1] says that:
1 - one or more '1's for elements in 'colour'
0 - no '1's for elements in 'animal'
1 - one or more '1's for elements in 'vehicle'
So my output should be a list of lists as mentioned above -> [[1,0,1], [0,1,1]]
I am new to pandas, so I am interested in knowing whether this can be done using pandas dataframes.
Setup
import numpy as np
import pandas as pd
a = np.array(['cat', 'yellow', 'car', 'red', 'green', 'jeep', 'rat','lorry'])
b = np.array([[0,1,0,0,0,1,0,1], [1,0,0,0,0,1,0,0]], dtype=bool)
mydictionary = {
'colour': ['red', 'yellow', 'green'],
'animal': ['rat','cat'],
'vehicle': ['car', 'jeep']
}
Solution
Some minor additional setup
I just needed to get an array of sets in the correct order.
o = ['colour', 'animal', 'vehicle']
s = pd.Series(mydictionary).apply(set).loc[o]
s
colour     {green, red, yellow}
animal               {cat, rat}
vehicle             {jeep, car}
dtype: object
Use set intersection with numpy broadcasting
(s.values & [[set(a[l])] for l in b]).astype(bool).astype(int)
array([[1, 0, 1],
       [0, 1, 1]])
Additional Explanation
If I'm to use numpy broadcasting and I already have a series with values
s.values
[{'green', 'red', 'yellow'} {'cat', 'rat'} {'jeep', 'car'}]
Then I need a 2-D array with the other sets
[[set(a[l])] for l in b]
[[{'jeep', 'lorry', 'yellow'}], [{'cat', 'jeep'}]]
When I broadcast the & operation
s.values & [[set(a[l])] for l in b]
[[{'yellow'} set() {'jeep'}]
 [set() {'cat'} {'jeep'}]]
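The shapes are what make this work: a length-3 array combined with a 2x1 array broadcasts to a 2x3 result. Below is a small sketch of that step, assuming numpy is installed. The names `cats` and `rows` are illustrative only, and the object arrays are filled element by element so their shapes are unambiguous.

```python
import numpy as np

# One set per category, shape (3,)
category_sets = [{'red', 'yellow', 'green'}, {'rat', 'cat'}, {'car', 'jeep'}]
cats = np.empty(3, dtype=object)
for i, s in enumerate(category_sets):
    cats[i] = s

# One set per essay, shape (2, 1)
essay_sets = [{'yellow', 'jeep', 'lorry'}, {'cat', 'jeep'}]
rows = np.empty((2, 1), dtype=object)
for i, s in enumerate(essay_sets):
    rows[i, 0] = s

# Broadcasting pairs every essay with every category; on object arrays,
# & applies Python's set intersection to each pair.
result = cats & rows
print(result.shape)  # (2, 3)
```

Each row of `result` corresponds to an essay and each column to a category, which is why the category sets had to be put in a fixed order first.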
Conveniently, empty sets evaluate to False and non-empty sets to True in a bool context. Follow that with a cast to int and we have our solution.
(s.values & [[set(a[l])] for l in b]).astype(bool).astype(int)
array([[1, 0, 1],
       [0, 1, 1]])
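The truthiness rule can also be checked without numpy. A minimal sketch in plain Python, reusing the intersections from the broadcast step (the variable names are illustrative):

```python
# Intersections per essay (rows) and category (columns)
intersections = [
    [{'yellow'}, set(), {'jeep'}],
    [set(), {'cat'}, {'jeep'}],
]

# bool() of an empty set is False, of a non-empty set True; int() maps that to 0/1
flags = [[int(bool(s)) for s in row] for row in intersections]
print(flags)  # [[1, 0, 1], [0, 1, 1]]
```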
I think you need:
import pandas as pd
mylist = ['cat', 'yellow', 'car', 'red', 'green', 'jeep', 'rat','lorry']
a = [[1,1,0,0,0,1,0,1], [1,0,0,0,0,1,0,0]]
mydictionary = {'colour': ['red', 'yellow', 'green'], 'animal': ['rat','cat', 'lorry'],
'vehicle': ['car', 'jeep']}
#order of output categories
cols = ['colour','animal','vehicle']
df = pd.DataFrame(a, columns=mylist)
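#map each word to its category, e.g. 'red' -> 'colour'
d = {k: oldk for oldk, oldv in mydictionary.items() for k in oldv}
#groupby(axis=1) is deprecated since pandas 2.1, so group the transposed frame instead
df = df.rename(columns=d).T.groupby(level=0).max().T.reindex(columns=cols)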
print (df)
   colour  animal  vehicle
0       1       1        1
1       0       1        1
L = df.values.tolist()
print (L)
[[1, 1, 1], [0, 1, 1]]
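For readers unfamiliar with the rename and groupby step, the same logic can be sketched in plain Python: `d` maps every word to its category, and taking the maximum per category is the same as asking whether any of its words is flagged. The function name `rows_to_categories` is hypothetical; the data is the one used in this answer (note that it differs slightly from the question's data).

```python
mylist = ['cat', 'yellow', 'car', 'red', 'green', 'jeep', 'rat', 'lorry']
a = [[1, 1, 0, 0, 0, 1, 0, 1], [1, 0, 0, 0, 0, 1, 0, 0]]
mydictionary = {'colour': ['red', 'yellow', 'green'],
                'animal': ['rat', 'cat', 'lorry'],
                'vehicle': ['car', 'jeep']}
cols = ['colour', 'animal', 'vehicle']

# Inverted mapping: word -> category, e.g. 'red' -> 'colour'
d = {word: cat for cat, words in mydictionary.items() for word in words}

def rows_to_categories(rows):
    """Hypothetical helper: per row, take the max flag within each category."""
    out = []
    for row in rows:
        best = dict.fromkeys(cols, 0)
        for word, flag in zip(mylist, row):
            cat = d.get(word)  # None for words that belong to no category
            if cat is not None:
                best[cat] = max(best[cat], flag)
        out.append([best[c] for c in cols])
    return out

print(rows_to_categories(a))  # [[1, 1, 1], [0, 1, 1]]
```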
Here is another approach without pandas:
# mylist and mydictionary are the ones defined in the question
list_of_list = [[0,1,0,0,0,1,0,1], [1,0,0,0,0,1,0,0]]
keys = list(mydictionary)  # category order: colour, animal, vehicle
for i, row in enumerate(list_of_list):
    # temp_list will hold lists such as ['yellow', 'jeep', 'lorry']
    temp_list = [mylist[j] for j in range(len(row)) if row[j] == 1]
    # flags holds one 0/1 entry per category
    flags = [0] * len(keys)
    for item in temp_list:
        for k, key in enumerate(keys):
            if item in mydictionary[key]:
                flags[k] = 1
    # now override the row in the list of lists
    list_of_list[i] = flags
I didn't run the code, so there might be some minor bugs, but I hope you get the idea.