How to print a specific key from a dictionary string.

Question

I want to get one-hot-style data from sklearn's transform, with one row per list that marks every element in that list.

Code:

from sklearn.feature_extraction.text import CountVectorizer
from itertools import chain


x = [['1234', '5678', '910', 'baba'], ['8', '1'], 
     [], ['9', '3'], [], ['7', '6'], [], []]
vector = CountVectorizer(token_pattern=r".+",  min_df=1, max_df=1.0, lowercase=False,
                 max_features=None)
vec = [xxx for xx in x for xxx in xx]
vector.fit(chain.from_iterable([vec]))
print(vector.get_feature_names_out().tolist())  # get_feature_names() was removed in scikit-learn 1.2
new = []
for xx in x:
    new.append(vector.transform(xx))
for x in new:
    for xx in x.toarray():
        print(xx)

Current output:

['1', '1234', '3', '5678', '6', '7', '8', '9', '910', 'baba']
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 1 0]
[0 0 0 0 0 0 0 0 0 1]
[0 0 0 0 0 0 1 0 0 0]
[1 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 1 0 0]
[0 0 1 0 0 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0]

My expected output:

['1', '1234', '3', '5678', '6', '7', '8', '9', '910', 'baba']
[0 1 0 1 0 0 0 0 1 1]
[1 0 0 0 0 0 1 0 0 0]
[0 0 1 0 0 0 0 1 0 0]
[0 0 0 0 1 1 0 0 0 0]

Is there a way to do it using my code? I have tried changing it many times, but with no luck. Somehow, my brain has stopped processing anything now.

Answer 1

You shouldn't need explicit for loops for this task. You can use MultiLabelBinarizer instead, also from the sklearn library. Empty lists would come out as all-zero rows, so filter those out first to match your expected output.

Here's an example with Pandas:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

L = [['1234', '5678', '910', 'baba'], ['8', '1'], 
     [], ['9', '3'], [], ['7', '6'], [], []]

s = pd.Series(list(filter(None, L)))

mlb = MultiLabelBinarizer()

res = pd.DataFrame(mlb.fit_transform(s),
                   columns=mlb.classes_,
                   index=s.index)

print(res)

   1  1234  3  5678  6  7  8  9  910  baba
0  0     1  0     1  0  0  0  0    1     1
1  1     0  0     0  0  0  1  0    0     0
2  0     0  1     0  0  0  0  1    0     0
3  0     0  0     0  1  1  0  0    0     0
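
For readers who do not need a DataFrame, the same encoding can be sketched with MultiLabelBinarizer alone. This is only a minimal sketch: the names `rows`, `encoded` and `with_empty` are mine and not part of the answer above. The last lines check how an empty list is encoded if it is left in the input.

```python
from sklearn.preprocessing import MultiLabelBinarizer

L = [['1234', '5678', '910', 'baba'], ['8', '1'],
     [], ['9', '3'], [], ['7', '6'], [], []]

# Keep only the non-empty lists so the result has one row per populated list.
rows = [labels for labels in L if labels]

mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(rows)

print(list(mlb.classes_))  # column labels, in sorted order
print(encoded)             # one 0/1 row per non-empty list

# Check: transform a batch that still contains an empty list.
with_empty = mlb.transform([[], ['8', '1']])
print(with_empty)
```

If the positions of the empty lists matter downstream, skipping the filter and keeping their rows may be the simpler choice.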

Answer 2

You can try using set intersection and np.isin.

The intersection keeps the elements the list shares with the mask, and np.isin turns them into a boolean array.

import numpy as np

# x is the original list of lists from the question
mask = ['1', '1234', '3', '5678', '6', '7', '8', '9', '910', 'baba']
for xx in x:
    if xx:  # skip empty lists
        common = list(set(xx).intersection(mask))
        print(np.isin(mask, common).astype(int))

Out:

[0 1 0 1 0 0 0 0 1 1]
[1 0 0 0 0 0 1 0 0 0]
[0 0 1 0 0 0 0 1 0 0]
[0 0 0 0 1 1 0 0 0 0]
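
The loop above can also be wrapped in a small helper. This is a sketch under my own assumptions: `multi_hot` is a hypothetical name, and it passes each list straight to np.isin, since np.isin already ignores values that are not in the mask, so the explicit intersection is optional.

```python
import numpy as np


def multi_hot(rows, vocabulary):
    """Return one 0/1 row per non-empty list, aligned with `vocabulary`."""
    vocab = np.asarray(vocabulary)
    # np.isin marks each vocabulary entry that occurs in the current row.
    return np.array([np.isin(vocab, row).astype(int) for row in rows if row])


x = [['1234', '5678', '910', 'baba'], ['8', '1'],
     [], ['9', '3'], [], ['7', '6'], [], []]
mask = ['1', '1234', '3', '5678', '6', '7', '8', '9', '910', 'baba']

result = multi_hot(x, mask)
print(result)
```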

Flattening the lists

# A nested list can be flattened with sum(); this copies repeatedly, so it gets slow for large inputs
sum(x,[])

Out:

['1234', '5678', '910', 'baba', '8', '1', '9', '3', '7', '6']
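
For larger inputs, a common alternative is itertools.chain.from_iterable, which walks the nested list once instead of building a new list at every step. A minimal sketch, where the name `flat` is mine:

```python
from itertools import chain

x = [['1234', '5678', '910', 'baba'], ['8', '1'],
     [], ['9', '3'], [], ['7', '6'], [], []]

# chain.from_iterable yields the items of each inner list in order;
# empty inner lists contribute nothing.
flat = list(chain.from_iterable(x))
print(flat)
```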

Answer 3

For future readers:

I somehow solved it in a super naive way.

Here is the code:

from sklearn.feature_extraction.text import CountVectorizer
from itertools import chain

x = [['1234', '5678', '910', 'baba'], ['8', '1'], 
     [], ['9', '3'], [], ['7', '6'], [], []]
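vector = CountVectorizer(token_pattern=r"\S+",  min_df=1, max_df=1.0, lowercase=False,
                 max_features=None)  # \S+ keeps every space-separated item, including 'baba'
vec = [xxx for xx in x for xxx in xx]
vector.fit(chain.from_iterable([vec]))
print(vector.get_feature_names_out().tolist())  # get_feature_names() was removed in scikit-learn 1.2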
new = []
for xx in x:
    new.append(" ".join(xx))

neww = vector.transform(new)

print(neww.toarray())
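
A variation that avoids joining the items into strings is to give CountVectorizer a callable analyzer, so that each inner list is used directly as its token list. This is a sketch of that idea rather than the approach above: the names `docs`, `vectorizer`, `names` and `encoded` are mine, empty lists are filtered out up front, and binary=True is my addition so repeated items would still give 1.

```python
from sklearn.feature_extraction.text import CountVectorizer

x = [['1234', '5678', '910', 'baba'], ['8', '1'],
     [], ['9', '3'], [], ['7', '6'], [], []]

# Drop the empty lists so the output has one row per populated list.
docs = [tokens for tokens in x if tokens]

# A callable analyzer receives each document unchanged and returns its tokens,
# so no token_pattern or string joining is involved.
vectorizer = CountVectorizer(analyzer=lambda tokens: tokens, binary=True)
encoded = vectorizer.fit_transform(docs).toarray()

# vocabulary_ maps each term to its column index; sort terms by that index.
names = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(names)
print(encoded)
```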

3 answers

Answer 1
1 2018-10-01 08:51:39

Answer 2
1 2018-10-01 09:19:39

Answer 3
0 2018-10-01 09:17:34
