
Mapping a list to 1s and 0s

I have two lists, my_genre and list_of_genres. I want a function that checks whether each element of my_genre is in list_of_genres and, if so, puts a 1 at the matching position of list_of_genres (and a 0 everywhere else).

list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy', 'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime', 'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical', 'Western', 'Film-Noir']


my_genre = ['Action', 'Crime', 'Drama', 'Thriller']

Expected result:

[0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0]
data type : np.array

Ultimately, I want to apply the function that does this to a pandas column that contains the genres.

NumPy's isin is what you are looking for.

import numpy as np
results = np.isin(list_of_genres, my_genre).astype(int)

It's the same for pandas.

import pandas as pd

list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy', 'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime', 'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical', 'Western', 'Film-Noir']
my_genre = ['Action', 'Crime', 'Drama', 'Thriller']

df = pd.DataFrame({"genres" : list_of_genres})
df["my_genre"]  = df["genres"].isin(my_genre).astype(int)
print(df)
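
If the pandas column holds the genres of each row as a list (an assumption, since the question does not show the column layout), the same np.isin call could be applied row by row. This is only a minimal sketch: the movies frame, its column names, and the encode helper are hypothetical.

```python
import numpy as np
import pandas as pd

list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy',
                  'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime',
                  'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical',
                  'Western', 'Film-Noir']

# Hypothetical data: each row stores the genres of one movie as a list.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "genres": [["Action", "Crime", "Drama", "Thriller"], ["Comedy", "Romance"]],
})


def encode(genres):
    """Return a 0/1 array aligned with list_of_genres."""
    return np.isin(list_of_genres, genres).astype(int)


# Each cell of the new column holds one NumPy array.
movies["encoded"] = movies["genres"].apply(encode)

print(movies["encoded"].iloc[0])
# [0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0]
```

Storing arrays inside cells keeps the frame in one piece, but a separate 2-D matrix is usually easier to work with afterwards.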

A map()-based solution producing a list:

ll = list(map(int, map(my_genre.__contains__, list_of_genres)))
print(ll)
# [0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

For the result to be a numpy.ndarray, you could use np.fromiter():

import numpy as np

arr = np.fromiter(map(my_genre.__contains__, list_of_genres), dtype=int)
print(arr)
# [0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0]

For larger inputs, np.isin() should be the fastest. For inputs of this size, the map() approach is ~6 times faster than np.isin(), ~65 times faster than the pandas solution, and takes ~40% less time than the equivalent generator expression.

%timeit np.isin(list_of_genres, my_genre).astype(int)                                                                                        
# 15.8 µs ± 385 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit np.fromiter(map(my_genre.__contains__, list_of_genres), dtype=int)                                                                   
# 2.55 µs ± 27.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit np.fromiter((my_genre.__contains__(x) for x in list_of_genres), dtype=int)                                                           
# 4.14 µs ± 19.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit df["genres"].isin(my_genre).astype(int)                                                                                              
# 167 µs ± 2.26 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

This can be sped up further by converting my_genre to a set before applying the in / .__contains__ operator:

%timeit np.fromiter(map(set(my_genre).__contains__, list_of_genres), dtype=int)                                                              
# 1.9 µs ± 7.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
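
The set-based lookup could be wrapped in a small reusable helper, sketched below. The function name encode_genres and its parameter names are made up for illustration; the set is built once, so each membership test no longer scans the whole list.

```python
import numpy as np


def encode_genres(selected, vocabulary):
    """Return a 0/1 array marking which entries of vocabulary are in selected."""
    lookup = set(selected)  # build the set once; membership tests are O(1) on average
    # count lets NumPy preallocate the output array.
    return np.fromiter((g in lookup for g in vocabulary), dtype=int,
                       count=len(vocabulary))


list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy',
                  'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime',
                  'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical',
                  'Western', 'Film-Noir']
my_genre = ['Action', 'Crime', 'Drama', 'Thriller']

print(encode_genres(my_genre, list_of_genres))
# [0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0]
```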

Here it is, although your question is poorly formulated.

list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy', 'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime', 'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical', 'Western', 'Film-Noir']
my_genre = ['Action', 'Crime', 'Drama', 'Thriller']

idx = [1 if g in my_genre else 0 for g in list_of_genres]

Output:

Out[13]: [0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

If you want a NumPy array, simply convert the list with numpy.asarray(). To apply this to a dataframe, change my_genre and list_of_genres accordingly.
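
The conversion to an array could look like the following minimal sketch, which assumes NumPy is installed and reuses the two lists from this answer.

```python
import numpy as np

list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy',
                  'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime',
                  'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical',
                  'Western', 'Film-Noir']
my_genre = ['Action', 'Crime', 'Drama', 'Thriller']

# Build the 0/1 list first, then convert it to an ndarray.
idx = [1 if g in my_genre else 0 for g in list_of_genres]
arr = np.asarray(idx)

print(arr)
# [0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0]
```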

Try this:

>>> list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy', 'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime', 'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical', 'Western', 'Film-Noir']


>>> my_genre = ['Action', 'Crime', 'Drama', 'Thriller']

Output:

>>> [1 if el in my_genre else 0 for el in list_of_genres]

[0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

If you want to use pandas, as your tags suggest, you can do:

import pandas as pd
list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy',
                  'Fantasy', 'Drama', 'Romance', 'Action', 'Thriller',
                  'Sci-Fi', 'Crime', 'Horror', 'Mystery', 'IMAX',
                  'Documentary', 'War', 'Musical', 'Western', 'Film-Noir']

my_genre = ['Action', 'Crime', 'Drama', 'Thriller']

df = pd.DataFrame({"genre": list_of_genres})

df["genre"].apply(lambda x: x in my_genre).astype(int)

# or even faster

df["genre"].isin(my_genre).astype(int)
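
If the column holds several genres per row as a pipe-separated string (an assumption; the question does not show the actual column layout), pandas' Series.str.get_dummies can build the whole 0/1 matrix at once. The movies frame and its column names below are hypothetical.

```python
import pandas as pd

list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy',
                  'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime',
                  'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical',
                  'Western', 'Film-Noir']

# Hypothetical data: one pipe-separated genre string per movie.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "genres": ["Action|Crime|Drama|Thriller", "Comedy|Romance"],
})

# One 0/1 column per genre that occurs in the data.
dummies = movies["genres"].str.get_dummies(sep="|")

# Force the column order of list_of_genres; genres that never occur
# in the data become all-zero columns.
dummies = dummies.reindex(columns=list_of_genres, fill_value=0)

encoded = dummies.to_numpy()  # shape: (number of movies, number of genres)
print(encoded[0])
# [0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0]
```

The reindex step matters because get_dummies only creates columns for genres it actually sees, in alphabetical order.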

This should do it as a nice little one-liner:

import numpy as np

list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy', 'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime', 'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical', 'Western', 'Film-Noir']
my_genre = ['Action', 'Crime', 'Drama', 'Thriller']

result = np.array([int(my_genre.__contains__(n)) for n in list_of_genres])

Output:

array([0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0])

You can use a list comprehension as a one-line solution:

bool_list = [1 if item in my_genre else 0 for item in list_of_genres]

If you are new to this and don't quite understand list comprehensions, you can write it out as a for loop:

bool_list = []
for item in list_of_genres:
    if item in my_genre:
        bool_list.append(1)
    else:
        bool_list.append(0)
