Mapping a list to 1s and 0s
I have two lists, my_genre and list_of_genres. I want a function that checks whether my_genre[index] is in list_of_genres and, if so, converts the matching list_of_genres[index2] into a 1 (and every other entry into a 0).
list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy', 'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime', 'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical', 'Western', 'Film-Noir']
my_genre = ['Action', 'Crime', 'Drama', 'Thriller']
Expected result:
[0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0]
Data type: np.array
Ultimately, I want to apply the function that does this to a pandas column that contains the genres.
NumPy's isin is what you are looking for.
results = np.isin(list_of_genres, my_genre).astype(int)
It's the same for pandas.
import pandas as pd

list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy', 'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime', 'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical', 'Western', 'Film-Noir']
my_genre = ['Action', 'Crime', 'Drama', 'Thriller']
df = pd.DataFrame({"genres" : list_of_genres})
df["my_genre"] = df["genres"].isin(my_genre).astype(int)
print(df)
A map()-based solution producing a list:
ll = list(map(int, map(my_genre.__contains__, list_of_genres)))
print(ll)
# [0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
To get the result as a numpy.ndarray, you could use np.fromiter():
import numpy as np
arr = np.fromiter(map(my_genre.__contains__, list_of_genres), dtype=int)
print(arr)
# [0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0]
For larger inputs, np.isin() should be the fastest. For inputs of this size, the map() approach is ~6 times faster than np.isin(), ~65 times faster than the pandas solution, and ~40% faster than a comprehension.
%timeit np.isin(list_of_genres, my_genre).astype(int)
# 15.8 µs ± 385 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.fromiter(map(my_genre.__contains__, list_of_genres), dtype=int)
# 2.55 µs ± 27.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.fromiter((my_genre.__contains__(x) for x in list_of_genres), dtype=int)
# 4.14 µs ± 19.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit df["genres"].isin(my_genre).astype(int)
# 167 µs ± 2.26 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
This can be sped up further by converting my_genre to a set before applying the in / .__contains__ operator:
%timeit np.fromiter(map(set(my_genre).__contains__, list_of_genres), dtype=int)
# 1.9 µs ± 7.17 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
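To check how these approaches scale on your own data, a rough sketch along the following lines may help. The vocabulary size and the names vocab, wanted, and wanted_set are made up for illustration and are not from the question; timings depend on the machine and library versions, so none are quoted here.

```python
import random
import timeit

import numpy as np

# Hypothetical larger inputs: 10,000 labels, 1,000 of which are "wanted".
vocab = [f"genre_{i}" for i in range(10_000)]
wanted = random.Random(0).sample(vocab, 1_000)
wanted_set = set(wanted)  # build the set once, outside the timed code

def via_set():
    # Set membership test per element, collected into an int array.
    return np.fromiter(map(wanted_set.__contains__, vocab), dtype=int)

def via_isin():
    # Vectorised membership test; includes the list-to-array conversion.
    return np.isin(vocab, wanted).astype(int)

# Both approaches should agree before their speed is compared.
assert (via_set() == via_isin()).all()

print("set + fromiter:", timeit.timeit(via_set, number=20))
print("np.isin:       ", timeit.timeit(via_isin, number=20))
```

Building the set once matters: calling set(my_genre) inside the timed expression, as in the one-liner above, includes the conversion cost in every run.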
Here it is, although your question is poorly formulated.
list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy', 'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime', 'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical', 'Western', 'Film-Noir']
my_genre = ['Action', 'Crime', 'Drama', 'Thriller']
idx = [1 if g in my_genre else 0 for g in list_of_genres]
Output:
Out[13]: [0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
If you want a NumPy array, simply convert the list with numpy.asarray(). To apply this to a DataFrame, change my_genre and list_of_genres accordingly.
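As a minimal sketch of the DataFrame case, assuming each row of the column holds a Python list of genre names (the question does not say how the column is stored), the comprehension can be wrapped in a function and passed to Series.apply. The frame movies and the helper encode_genres are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy',
                  'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime',
                  'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical',
                  'Western', 'Film-Noir']

# Hypothetical data: one list of genres per row (an assumption).
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "genres": [["Action", "Crime", "Drama", "Thriller"], ["Comedy", "Romance"]],
})

def encode_genres(genres):
    """Return a 0/1 array aligned with list_of_genres."""
    wanted = set(genres)  # set lookup instead of scanning a list each time
    return np.asarray([1 if g in wanted else 0 for g in list_of_genres])

# Each cell of the new column holds one NumPy array.
movies["encoded"] = movies["genres"].apply(encode_genres)
print(movies["encoded"].iloc[0])
# [0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0]
```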
Try this:
>>> list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy', 'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime', 'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical', 'Western', 'Film-Noir']
>>> my_genre = ['Action', 'Crime', 'Drama', 'Thriller']
Output:
>>> [1 if el in my_genre else 0 for el in list_of_genres]
[0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
If you want to use pandas, as your tags suggest, you can do:
import pandas as pd
list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy',
                  'Fantasy', 'Drama', 'Romance', 'Action', 'Thriller',
                  'Sci-Fi', 'Crime', 'Horror', 'Mystery', 'IMAX',
                  'Documentary', 'War', 'Musical', 'Western', 'Film-Noir']
my_genre = ['Action', 'Crime', 'Drama', 'Thriller']
df = pd.DataFrame({"genre": list_of_genres})
df["genre"].apply(lambda x: x in my_genre).astype(int)
# or even faster
df["genre"].isin(my_genre).astype(int)
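If the pandas column stores several genres per row as one delimited string, a different route may fit better. The sketch below assumes a pipe-separated format such as "Action|Crime" (an assumption; the question does not show the column) and uses Series.str.get_dummies, then reorders the columns to match list_of_genres. The frame movies and its contents are hypothetical.

```python
import pandas as pd

list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy',
                  'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime',
                  'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical',
                  'Western', 'Film-Noir']

# Hypothetical data: genres stored as pipe-separated strings (an assumption).
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "genres": ["Action|Crime|Drama|Thriller", "Comedy|Romance"],
})

# One 0/1 column per genre found in the data ...
one_hot = movies["genres"].str.get_dummies(sep="|")
# ... then force the column order of list_of_genres, filling absent genres with 0.
one_hot = one_hot.reindex(columns=list_of_genres, fill_value=0)

print(one_hot.to_numpy()[0])
# [0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0]
```

The reindex step matters because get_dummies only creates columns for genres that actually occur and sorts them alphabetically, which would not match the order in list_of_genres.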
This should do it as a nice little one-liner:
import numpy as np

list_of_genres = ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy', 'Drama', 'Romance', 'Action', 'Thriller', 'Sci-Fi', 'Crime', 'Horror', 'Mystery', 'IMAX', 'Documentary', 'War', 'Musical', 'Western', 'Film-Noir']
my_genre = ['Action', 'Crime', 'Drama', 'Thriller']
result = np.array([int(my_genre.__contains__(n)) for n in list_of_genres])
Output of print(result):
[0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0]
You can use a list comprehension as a one-line solution:
bool_list = [1 if item in my_genre else 0 for item in list_of_genres]
If you are new to this and don't quite understand list comprehensions, you can write it out as a for loop:
bool_list = []
for item in list_of_genres:
    if item in my_genre:
        bool_list.append(1)
    else:
        bool_list.append(0)