How to loop through multiple numpy arrays and append the item from one array to another with same id?
Suppose I have test_df['review_id'], which contains the ID of each row in the dataframe. I need to pair each ID with data from other arrays. My code looks like the following.
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

def classify_nb_report(X_train_vectorized, y_train, X_test_vectorized, y_test):
    clf = MultinomialNB()
    # TRAIN THE CLASSIFIER WITH AVAILABLE TRAINING DATA
    clf.fit(X_train_vectorized, y_train)
    y_pred_class = clf.predict(X_test_vectorized)
    return y_pred_class

for i in range(0, n_loop):
    train_df, test_df = train_test_split(df, test_size=0.3)
    ....
    nb_y = classify_nb_report(X_train_vectorized, y_train, X_test_vectorized, y_test)
As you can see above, each iteration gives me a new nb_y, which is a numpy array. Each iteration also gives me a different test_df and train_df (chosen randomly by train_test_split). I want to pair each value of nb_y from each iteration with the matching ID in test_df['review_id'].
With the following code, I can print each ID from test_df side by side with its value from nb_y.
for f, b in zip(test_df['review_id'], nb_y):
    print(f, b)
Result:
17377 5.0
18505 5.0
24825 1.0
16032 5.0
23721 1.0
18008 5.0
Now, starting from the result above, I want to append the nb_y values from the following iterations to their corresponding IDs.
I hope this is not too confusing; I will expand if my question is not clear enough. Thanks in advance.
I am not sure I understand the problem correctly, or how the rest of your code works, but I think the following code might do what you need. Let me know whether it works or if something is wrong with the answer.
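dictionary = {}
for i in range(0, n_loop):
    train_df, test_df = train_test_split(df, test_size=0.3)
    ....
    nb_y = classify_nb_report(X_train_vectorized, y_train, X_test_vectorized, y_test)
    # Pair each ID with its own prediction; the whole review_id column
    # is not hashable, so it cannot be used as a single dictionary key.
    for review_id, pred in zip(test_df['review_id'], nb_y):
        if review_id not in dictionary:
            dictionary[review_id] = [pred]
        else:
            dictionary[review_id].append(pred)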
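The per-ID accumulation can also be written with dict.setdefault, which avoids the explicit membership test. The following is only a minimal sketch: the hand-written ID and prediction lists are made-up stand-ins for test_df['review_id'] and nb_y from two iterations, and the name predictions_by_id is hypothetical.

```python
# Hypothetical stand-ins for two iterations of (test_df['review_id'], nb_y).
iterations = [
    ([17377, 18505, 24825], [5.0, 5.0, 1.0]),
    ([18505, 16032, 17377], [1.0, 5.0, 5.0]),
]

predictions_by_id = {}
for review_ids, nb_y in iterations:
    for review_id, pred in zip(review_ids, nb_y):
        # setdefault creates the list the first time an ID is seen,
        # then the prediction is appended to that list.
        predictions_by_id.setdefault(review_id, []).append(pred)

print(predictions_by_id)
```

IDs that appear in both test sets end up with two predictions, while IDs seen only once keep a single-element list.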
After referring to this and this, I finally came up with my own solution. I turned the code above into something like this.
from collections import defaultdict

def classify_nb_report(X_train_vectorized, y_train, X_test_vectorized, y_test):
    clf = MultinomialNB()
    # TRAIN THE CLASSIFIER WITH AVAILABLE TRAINING DATA
    clf.fit(X_train_vectorized, y_train)
    y_pred_class = clf.predict(X_test_vectorized)
    return y_pred_class

nb_y_list = []
for i in range(0, n_loop):
    train_df, test_df = train_test_split(df, test_size=0.3)
    ....
    nb_y = classify_nb_report(X_train_vectorized, y_train, X_test_vectorized, y_test)
    # Collect [review_id, prediction] pairs from this iteration
    nb_y_list.extend([list(x) for x in zip(test_df['review_id'], nb_y)])

# After all iterations, group the predictions by review ID
dd = defaultdict(list)
for key, val in nb_y_list:
    dd[key].append(val)
print(dd)
Basically, I first made an empty list called nb_y_list. Then, in each iteration, I zip the IDs from test_df['review_id'] with the values from nb_y and extend nb_y_list with the resulting pairs. After all the loops have finished, I have the complete list, which I then convert to a dictionary using defaultdict().
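The intermediate list is not strictly required: the defaultdict can be filled directly inside the loop. Here is a rough, self-contained sketch of that variant. Everything in it is an assumption made for illustration: random.sample stands in for train_test_split, the hypothetical fake_predict stands in for classify_nb_report, and the IDs are just sample values.

```python
import random
from collections import defaultdict

# Hypothetical data: a handful of review IDs and a fixed seed for repeatability.
all_ids = [17377, 18505, 24825, 16032, 23721, 18008]
n_loop = 4
test_size = 2
rng = random.Random(0)

def fake_predict(review_ids):
    """Placeholder for classify_nb_report: returns one label per ID."""
    return [rng.choice([1.0, 5.0]) for _ in review_ids]

dd = defaultdict(list)
for i in range(n_loop):
    # Stand-in for train_test_split: pick a random subset as the test set.
    test_ids = rng.sample(all_ids, test_size)
    nb_y = fake_predict(test_ids)
    # Append each prediction straight to its ID, with no intermediate list.
    for review_id, pred in zip(test_ids, nb_y):
        dd[review_id].append(pred)

total = sum(len(preds) for preds in dd.values())
print(dict(dd))
```

Because each split is random, an ID only receives a prediction in the iterations where it lands in the test set, so the lists in dd generally have different lengths.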