How to iterate over multiple numpy arrays and append items from one array to another with the same ID?

Question

Suppose I have the following data: test_df['review_id'], which contains the ids of the rows in the dataframe. I need to pair each id with data from other arrays. My code looks like the following.

def classify_nb_report(X_train_vectorized, y_train, X_test_vectorized, y_test):
    clf = MultinomialNB()

    # TRAIN THE CLASSIFIER WITH AVAILABLE TRAINING DATA
    clf.fit(X_train_vectorized, y_train)

    y_pred_class = clf.predict(X_test_vectorized)

    return y_pred_class

for i in range(0, n_loop):
    train_df, test_df = train_test_split(df, test_size=0.3)
    ....
    nb_y = classify_nb_report(X_train_vectorized, y_train, X_test_vectorized, y_test)

As you can see above, each iteration produces a new nb_y, which is a numpy array. Each iteration also produces different test_df and train_df sets (chosen at random by train_test_split). I want to pair each value of nb_y from each iteration with the matching id in test_df['review_id'].

With the following code, I can print each id from test_df side by side with the corresponding value from nb_y.

for f, b in zip(test_df['review_id'], nb_y):
    print(f, b)

Result:

Now, starting from the result above, I want to append the values of nb_y from subsequent iterations to their corresponding ids.

I hope this is not too confusing; I will expand on it if my question is not clear enough. Thanks in advance.

Answer 1

I am not sure I understand the problem correctly, or how the rest of your code works, but I think the following code might do what you need. Let me know whether it works or if something is wrong with the answer.

dictionary = {}
for i in range(0, n_loop):
    train_df, test_df = train_test_split(df, test_size=0.3)
    ....
    nb_y = classify_nb_report(X_train_vectorized, y_train, X_test_vectorized, y_test)
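    # A whole Series cannot be used as a dict key, so pair each id with its prediction
    for review_id, pred in zip(test_df['review_id'], nb_y):
        if review_id not in dictionary:
            dictionary[review_id] = [pred]
        else:
            dictionary[review_id].append(pred)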
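
To make the per-id grouping concrete, here is a minimal self-contained sketch. The ids and predictions are made-up toy values standing in for test_df['review_id'] and nb_y, and collect_predictions is a hypothetical helper name that does not appear in the original code.

```python
def collect_predictions(iterations):
    """Group predictions by id across iterations.

    `iterations` is an iterable of (ids, predictions) pairs, one pair per
    loop iteration; both sequences must have the same length and order.
    """
    results = {}
    for ids, preds in iterations:
        for review_id, pred in zip(ids, preds):
            # setdefault creates the list the first time an id is seen
            results.setdefault(review_id, []).append(pred)
    return results


# Toy data (assumed): two iterations with partly overlapping test ids
toy_iterations = [
    (["r1", "r2", "r3"], [1, 0, 1]),
    (["r2", "r3", "r4"], [0, 0, 1]),
]
collected = collect_predictions(toy_iterations)
print(collected)
```

Ids that show up in only one iteration end up with a single-element list, so the lists are not guaranteed to have equal length.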

Answer 2

After referring to this and this, I finally came up with my own solution. I turned the code above into something like this.

def classify_nb_report(X_train_vectorized, y_train, X_test_vectorized, y_test):
    clf = MultinomialNB()

    # TRAIN THE CLASSIFIER WITH AVAILABLE TRAINING DATA
    clf.fit(X_train_vectorized, y_train)

    y_pred_class = clf.predict(X_test_vectorized)

    return y_pred_class


nb_y_list = []

for i in range(0, n_loop):
    train_df, test_df = train_test_split(df, test_size=0.3)
    ....
    nb_y = classify_nb_report(X_train_vectorized, y_train, X_test_vectorized, y_test)

    nb_y_list.extend([list(x) for x in zip(test_df['review_id'],nb_y)])

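from collections import defaultdict

# Group the [id, prediction] pairs by id
dd = defaultdict(list)
for key, val in nb_y_list:
    dd[key].append(val)

# Print once, after all pairs have been grouped
print(dd)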

Basically, I first create an empty list called nb_y_list. Then, in each iteration, I zip the ids from test_df['review_id'] with the values from nb_y so that they line up, and extend nb_y_list with the resulting pairs. After all the loops have finished, I have the complete list, which I then convert to a dictionary using defaultdict().
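
The same flow can be tried end to end without a classifier. In the sketch below, random.sample stands in for train_test_split and a fixed lookup table stands in for clf.predict; both are assumptions for illustration only, as are the toy ids and the fake_prediction name.

```python
import random
from collections import defaultdict

random.seed(0)  # fixed seed so the toy run is repeatable

# Toy stand-ins (assumed): ten review ids and a fake "prediction" per id
review_ids = [f"r{i}" for i in range(10)]
fake_prediction = {rid: i % 2 for i, rid in enumerate(review_ids)}

n_loop = 5
test_size = 3  # number of ids drawn into the test set each iteration

nb_y_list = []
for _ in range(n_loop):
    # random.sample plays the role of train_test_split here
    test_ids = random.sample(review_ids, test_size)
    nb_y = [fake_prediction[rid] for rid in test_ids]
    # zip keeps each id next to the prediction made for it
    nb_y_list.extend(zip(test_ids, nb_y))

# Group the (id, prediction) pairs by id
dd = defaultdict(list)
for key, val in nb_y_list:
    dd[key].append(val)

for key in sorted(dd):
    print(key, dd[key])
```

An id only appears in dd if it was drawn into a test set at least once, so some ids may be missing and the lists can have different lengths.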

2 answers

Answer 1: 0 2018-05-10 15:10:30

Answer 2: 0 2018-05-10 17:03:08
