How to loop through multiple numpy arrays and append the item from one array to another with same id?

Suppose I have test_df['review_id'], which contains the ids of the rows in the test dataframe. I need to pair each id with data from other arrays. My code looks like the following.

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

def classify_nb_report(X_train_vectorized, y_train, X_test_vectorized, y_test):
    clf = MultinomialNB()

    # TRAIN THE CLASSIFIER WITH AVAILABLE TRAINING DATA
    clf.fit(X_train_vectorized, y_train)

    y_pred_class = clf.predict(X_test_vectorized)

    return y_pred_class

for i in range(0, n_loop):
    train_df, test_df = train_test_split(df, test_size=0.3)
    ....
    nb_y = classify_nb_report(X_train_vectorized, y_train, X_test_vectorized, y_test)

As you can see above, each iteration produces a new nb_y, which is a numpy array. Each iteration also produces a different test_df and train_df, because train_test_split picks the rows at random. I want to pair each value of nb_y from each iteration with the matching id in test_df['review_id'].

With the following code, I can print the ids of test_df side by side with the values from nb_y.

for f, b in zip(test_df['review_id'], nb_y):
    print(f, b)

Result:

17377 5.0
18505 5.0
24825 1.0
16032 5.0
23721 1.0
18008 5.0

Now, starting from the result above, I want to append the values of nb_y from the following iterations to their corresponding ids.
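One way to picture the desired result is a dict keyed by review id, where each value is the list of predictions collected so far. The sketch below is only an illustration: the first run reuses a few pairs from the printed result, the second run's pairs are invented, and the name predictions_by_id is hypothetical.

```python
# First run: (review_id, prediction) pairs taken from the printed result above
run_1 = [(17377, 5.0), (18505, 5.0), (24825, 1.0)]
# Second run: hypothetical pairs, invented for illustration only
run_2 = [(18505, 4.0), (24825, 1.0), (16032, 5.0)]

predictions_by_id = {}
for run in (run_1, run_2):
    for review_id, pred in run:
        # setdefault creates the list the first time an id is seen
        predictions_by_id.setdefault(review_id, []).append(pred)

print(predictions_by_id)
# {17377: [5.0], 18505: [5.0, 4.0], 24825: [1.0, 1.0], 16032: [5.0]}
```

An id that appears in only one test split simply ends up with a one-element list.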

I hope this is not too confusing. I will expand on it if my question is not clear enough. Thanks in advance.

I am not sure I understand the problem correctly, or how the rest of your code works, but I think the following code might do what you need. Let me know whether it works or whether something is wrong with the answer.

dictionary = {}
for i in range(0, n_loop):
    train_df, test_df = train_test_split(df, test_size=0.3)
    ....
    nb_y = classify_nb_report(X_train_vectorized, y_train, X_test_vectorized, y_test)
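    # A whole Series cannot be used as a dict key, so pair each
    # review id with its own prediction and store them one by one.
    for review_id, pred in zip(test_df['review_id'], nb_y):
        if review_id not in dictionary:
            dictionary[review_id] = [pred]
        else:
            dictionary[review_id].append(pred)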

After referring to this and this, I finally came up with my own solution. I turned the code above into something like this.

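from collections import defaultdict

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

def classify_nb_report(X_train_vectorized, y_train, X_test_vectorized, y_test):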
    clf = MultinomialNB()

    # TRAIN THE CLASSIFIER WITH AVAILABLE TRAINING DATA
    clf.fit(X_train_vectorized, y_train)

    y_pred_class = clf.predict(X_test_vectorized)

    return y_pred_class


nb_y_list = []

for i in range(0, n_loop):
    train_df, test_df = train_test_split(df, test_size=0.3)
    ....
    nb_y = classify_nb_report(X_train_vectorized, y_train, X_test_vectorized, y_test)

    nb_y_list.extend([list(x) for x in zip(test_df['review_id'],nb_y)])

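# Group the collected (review_id, prediction) pairs by review id
dd = defaultdict(list)
for key, val in nb_y_list:
    dd[key].append(val)

# Print once, after every pair has been grouped
print(dd)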

Basically, I first create an empty list called nb_y_list. Then, in each iteration, I zip the ids from test_df['review_id'] with the values from nb_y and extend nb_y_list with the resulting pairs. After all the loops have finished, I have the complete list, which I then convert to a dictionary using defaultdict().
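The intermediate list is probably not strictly required: the pairs could also be appended to the defaultdict directly inside the loop. Here is a minimal sketch of that variant. Note that fake_run and collect_predictions are hypothetical names, and fake_run is only a stand-in for the split, the elided vectorization and the classify_nb_report call: it returns randomly sampled ids and made-up labels so that the sketch runs without scikit-learn.

```python
import random
from collections import defaultdict


def fake_run(all_ids, test_size, rng):
    """Hypothetical stand-in for train_test_split + classify_nb_report.

    Returns the sampled test ids and one made-up prediction per id.
    """
    test_ids = rng.sample(all_ids, test_size)
    preds = [rng.choice([1.0, 5.0]) for _ in test_ids]
    return test_ids, preds


def collect_predictions(all_ids, n_loop, test_size, seed=0):
    """Run n_loop fake iterations and group the predictions by review id."""
    rng = random.Random(seed)
    by_id = defaultdict(list)
    for _ in range(n_loop):
        test_ids, preds = fake_run(all_ids, test_size, rng)
        # Append each prediction to the list of its review id
        for review_id, pred in zip(test_ids, preds):
            by_id[review_id].append(pred)
    return dict(by_id)


all_ids = [17377, 18505, 24825, 16032, 23721, 18008]
result = collect_predictions(all_ids, n_loop=4, test_size=3)
for review_id, preds in sorted(result.items()):
    print(review_id, preds)
```

Since each id can land in a test split at most once per iteration, no list grows longer than n_loop, and ids that were never sampled do not appear in the result at all.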
