
How to store a calculated value in new column by iterating through each row in a dataframe in Python?

The dataframe I am working with looks like this:

  vid2               FStart FEnd cap2                                               VDuration  COS  cap1
0 -_aaMGK6GGw_57_61  0      3    A man grabbed a boy from his collar and threw ...  4          2    A man and woman are yelling at a young boy and...
1 -_aaMGK6GGw_57_61  3      4    A lady is waking up a man lying on a chair and...  4          2    A man and woman are yelling at a young boy and...
2 -_hbPLsZvvo_5_8    0      1    A white dog is barking and a caption is writte...  3          2    a dog barking and cooking with her master in t...
  ...                ...    ...  ...                                                ...        ...  ...

I am trying to calculate a similarity score between the two columns cap1 and cap2. I want to create a new column, fsim, that stores this similarity score for each row.

The code I have implemented so far is:

from sklearn.metrics.pairwise import cosine_similarity

# The function that calculates the similarity score
def get_cosine_similarity(feature_vec_1, feature_vec_2):
    return cosine_similarity(feature_vec_1.reshape(1, -1), feature_vec_2.reshape(1, -1))[0][0]


for i, row in merged.iterrows():
    captions = []
    captions.append(row['cap1'])
    captions.append(row['cap2'])

    for c in range(len(captions)):
        captions[c] = pre_process(captions[c])
        captions[c] = lemmatize_sentence(captions[c])

    feature_vectors = tfidf_vectorizer.transform(captions)

    fsim = get_cosine_similarity(feature_vectors[0], feature_vectors[1])
    merged['fsim'] = fsim

But I am getting the same similarity score stored for every row, like this:

       fsim  
0  0.054464  
1  0.054464  
2  0.054464  
3  0.054464  
4  0.054464

The value is the same for all the rows.

How can I store the correct score for each row?

How about this? (I'm assuming the DataFrame you showed first is named merged.)

def preproc_and_lemmatize(x):
  v1 = pre_process(x)
  return lemmatize_sentence(v1)

def calc_sim(x, y):
  x2 = preproc_and_lemmatize(x)
  y2 = preproc_and_lemmatize(y)
  feature_vectors = tfidf_vectorizer.transform([x2, y2])
  return get_cosine_similarity(feature_vectors[0], feature_vectors[1])

merged['fsim'] = [
  calc_sim(x, y) for x, y in zip(merged['cap1'], merged['cap2'])
]
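
To try the list-comprehension pattern above without the full TF-IDF pipeline, here is a minimal sketch. It assumes pandas is installed. The function toy_similarity and the demo captions are hypothetical stand-ins for calc_sim and merged; they are not part of the original code, and the word-count cosine used here is only a simplification of the TF-IDF similarity.

```python
import math
from collections import Counter

import pandas as pd


def toy_similarity(text_a, text_b):
    """Hypothetical stand-in for calc_sim: cosine similarity of raw word counts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Dot product over the words of the first text (missing words count as 0).
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up captions, only for illustration.
demo = pd.DataFrame({
    "cap1": ["a dog is barking", "a man is yelling", "a cat sleeps"],
    "cap2": ["a dog is barking", "a woman is singing", "the bird flies"],
})

# One score per row: the list has the same length and order as the frame.
demo["fsim"] = [toy_similarity(x, y) for x, y in zip(demo["cap1"], demo["cap2"])]
print(demo)
```

Because the list is built in row order and assigned in one step, each row gets its own score and no per-row indexing is needed.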

If you prefer to change less of your code, this will also work.

merged["fsim"] = 0.0  # create a float column first, then fill it row by row
for i, row in merged.iterrows():
    captions = []
    captions.append(row['cap1'])
    captions.append(row['cap2'])

    for c in range(len(captions)):
        captions[c] = pre_process(captions[c])
        captions[c] = lemmatize_sentence(captions[c])

    feature_vectors = tfidf_vectorizer.transform(captions)

    fsims = get_cosine_similarity(feature_vectors[0], feature_vectors[1])
    # iterrows() yields index labels, so write the single cell by label
    merged.at[i, 'fsim'] = fsims
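
Every row ended up with the same score in the original loop because assigning a scalar to a column (merged['fsim'] = value) broadcasts that scalar to all rows, so each iteration overwrote the whole column and only the last value survived. The sketch below illustrates the difference on a made-up frame, assuming pandas is installed. The names demo, broadcast, per_row, and score_in are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Made-up frame with a non-default index, only for illustration.
demo = pd.DataFrame({"score_in": [0.1, 0.2, 0.3]}, index=[10, 20, 30])

# Scalar assignment to a column sets EVERY row, so the loop keeps
# overwriting the whole column and the last value wins.
broadcast = demo.copy()
for label, value in demo["score_in"].items():
    broadcast["fsim"] = value

# Writing one cell per iteration, addressed by index label, keeps
# each row's own value.
per_row = demo.copy()
per_row["fsim"] = 0.0
for label, value in demo["score_in"].items():
    per_row.at[label, "fsim"] = value

print(broadcast)
print(per_row)
```

Addressing the cell by label with .at also works when the index is not 0, 1, 2, ..., which is a case where a positional lookup with the label from iterrows() would point at the wrong row or fail.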


