
Nested for loops with pandas dataframe

I am looping through a dataframe column of headlines (sp500news) and comparing it against a dataframe of company names (co_names_df). I am trying to update the frequency each time a company name appears in a headline.

My current code is below and is not updating the frequency columns. Is there a cleaner, faster implementation, maybe without the for loops?

for title in sp500news['title']:
    for string in title:
        for co_name in co_names_df['Name']:
            if string == co_name:
                co_names_index = co_names_df.loc[co_names_df['Name']=='string'].index
                co_names_df['Frequency'][co_names_index] += 1

co_names_df sample

    Name           Frequency
0   3M             0
1   A.O. Smith     0
2   Abbott         0
3   AbbVie         0
4   Accenture      0
5   Activision     0
6   Acuity Brands  0
7   Adobe Systems  0
    ...

sp500news['title'] sample

title  
0       Italy will not dismantle Montis labour reform  minister                            
1       Exclusive US agency FinCEN rejected veterans in bid to hire lawyers                
4       Xis campaign to draw people back to graying rural China faces uphill battle        
6       Romney begins to win over conservatives                                            
8       Oregon mall shooting survivor in serious condition                                 
9       Polands PGNiG to sign another deal for LNG supplies from US CEO              

You can probably speed this up; you're using dataframes where other structures would work better. Here's what I would try.

from collections import Counter

counts = Counter()

# checking membership in a set is very fast (O(1))
company_names = set(co_names_df["Name"])

for title in sp500news['title']:
    # split each headline into words (assumes title is a string, not a list of strings)
    for word in title.split():
        if word in company_names:
            counts[word] += 1

counts is then a dictionary {company_name: count}. You can then do a quick loop over its items to update the counts in your dataframe.
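For illustration, here is a minimal end-to-end sketch of that last step. The two small dataframes are made-up stand-ins for the real co_names_df and sp500news, and it assumes each title is a plain string and that Frequency starts at 0, as in the samples above. It is one possible way to write the counts back, not the only one.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data standing in for the real frames from the question.
co_names_df = pd.DataFrame(
    {"Name": ["3M", "Abbott", "AbbVie"], "Frequency": [0, 0, 0]}
)
sp500news = pd.DataFrame(
    {
        "title": [
            "3M raises forecast as Abbott shares slip",
            "Abbott wins approval for new device",
            "Romney begins to win over conservatives",
        ]
    }
)

# Set membership checks are O(1) on average.
company_names = set(co_names_df["Name"])

counts = Counter()
for title in sp500news["title"]:
    # Count every whitespace-separated word that is exactly a company name.
    counts.update(word for word in title.split() if word in company_names)

# A Counter returns 0 for keys it has never seen, so every row gets a value.
co_names_df["Frequency"] = [counts[name] for name in co_names_df["Name"]]

print(co_names_df)
```

Note that matching word by word only finds single-word names. A name such as "A.O. Smith" or "Acuity Brands" spans two words and would need a different approach, for example a substring or regular-expression search over the whole title.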
