
Pandas, most efficient way to apply two functions on an entire row

I have the following DataFrame:

       Date  Label                                               Top1  \
0  2008-08-08      0  b"Georgia 'downs two Russian warplanes' as cou...   
1  2008-08-11      1  b'Why wont America and Nato help us? If they w...   
2  2008-08-12      0  b'Remember that adorable 9-year-old who sang a...   
3  2008-08-13      0  b' U.S. refuses Israel weapons to attack Iran:...   
4  2008-08-14      1  b'All the experts admit that we should legalis...   
                                                Top2  \
0            b'BREAKING: Musharraf to be impeached.'   
1        b'Bush puts foot down on Georgian conflict'   
2                 b"Russia 'ends Georgia operation'"   
3  b"When the president ordered to attack Tskhinv...   
4  b'War in South Osetia - 89 pictures made by a ...   
                                                Top3  \
0  b'Russia Today: Columns of troops roll into So...   
1  b"Jewish Georgian minister: Thanks to Israeli ...   
2  b'"If we had no sexual harassment we would hav...   
3  b' Israel clears troops who killed Reuters cam...   
4  b'Swedish wrestler Ara Abrahamian throws away ...   
                                                Top4  \
0  b'Russian tanks are moving towards the capital...   
1  b'Georgian army flees in disarray as Russians ...   
2  b"Al-Qa'eda is losing support in Iraq because ...   
3  b'Britain\'s policy of being tough on drugs is...   
4  b'Russia exaggerated the death toll in South O...   
                                                Top5  \
0  b"Afghan children raped with 'impunity,' U.N. ...   
1      b"Olympic opening ceremony fireworks 'faked'"   
2  b'Ceasefire in Georgia: Putin Outmaneuvers the...   
3  b'Body of 14 year old found in trunk; Latest (...   
4  b'Missile That Killed 9 Inside Pakistan May Ha...  
                                               Top25  VIX Open  VIX High  \
0           b"No Help for Mexico's Kidnapping Surge"     21.15     21.69   
1  b"So this is what it's come to: trading sex fo...     20.66     20.96   
2  b"BBC NEWS | Asia-Pacific | Extinction 'by man...     20.64     21.51   
3  b'2006: Nobel laureate Aleksander Solzhenitsyn...     21.57     22.11   
4  b'Philippines : Peace Advocate say Muslims nee...     22.30     22.30  

Top1 through Top25 are news articles. I want to perform sentiment analysis on every date's articles and take the mean of those scores. Is there an efficient way to check whether a column name contains the word Top, calculate the score, and create a column holding the mean for every date?

What I tried so far:


def scorer(row, col):
    date_scores = []
    if col.contains('Top'):
        date_scores.append(get_sentiment_score(row[col]))
    else:
        pass
    sentiment_daily_mean = np.mean()
    return sentiment_daily_mean
df['date_score'] = df.apply(lambda x: scorer(x), args=list(df.columns))

But this won't work, since I'm passing all the columns to the function at once.

You need to pass in the rows to the apply-function. Try this:

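import numpy as np

def scorer(row):
  # Collect a sentiment score for every column whose label contains 'Top'
  date_scores = []
  for col in row.index:  # iterate over column labels, not cell values
    if 'Top' in col:
      date_scores.append(get_sentiment_score(row[col]))
  # A plain list has no .mean() method, so average with NumPy
  return np.mean(date_scores)

# axis=1 passes each row to scorer as a Series
df['date_score'] = df.apply(scorer, axis=1)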
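
As an alternative, the column check can be done once up front rather than inside every row. The sketch below is only an illustration: the small DataFrame is made-up sample data, and get_sentiment_score is a hypothetical word-counting stand-in, since the real scorer is not shown in the question. It selects the Top columns with filter, scores each cell, and averages across each row.

```python
import pandas as pd

def get_sentiment_score(text):
    # Hypothetical stand-in for the question's scorer (not shown in the post):
    # number of "positive" words minus number of "negative" words.
    positive = {"good", "peace"}
    negative = {"war", "bad"}
    words = str(text).lower().split()
    return float(sum(w in positive for w in words) - sum(w in negative for w in words))

# Made-up sample data with the same column layout as the question
df = pd.DataFrame({
    "Date": ["2008-08-08", "2008-08-11"],
    "Label": [0, 1],
    "Top1": ["war is bad", "peace talks"],
    "Top2": ["good news", "good peace"],
    "VIX Open": [21.15, 20.66],
})

# Keep only the columns whose label contains "Top"
top_cols = df.filter(like="Top")

# Score every cell column by column, then take the mean across each row
scores = top_cols.apply(lambda col: col.map(get_sentiment_score))
df["date_score"] = scores.mean(axis=1)
```

The scorer is still called once per cell, so if it is slow, it will dominate the runtime either way. What this version avoids is repeating the column-name check for every row.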


 