
Removing a lot of rows from a dataframe in Python

I am trying to remove non-English tweets from a large dataset as efficiently as possible. I tried building a list of the rows that are not English and then removing them, but removing the tweets one at a time takes a long time (the langid.classify() function is not the bottleneck).

import langid

def removeLanguage(df):
  rowsToDelete = []
  # Collect the index labels of rows that are not classified as English
  for i in df.index:
    text = df['tweet'][i]
    try:
      if langid.classify(text)[0] != 'en':
        rowsToDelete.append(i)
    except ValueError:
      rowsToDelete.append(i)

  # Slow part: one df.drop() call per row to delete
  for i in rowsToDelete:
    df.drop(i, inplace=True)
  return df

newDf = removeLanguage(inputDf).reset_index(drop=True)

Is there a more efficient way to remove a set of rows from a DataFrame than calling df.drop() once per row?

df.drop is very efficient, but I would also use something like this:

df = df[df['tweet'].apply(lambda t: langid.classify(t)[0] == 'en')]
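
To spell out the idea a bit more, here is a minimal sketch of two ways to avoid dropping rows one at a time: build a boolean mask and filter once, or pass the whole list of labels to a single df.drop() call. The helper names keep_english and drop_rows_at_once, the classify parameter, and toy_classify are hypothetical and only here for illustration. In the question's setting you would pass langid.classify as the classifier; the stand-in below just lets the sketch run without langid installed.

```python
import pandas as pd

def keep_english(df, classify, column="tweet"):
    """Return only the rows whose text `classify` labels as 'en'.

    `classify` is assumed to behave like langid.classify: it takes a
    string and returns a (language, score) tuple.
    """
    def is_english(text):
        try:
            return classify(text)[0] == "en"
        except ValueError:
            # Same policy as the question's code: unclassifiable rows are removed
            return False

    mask = df[column].map(is_english)        # one boolean per row
    return df[mask].reset_index(drop=True)   # filter once instead of dropping row by row

def drop_rows_at_once(df, rows_to_delete):
    """Alternative: keep the list of labels, but drop them in a single call."""
    return df.drop(index=rows_to_delete).reset_index(drop=True)

# Hypothetical stand-in classifier, used only so this sketch is self-contained
def toy_classify(text):
    return ("fr", 1.0) if "bonjour" in text else ("en", 1.0)

demo = pd.DataFrame({"tweet": ["hello world", "bonjour le monde", "good morning"]})
english_only = keep_english(demo, toy_classify)
same_result = drop_rows_at_once(demo, [1])
```

With the real library this would be called as keep_english(inputDf, langid.classify). Either way the DataFrame is rebuilt once rather than once per removed row, which is where the time in the original loop most likely goes.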
