Drop/edit rows in dataframe where entry doesn't meet condition

I know this has been asked before, but I cannot find an answer that works for me. I have a dataframe df that contains a column age, but the values are not all integers; some are strings like 35-59. I want to drop those entries. I have tried these two solutions as suggested by kite, but they both give me AttributeError: 'Series' object has no attribute 'isnumeric'

df.drop(df[df.age.isnumeric()].index, inplace=True)
df = df.query("age.isnumeric()")
df = df.reset_index(drop=True)

Additionally is there a simple way to edit the value of an entry if it matches a certain condition? For example instead of deleting rows that have age as a range of values, I could replace it with a random value within that range.

Try with:

df.drop(df[df.age.str.isnumeric() == False].index, inplace=True)

If you check the documentation, isnumeric is a method of Series.str, not of Series. That's why you get that error.

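You also need the == False comparison: because the column has mixed types, str.isnumeric() returns NaN for the non-string entries, and comparing against False turns the result into a series with only booleans.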
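To see why the comparison matters, here is a minimal sketch on a made-up frame (the sample values are my own assumption, not taken from the question). Integer entries come back as NaN from str.isnumeric(), so they are kept, and only the non-numeric strings are dropped:

```python
import pandas as pd

# Hypothetical sample data: a mix of ints, a range string and a numeric string
df = pd.DataFrame({"age": [25, "35-59", "40", 61]})

# Non-string entries (25, 61) yield NaN here rather than True/False
numeric_mask = df.age.str.isnumeric()

# Comparing with False gives a clean boolean mask; NaN == False is False
bad_rows = numeric_mask == False

df = df.drop(df[bad_rows].index).reset_index(drop=True)
print(df)
```

Note that "40" survives as a string, so a later conversion such as df.age.astype(int) may still be needed if you want a true integer column.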

In case it helps with your last question: you can use pandas.DataFrame.at together with pandas.DataFrame.itertuples to iterate over the rows of the dataframe and replace values:

for row in df.itertuples():
  # iterate over every row and change the value of that column
  if row.age == 'non_desirable_value':
    df.at[row.Index, "age"] = 'desirable_value'

Hence, it could be:

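for row in df.itertuples():
  # row.age is a plain Python value here, so use str.isnumeric() directly (there is no .str accessor)
  is_bad_string = isinstance(row.age, str) and not row.age.isnumeric()
  if is_bad_string or row.age == 'non_desirable_value':
    df.at[row.Index, "age"] = 'desirable_value'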
