I'm trying to filter a pandas DataFrame based on a substring in one of its columns.
If the two-digit number at positions 13 and 14 (zero-based) of the ID field is <= 9, I want to keep the row; if it is > 9, I want to drop the row.
Example:
ABCD-3Z-A93Z-01A-11R-A37O-07 -> keep
ABCD-3Z-A93Z-11A-11R-A37O-07 -> drop
I've managed to get to the below solution, but I think there must be a quicker, more efficient way.
import pandas as pd
# Enter some data. We want to filter out all rows where the number at pos 13,14 > 9
df = {'ID': ['ABCD-3Z-A93Z-01A-11R-A37O-07', 'ABCD-6D-AA2E-11A-11R-A37O-07', 'ABCD-6D-AA2E-01A-11R-A37O-07',
             'ABCD-A3-3307-01A-01R-0864-07', 'ABCD-6D-AA2E-01A-11R-A37O-07', 'ABCD-6D-AA2E-10A-11R-A37O-07',
             'ABCD-6D-AA2E-09A-11R-A37O-07'],
      'year': [2012, 2012, 2013, 2014, 2014, 2017, 2015]
      }
# convert to df
df = pd.DataFrame(df)
# define a function that checks whether the number at positions 13 and 14 is <= 9
def filter(x):
    # only string values can be checked; anything else is dropped
    if type(x) is str:
        if int(float(x[13:15])) <= 9:
            return True
        else:
            return False
    else:
        return False
# apply function
df['KeepRow'] = df['ID'].apply(filter)
print(df)
# Now filter out rows where "KeepRow" = False
df = df.loc[df['KeepRow'] == True]
print(df)
# drop the column "KeepRow" as we don't need it anymore
df = df.drop('KeepRow', axis=1)
print(df)
I think you can just filter on the character at index 13 of your string: a two-digit number is <= 9 exactly when its first digit is '0'.
import pandas as pd
# Enter some data. We want to filter out all rows where the number at pos 13,14 > 9
df = pd.DataFrame({
    'ID': ['ABCD-3Z-A93Z-01A-11R-A37O-07',
           'ABCD-6D-AA2E-11A-11R-A37O-07',
           'ABCD-6D-AA2E-01A-11R-A37O-07',
           'ABCD-A3-3307-01A-01R-0864-07',
           'ABCD-6D-AA2E-01A-11R-A37O-07',
           'ABCD-6D-AA2E-10A-11R-A37O-07',
           'ABCD-6D-AA2E-09A-11R-A37O-07'],
    'year': [2012, 2012, 2013, 2014, 2014, 2017, 2015]
})
# flag the rows whose character at index 13 is '0'
df['KeepRow'] = df['ID'].apply(lambda x: x[13] == '0')
or simply:
df[df['ID'].apply(lambda x: x[13] == '0')]
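The same check can also be written without `apply`, using the `.str` accessor. The following is a minimal sketch, assuming every ID is a string of the fixed layout shown in the question and that the code is always two digits; the shortened sample frame is for illustration only.

```python
import pandas as pd

# Shortened sample data (illustration only).
df = pd.DataFrame({
    'ID': ['ABCD-3Z-A93Z-01A-11R-A37O-07',
           'ABCD-6D-AA2E-11A-11R-A37O-07',
           'ABCD-6D-AA2E-10A-11R-A37O-07',
           'ABCD-6D-AA2E-09A-11R-A37O-07'],
    'year': [2012, 2012, 2017, 2015],
})

# Vectorized check: the character at index 13 is '0',
# which for a two-digit code means the code is 00-09.
mask = df['ID'].str[13] == '0'
kept = df[mask]
print(kept)
```

This avoids the temporary KeepRow column, since the boolean mask is used directly for indexing.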
Use indexing with str to select the values by position, then convert them to float and filter by boolean indexing:
df = df[df['ID'].str[13:15].astype(float) <=9]
print(df)
                             ID  year
0  ABCD-3Z-A93Z-01A-11R-A37O-07  2012
2  ABCD-6D-AA2E-01A-11R-A37O-07  2013
3  ABCD-A3-3307-01A-01R-0864-07  2014
4  ABCD-6D-AA2E-01A-11R-A37O-07  2014
6  ABCD-6D-AA2E-09A-11R-A37O-07  2015
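The function in the question returns False for non-string values, whereas `astype(float)` would raise on a slice that is not numeric. One possible way to keep the vectorized approach while dropping such rows is `pd.to_numeric` with `errors='coerce'`. This is only a sketch: the missing ID and the `XX` code below are hypothetical inputs added to show the behavior.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['ABCD-3Z-A93Z-01A-11R-A37O-07',
           'ABCD-6D-AA2E-11A-11R-A37O-07',
           None,                              # hypothetical missing ID
           'ABCD-6D-AA2E-XXA-11R-A37O-07',    # hypothetical non-numeric code
           'ABCD-6D-AA2E-09A-11R-A37O-07'],
    'year': [2012, 2012, 2013, 2014, 2015],
})

# Slice positions 13-14; values that cannot be parsed become NaN.
codes = pd.to_numeric(df['ID'].str[13:15], errors='coerce')

# Comparisons with NaN are False, so unparseable rows are dropped.
clean = df[codes <= 9]
print(clean)
```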
Detail:
print(df['ID'].str[13:15])
0    01
1    11
2    01
3    01
4    01
5    10
6    09
Name: ID, dtype: object
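If the prefix before the code could vary in length, fixed positions 13:15 would no longer line up. A sketch of an alternative, assuming (this is not stated in the question) that the code is always the first two characters of the fourth hyphen-separated field; the `XY-` ID is a hypothetical example:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['ABCD-3Z-A93Z-01A-11R-A37O-07',
           'ABCD-6D-AA2E-11A-11R-A37O-07',
           'XY-6D-AA2E-09A-11R-A37O-07'],   # hypothetical shorter prefix
    'year': [2012, 2012, 2015],
})

# Take the fourth hyphen-separated field, e.g. '01A'.
field = df['ID'].str.split('-').str[3]

# Its first two characters are the numeric code.
codes = field.str[:2].astype(int)

result = df[codes <= 9]
print(result)
```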