I'm new to using pandas and am writing a script where I read in a dataframe and then do some computation on some of the columns.
Sometimes I will have the column called "Met":
df = pd.read_csv(File,
                 sep='\t',
                 compression='gzip',
                 header=0,
                 names=["Chrom", "Site", "coverage", "Met"]
                 )
Other times I will have:
df = pd.read_csv(File,
                 sep='\t',
                 compression='gzip',
                 header=0,
                 names=["Chrom", "Site", "coverage", "freqC"]
                 )
I need to do some computation with the "Met" column, so if it isn't present I need to calculate it using:
df['Met'] = df['freqC'] * df['coverage']
Is there a way to check whether the "Met" column is present in the dataframe and, if not, add it?
You can check for the column like this:
if 'Met' not in df:
    df['Met'] = df['freqC'] * df['coverage']
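As a minimal sketch of that check on both input layouts from the question, the example below wraps it in a small helper. The name ensure_met and the sample values are made up for illustration; they are not from the original post.

```python
import pandas as pd


def ensure_met(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add 'Met' in place only when it is missing."""
    # `'Met' not in df` tests the column labels, same as `'Met' not in df.columns`
    if 'Met' not in df:
        df['Met'] = df['freqC'] * df['coverage']
    return df


# Made-up frames standing in for the two file layouts
with_met = pd.DataFrame({"Chrom": ["chr1"], "Site": [100], "coverage": [10], "Met": [3.0]})
with_freq = pd.DataFrame({"Chrom": ["chr1"], "Site": [100], "coverage": [10], "freqC": [0.5]})

ensure_met(with_met)   # 'Met' already present, left untouched
ensure_met(with_freq)  # 'Met' computed from freqC * coverage
```

After both calls, either frame can be passed to the same downstream computation on "Met".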
If you want to add the column conditionally inside a method chain, consider using pipe() with a lambda:
df.pipe(lambda d: (
    d.assign(Met=d['freqC'] * d['coverage'])
    if 'Met' not in d else d
))
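To show how this fits into a longer chain, here is a small sketch. The function name add_met_if_missing, the sample data, and the sort step are assumptions added for illustration. Note that assign returns a new dataframe, so the input frame is left unchanged.

```python
import pandas as pd


def add_met_if_missing(d: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical named equivalent of the lambda passed to pipe()."""
    if 'Met' in d:
        return d
    # assign() returns a copy with the extra column; `d` itself is not modified
    return d.assign(Met=d['freqC'] * d['coverage'])


# Made-up data for illustration
raw = pd.DataFrame({"coverage": [10, 20], "freqC": [0.5, 0.75]})

result = (
    raw
    .pipe(add_met_if_missing)
    .sort_values("Met", ascending=False)
    .reset_index(drop=True)
)
```

A named function is easier to reuse and test than a lambda, while the lambda keeps short one-off chains compact.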
If you are creating the dataframe from scratch, you can create the missing columns without a loop simply by passing the column names to the pd.DataFrame() call; columns that are absent from the data are filled with NaN:
cols = ['column 1','column 2','column 3','column 4','column 5']
df = pd.DataFrame(list_or_dict, index=['a',], columns=cols)
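A runnable version of this idea might look like the sketch below, where the dictionary contents are made-up placeholders for list_or_dict. Only two of the five columns are supplied; the rest are created empty.

```python
import pandas as pd

cols = ['column 1', 'column 2', 'column 3', 'column 4', 'column 5']

# Made-up data that covers only some of the requested columns
data = {'column 1': [1], 'column 2': [2]}

df = pd.DataFrame(data, index=['a'], columns=cols)

# Columns requested in `cols` but absent from `data` exist and hold NaN
missing = [c for c in cols if c not in data]
```

This only creates empty placeholder columns; a derived column such as "Met" still has to be computed separately.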
Alternatively, you can use get:
df['Met'] = df.get('Met', df['freqC'] * df['coverage'])
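If the column Met exists, its values are kept. Otherwise the product of freqC and coverage is used. Note that the default argument is evaluated before get is called, so freqC and coverage are multiplied in either case; this version therefore raises a KeyError when freqC is missing, even if Met is present.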
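The following sketch illustrates how get behaves on a frame shaped like the first file layout in the question (with "Met" but without "freqC"), and one possible way to defer the computation with a conditional expression. The sample values are made up.

```python
import pandas as pd

# Made-up frame with 'Met' but no 'freqC', like the first file layout
df = pd.DataFrame({"coverage": [10], "Met": [5.0]})

try:
    # The default df['freqC'] * df['coverage'] is computed before get() runs
    df['Met'] = df.get('Met', df['freqC'] * df['coverage'])
    raised = False
except KeyError:
    raised = True

# A conditional expression only evaluates the branch that is actually needed
df['Met'] = df['Met'] if 'Met' in df else df['freqC'] * df['coverage']
```

If both "freqC" and "coverage" are always present, the one-line get version works as written.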