When using a pandas dataframe, how do I add a column if it does not exist?
I'm new to pandas and am writing a script that reads in a dataframe and then does some computation on some of the columns.
Sometimes the file has a column called "Met":
df = pd.read_csv(
    File,
    sep='\t',
    compression='gzip',
    header=0,
    names=["Chrom", "Site", "coverage", "Met"],
)
Other times it has "freqC" instead:
df = pd.read_csv(
    File,
    sep='\t',
    compression='gzip',
    header=0,
    names=["Chrom", "Site", "coverage", "freqC"],
)
I need to do some computation with the "Met" column, so if it isn't present I need to calculate it using:
df['Met'] = df['freqC'] * df['coverage']
Is there a way to check whether the "Met" column is present in the dataframe and, if not, add it?
You can check for the column with the in operator and add it only when it is missing:
if 'Met' not in df:
    df['Met'] = df['freqC'] * df['coverage']
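As a minimal sketch of the check above, the snippet below wraps it in a helper and runs it on two small in-memory frames. The function name ensure_met and the sample values are made up for illustration; only the column names come from the question.

```python
import pandas as pd


def ensure_met(df: pd.DataFrame) -> pd.DataFrame:
    """Return a frame that has a 'Met' column, computing it if it is missing."""
    if 'Met' not in df.columns:
        df = df.copy()  # work on a copy so the caller's frame is not modified
        df['Met'] = df['freqC'] * df['coverage']
    return df


# Hypothetical sample data standing in for the two file layouts.
with_freq = pd.DataFrame({
    'Chrom': ['chr1', 'chr1'],
    'Site': [100, 200],
    'coverage': [10, 20],
    'freqC': [0.5, 0.25],
})
with_met = pd.DataFrame({
    'Chrom': ['chr1'],
    'Site': [100],
    'coverage': [10],
    'Met': [7],
})

added = ensure_met(with_freq)     # 'Met' is computed from freqC * coverage
unchanged = ensure_met(with_met)  # 'Met' already exists and is left alone
print(added)
print(unchanged)
```

Testing 'Met' in df and 'Met' in df.columns are equivalent for a DataFrame, since membership is checked against the column labels.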
If you want to add columns conditionally inside a method chain, consider using pipe() with a lambda:
df.pipe(lambda d: (
    d.assign(Met=d['freqC'] * d['coverage'])
    if 'Met' not in d else d
))
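To show how this fits into a longer chain, here is a small sketch that uses a named function instead of the lambda and follows it with a filtering step. The function name add_met_if_missing, the sample values, and the filter are assumptions added for illustration.

```python
import pandas as pd


def add_met_if_missing(d: pd.DataFrame) -> pd.DataFrame:
    """Return d unchanged if it has 'Met', else a new frame with 'Met' added."""
    if 'Met' in d:
        return d
    return d.assign(Met=d['freqC'] * d['coverage'])


# Hypothetical input that lacks the 'Met' column.
raw = pd.DataFrame({'coverage': [10, 20, 4], 'freqC': [0.5, 0.25, 0.0]})

result = (
    raw
    .pipe(add_met_if_missing)        # conditional column creation
    .loc[lambda d: d['Met'] > 0]     # later steps can rely on 'Met' existing
    .reset_index(drop=True)
)
print(result)
```

Because assign() returns a new frame, raw itself is left without a "Met" column; only the chained result carries it.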
If you were creating the dataframe from scratch, you could create the missing columns without a loop simply by passing the column names to the pd.DataFrame() call:
cols = ['column 1','column 2','column 3','column 4','column 5']
df = pd.DataFrame(list_or_dict, index=['a',], columns=cols)
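As a concrete sketch of that constructor call, the example below replaces the list_or_dict placeholder with a made-up dict that supplies only two of the five columns. Columns named in cols but absent from the data are created and filled with NaN.

```python
import pandas as pd

cols = ['column 1', 'column 2', 'column 3', 'column 4', 'column 5']

# Hypothetical data covering only two of the requested columns.
data = {'column 1': [1], 'column 3': [3]}

df = pd.DataFrame(data, index=['a'], columns=cols)
print(df)

# Columns missing from the data exist in the result but hold NaN.
print(df.isna().sum())
```

Note that this creates empty columns rather than computed ones, so a column such as "Met" would still need to be filled in afterwards.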