I have a DataFrame similar to this:
Chr Start_Position End_Position Type
1 10000 10001 SNP
5 45321 45327 INS
12 44700 44710 DEL
I need to change the values of some cells depending on what Type is:
SNP needs Start_Position + 1
INS needs End_Position + 1
DEL needs Start_Position + 1
My issue is that my current solutions are extremely verbose. What I've tried (dataframe is the original data source):
snp_records = dataframe.loc[dataframe["Type"] == "SNP", :]
del_records = dataframe.loc[dataframe["Type"] == "DEL", :]
ins_records = dataframe.loc[dataframe["Type"] == "INS", :]
snp_records.loc[:, "Start_Position"] = snp_records["Start_Position"].add(1)
del_records.loc[:, "Start_Position"] = del_records["Start_Position"].add(1)
ins_records.loc[:, "End_Position"] = ins_records["End_Position"].add(1)
dataframe.loc[snp_records.index, "Start_Position"] = snp_records["Start_Position"]
dataframe.loc[del_records.index, "Start_Position"] = del_records["Start_Position"]
dataframe.loc[ins_records.index, "End_Position"] = ins_records["End_Position"]
As I have to do this for more columns than in the example (similar concept, though), this becomes very long and verbose, and possibly error-prone due to all the duplicated lines (in fact, I made several mistakes just typing out the example).
This question is similar to mine, but there the values are predefined, while I need to derive them from the data itself.
You can just do:
df.loc[df['Type'].isin(['SNP','DEL']), 'Start_Position'] += 1
df.loc[df['Type'].eq('INS'), 'End_Position'] += 1
For a general solution, pass lists to Series.isin to build boolean masks, then use DataFrame.loc to update only the rows each mask selects:
start = ['SNP','DEL']
end = ['INS']
df.loc[df['Type'].isin(start), 'Start_Position'] += 1
df.loc[df['Type'].isin(end), 'End_Position'] += 1
print (df)
Chr Start_Position End_Position Type
0 1 10001 10001 SNP
1 5 45321 45328 INS
2 12 44701 44710 DEL
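Since the question mentions more columns than the example shows, the same pattern can be driven from a dictionary. This is only a minimal sketch: the name `increments` is hypothetical (it is not part of the answer above) and maps each column to the Type values whose rows should get +1 in that column. The sample frame is rebuilt from the question's data.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Chr": [1, 5, 12],
    "Start_Position": [10000, 45321, 44700],
    "End_Position": [10001, 45327, 44710],
    "Type": ["SNP", "INS", "DEL"],
})

# Hypothetical mapping: column -> Type values whose rows get +1 in that column.
increments = {
    "Start_Position": ["SNP", "DEL"],
    "End_Position": ["INS"],
}

# One masked in-place update per column, as in the answer above.
for column, types in increments.items():
    df.loc[df["Type"].isin(types), column] += 1

print(df)
```

Adding another column then means adding one dictionary entry rather than another block of near-identical lines.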
Another idea is to update both columns in one step by adding a two-column boolean mask:
m = pd.concat([df['Type'].isin(start), df['Type'].isin(end)], axis=1)
df[[ 'Start_Position', 'End_Position']] += m.to_numpy()
print (df)
Chr Start_Position End_Position Type
0 1 10001 10001 SNP
1 5 45321 45328 INS
2 12 44701 44710 DEL
Or:
m = np.vstack((df['Type'].isin(start), df['Type'].isin(end))).T
df[[ 'Start_Position', 'End_Position']] += m
print (df)
Chr Start_Position End_Position Type
0 1 10001 10001 SNP
1 5 45321 45328 INS
2 12 44701 44710 DEL
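The two mask-based variants above work because each True in the mask counts as 1 and each False as 0 when added to an integer column. The sketch below makes that visible on the question's sample data. The explicit `astype(int)` cast and the names `cols` and `mask` are my own additions for readability; the answer's code relies on the implicit conversion instead.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Chr": [1, 5, 12],
    "Start_Position": [10000, 45321, 44700],
    "End_Position": [10001, 45327, 44710],
    "Type": ["SNP", "INS", "DEL"],
})

start = ["SNP", "DEL"]
end = ["INS"]
cols = ["Start_Position", "End_Position"]

# One boolean column per target column, in the same order as `cols`.
mask = pd.concat(
    [df["Type"].isin(start), df["Type"].isin(end)], axis=1
).to_numpy()
print(mask)

# Explicit cast: True -> 1, False -> 0, then add element-wise.
df[cols] = df[cols] + mask.astype(int)
print(df)
```

The column order of the mask has to match the order of `cols`, otherwise the increments land in the wrong column.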
Try with np.where:
import numpy as np

start = ['SNP', 'DEL']
end = ['INS']
# Add 1 where Type matches; otherwise keep the original value
df['Start_Position'] = np.where(df['Type'].isin(start), df['Start_Position'] + 1, df['Start_Position'])
df['End_Position'] = np.where(df['Type'].isin(end), df['End_Position'] + 1, df['End_Position'])