
Replacing specific values in a Pandas dataframe based on the values of another column

I have a DataFrame similar to this:

Chr  Start_Position End_Position Type
1    10000          10001        SNP
5    45321          45327        INS
12   44700          44710        DEL
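To experiment with the answers below, the sample frame can be rebuilt as follows. This is a sketch that assumes Chr and the two position columns are plain integers and Type is a string column; the question does not state the dtypes explicitly.

```python
import pandas as pd

# Sample data from the question (integer dtypes are an assumption)
df = pd.DataFrame({
    "Chr": [1, 5, 12],
    "Start_Position": [10000, 45321, 44700],
    "End_Position": [10001, 45327, 44710],
    "Type": ["SNP", "INS", "DEL"],
})
print(df)
```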

I need to change the values of some cells depending on the value of Type:

  • SNP needs Start_Position + 1
  • INS needs End_Position + 1
  • DEL needs Start_Position + 1

My issue is that my current solutions are extremely verbose. What I've tried (dataframe is the original data source):

snp_records = dataframe.loc[dataframe["Type"] == "SNP", :]
del_records = dataframe.loc[dataframe["Type"] == "DEL", :]
ins_records = dataframe.loc[dataframe["Type"] == "INS", :]

snp_records.loc[:, "Start_Position"] = snp_records["Start_Position"].add(1)
del_records.loc[:, "Start_Position"] = del_records["Start_Position"].add(1)
ins_records.loc[:, "End_Position"] = ins_records["End_Position"].add(1)

dataframe.loc[snp_records.index, "Start_Position"] = snp_records["Start_Position"]
dataframe.loc[del_records.index, "Start_Position"] = del_records["Start_Position"]
dataframe.loc[ins_records.index, "End_Position"] = ins_records["End_Position"]

I have to do this for more columns than in the example (with a similar concept, though), so the code becomes very long and verbose, and error-prone because of all the duplicated lines. In fact, I made several mistakes just typing out the example.
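One way to remove the duplicated lines is to describe the rules as data and loop over them. This is a sketch, not the asker's code: RULES is a hypothetical name, and the approach assumes every rule has the form "add 1 to one column for a list of Types".

```python
import pandas as pd

df = pd.DataFrame({
    "Chr": [1, 5, 12],
    "Start_Position": [10000, 45321, 44700],
    "End_Position": [10001, 45327, 44710],
    "Type": ["SNP", "INS", "DEL"],
})

# Hypothetical rule table: column to increment -> Types that need the increment.
# Supporting another column means adding one entry here, not three more lines.
RULES = {
    "Start_Position": ["SNP", "DEL"],
    "End_Position": ["INS"],
}

for column, types in RULES.items():
    # Boolean mask of the rows whose Type appears in this column's list
    mask = df["Type"].isin(types)
    # Update the original frame in place, only on the masked rows
    df.loc[mask, column] += 1

print(df)
```

Because the rows are selected and updated on the original frame, there is no intermediate copy to write back.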

This question is similar to mine, but there the values are predefined, while I need to get them from the data themselves.

You can just do:

df.loc[df['Type'].isin(['SNP','DEL']), 'Start_Position'] += 1
df.loc[df['Type'].eq('INS'), 'End_Position'] += 1

For a general solution, pass lists of types to Series.isin and use the resulting boolean masks with DataFrame.loc to set the values:

start = ['SNP','DEL']
end = ['INS']

df.loc[df['Type'].isin(start), 'Start_Position'] += 1
df.loc[df['Type'].isin(end), 'End_Position'] += 1
print(df)
   Chr  Start_Position  End_Position Type
0    1           10001         10001  SNP
1    5           45321         45328  INS
2   12           44701         44710  DEL

Another idea is to update both columns in one assignment with a two-column boolean mask (starting from the original df):

import pandas as pd

m = pd.concat([df['Type'].isin(start), df['Type'].isin(end)], axis=1)
df[['Start_Position', 'End_Position']] += m.to_numpy()
print(df)
   Chr  Start_Position  End_Position Type
0    1           10001         10001  SNP
1    5           45321         45328  INS
2   12           44701         44710  DEL
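Both mask-based variants rely on NumPy treating True as 1 and False as 0 when a boolean array is added to integers, and on the mask columns being in the same order as the target columns. A minimal illustration with plain arrays holding the sample positions:

```python
import numpy as np

# One mask column per target column: (Start_Position, End_Position)
mask = np.array([[True, False],    # SNP -> shift start
                 [False, True],    # INS -> shift end
                 [True, False]])   # DEL -> shift start

positions = np.array([[10000, 10001],
                      [45321, 45327],
                      [44700, 44710]])

# In arithmetic, True is cast to 1 and False to 0, so only masked cells change
result = positions + mask
print(result)
```

If the mask columns were concatenated in a different order than the target columns, the increments would land in the wrong column without any error, so keeping the two lists aligned matters.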

Or:

import numpy as np

m = np.vstack((df['Type'].isin(start), df['Type'].isin(end))).T
df[['Start_Position', 'End_Position']] += m
print(df)
   Chr  Start_Position  End_Position Type
0    1           10001         10001  SNP
1    5           45321         45328  INS
2   12           44701         44710  DEL

Try with np.where:

import numpy as np

start = ['SNP', 'DEL']
end = ['INS']

# Add 1 where the mask is True, keep the old value where it is False
df['Start_Position'] = np.where(df['Type'].isin(start), df['Start_Position'] + 1, df['Start_Position'])

df['End_Position'] = np.where(df['Type'].isin(end), df['End_Position'] + 1, df['End_Position'])
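If the increment ever differs by Type, the same update can be driven by a lookup table with Series.map instead of np.where. This is a sketch assuming the offsets are known per Type; start_offset and end_offset are hypothetical names that simply encode the rules from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Chr": [1, 5, 12],
    "Start_Position": [10000, 45321, 44700],
    "End_Position": [10001, 45327, 44710],
    "Type": ["SNP", "INS", "DEL"],
})

# Hypothetical per-Type offsets; any Type not listed gets no change
start_offset = {"SNP": 1, "DEL": 1}
end_offset = {"INS": 1}

# map() returns NaN for Types missing from the dict, so fill with 0 and cast back to int
df["Start_Position"] += df["Type"].map(start_offset).fillna(0).astype(int)
df["End_Position"] += df["Type"].map(end_offset).fillna(0).astype(int)

print(df)
```

Changing an increment, or giving a Type a different offset, then only touches the dictionaries.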
