I have a data set from which I am trying to extract gene names. However, each row also has a numeric value (the gene ID) in front of the name, which I need to remove:
data = pd.read_csv("genes_person1.csv")
The data I read in looks like this:
Column 1
153 ADRB1
3486 IGFBP3
9531 BAG3
9612 NCOR2
I have been trying to get this output:
ADRB1
IGFBP3
BAG3
NCOR2
I've looked at answers to similar questions, such as using slices, .replace, or rstrip, but these either do nothing or remove numbers that are part of the gene name, which I need to keep. How can I remove the numbers at the start of each row?
Use str.split to split each value on whitespace, then take the second element with .str[1].
Example:
import pandas as pd
df = pd.DataFrame({"Column 1": ["153 ADRB1", "3486 IGFBP3", "9531 BAG3", "9612 NCOR2"]})
print(df["Column 1"].str.split().str[1])
Output:
0 ADRB1
1 IGFBP3
2 BAG3
3 NCOR2
Name: Column 1, dtype: object
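If some gene names could themselves contain spaces, splitting and taking the second piece would cut them short. A possible alternative, sketched here on the same sample data, is to strip only the leading run of digits and the whitespace after it with a regular expression. The variable name `genes` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Column 1": ["153 ADRB1", "3486 IGFBP3", "9531 BAG3", "9612 NCOR2"]})

# Remove digits anchored at the start of the string plus the following whitespace.
# Digits inside the gene name (e.g. the "1" in ADRB1) are left untouched.
genes = df["Column 1"].str.replace(r"^\d+\s+", "", regex=True)
print(genes.tolist())
```

Because the pattern is anchored with `^`, only the ID prefix is removed, which avoids the problem of losing numbers that belong to the gene name.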
In plain Python, the same idea works on a single string:
genename = "153 ADRB1"
print(genename.split(" ")[1])
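A slightly more defensive version of the plain-string approach might look like the sketch below. The helper name `strip_leading_id` is hypothetical, and it assumes the ID, when present, is a run of digits separated from the gene name by whitespace.

```python
def strip_leading_id(row: str) -> str:
    """Return the row without its leading numeric ID, if it has one."""
    # Split once on any whitespace so the rest of the row stays intact.
    parts = row.split(maxsplit=1)
    if len(parts) == 2 and parts[0].isdigit():
        return parts[1]
    # No numeric prefix found: return the row unchanged.
    return row


rows = ["153 ADRB1", "3486 IGFBP3", "9531 BAG3", "9612 NCOR2"]
print([strip_leading_id(r) for r in rows])
```

Rows that have no numeric prefix are returned as they are, so the helper does not raise an IndexError on lines that contain only a gene name.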
You can split the CSV file on spaces so that the data set is loaded into separate columns, then take the second column, as follows:
datatemp = pd.read_csv("genes_person1.csv", sep=' ')
data = datatemp.iloc[:, 1]
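To try the read_csv approach without the real file, here is a small sketch that uses an in-memory string in place of genes_person1.csv. The file contents, the choice to skip the header line, and the column names `id` and `gene` are all assumptions made for illustration.

```python
import io
import pandas as pd

# Assumed contents of the file: a "Column 1" header line followed by "ID NAME" rows.
csv_text = "Column 1\n153 ADRB1\n3486 IGFBP3\n9531 BAG3\n9612 NCOR2\n"

# Skip the original header line and name the two space-separated columns ourselves.
datatemp = pd.read_csv(
    io.StringIO(csv_text),
    sep=" ",
    skiprows=1,
    header=None,
    names=["id", "gene"],
)
data = datatemp["gene"]
print(data.tolist())
```

Naming the columns explicitly keeps the IDs available in `datatemp["id"]` in case they are needed later.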