
How to remove leading numeric IDs from values in a dataset in Python?

I've got a data set from which I am trying to extract gene names. However, each row also has a numeric value (the gene's ID) in front of the name, which I need to remove:

data = pd.read_csv("genes_person1.csv")

I read in my data which has this input:

Column 1
153 ADRB1
3486 IGFBP3
9531 BAG3
9612 NCOR2

I have been trying to get this output:

ADRB1
IGFBP3
BAG3
NCOR2

I've looked into answers from similar questions, like using slices, .replace, rstrip, but this either hasn't done anything or removes numbers which are a part of my gene name which I need to keep. How can I remove the numbers at the start of each row?

Use str.split

Example:

import pandas as pd

df = pd.DataFrame({"Column 1": ["153 ADRB1", "3486 IGFBP3", "9531 BAG3", "9612 NCOR2"]})
print(df["Column 1"].str.split().str[1])

Output:

0     ADRB1
1    IGFBP3
2      BAG3
3     NCOR2
Name: Column 1, dtype: object

The same idea works on a single string with plain str.split:

genename = "153 ADRB1"
print(genename.split(" ")[1])
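
The str.split answers above assume every row is exactly "ID, whitespace, name". As a hedged alternative, here is a minimal sketch that removes only a leading run of digits, so digits inside the gene name (BAG3, NCOR2) are left alone. The new column names `gene` and `gene_id` are made up for illustration and do not appear in the question.

```python
import pandas as pd

df = pd.DataFrame({"Column 1": ["153 ADRB1", "3486 IGFBP3", "9531 BAG3", "9612 NCOR2"]})

# Remove digits only at the start of the string (^), plus the whitespace after them.
# Digits later in the string, such as the 3 in BAG3, are not touched.
df["gene"] = df["Column 1"].str.replace(r"^\d+\s+", "", regex=True)

# Alternative: split once on whitespace and keep both pieces.
# "gene_id" and "gene" are hypothetical column names chosen for this sketch.
parts = df["Column 1"].str.split(n=1, expand=True)
parts.columns = ["gene_id", "gene"]

print(df["gene"].tolist())
print(parts)
```

Splitting with n=1 keeps the ID around in case it is needed later, while the regex version simply discards it.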

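You can split the CSV file on spaces while loading it, so that the data set is read into separate columns, and then take the second column, as follows: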

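# Split each line on a single space so the ID and the gene name land in separate columns
datatemp = pd.read_csv("genes_person1.csv", sep=" ")
# Keep the second column (the gene names)
data = datatemp.iloc[:, 1]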
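
The read-time split can be tried without the real file. The sketch below assumes the file looks exactly like the sample in the question (a "Column 1" header line followed by "ID name" rows) and uses an in-memory string in place of genes_person1.csv. The column names `gene_id` and `gene` are hypothetical. Because the header itself contains a space, the sketch skips it and supplies its own names rather than letting the header be split in two.

```python
import io

import pandas as pd

# Stand-in for genes_person1.csv, assuming it matches the sample in the question.
csv_text = "Column 1\n153 ADRB1\n3486 IGFBP3\n9531 BAG3\n9612 NCOR2\n"

genes = pd.read_csv(
    io.StringIO(csv_text),
    sep=r"\s+",                 # split on any run of whitespace
    skiprows=1,                 # skip the original "Column 1" header line
    header=None,
    names=["gene_id", "gene"],  # hypothetical names for the two fields
)

print(genes["gene"])
```

With the real file, passing the path in place of `io.StringIO(csv_text)` should behave the same way, provided the layout matches this assumption.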
