How to remove certain numeric values from a dataset in Python?

Question

I've got a data set from which I am trying to extract gene names. However, each row also has a numeric value (the gene ID) in front of the name, which I need to remove. I load the data with:

data = pd.read_csv("genes_person1.csv")

The data I read in looks like this:

Column 1
153 ADRB1
3486 IGFBP3
9531 BAG3
9612 NCOR2

I have been trying to get this output:

ADRB1
IGFBP3
BAG3
NCOR2

I've looked at answers to similar questions, such as using slices, .replace, or rstrip, but these either do nothing or remove numbers that are part of the gene name, which I need to keep. How can I remove the numbers at the start of each row?

Answer 1

Use str.split to split each value on whitespace, then take the second piece with .str[1].

Example:

import pandas as pd

df = pd.DataFrame({"Column 1": ["153 ADRB1", "3486 IGFBP3", "9531 BAG3", "9612 NCOR2"]})
print(df["Column 1"].str.split().str[1])

Output:

0     ADRB1
1    IGFBP3
2      BAG3
3     NCOR2
Name: Column 1, dtype: object
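
If some rows might contain more than two whitespace-separated pieces, a regular expression anchored at the start of the string is another option. The following is only a sketch on the sample data from the question; the new column name "gene" is made up for illustration. It strips a leading run of digits and the whitespace after it, so digits inside the gene name (ADRB1, BAG3) are left alone.

```python
import pandas as pd

df = pd.DataFrame({"Column 1": ["153 ADRB1", "3486 IGFBP3", "9531 BAG3", "9612 NCOR2"]})

# ^\d+\s+ matches only digits at the very start of the string plus the
# whitespace that follows, so trailing digits in the gene name are kept.
df["gene"] = df["Column 1"].str.replace(r"^\d+\s+", "", regex=True)

genes = df["gene"].tolist()
print(genes)  # ['ADRB1', 'IGFBP3', 'BAG3', 'NCOR2']
```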

Answer 2

genename = "153 ADRB1"
print(genename.split(" ")[1])
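
The same idea can be wrapped in a small function and applied to a list of rows. This is a sketch, not part of the original answer: strip_leading_id is a hypothetical helper name, and it assumes the ID is a run of digits separated from the name by whitespace. Rows without such a prefix are returned unchanged.

```python
def strip_leading_id(entry):
    """Return the gene name from an 'ID NAME' string (hypothetical helper)."""
    # Split once on any whitespace: at most [id, rest-of-line].
    parts = entry.split(maxsplit=1)
    # Only drop the first piece when it really is a numeric ID.
    if len(parts) == 2 and parts[0].isdigit():
        return parts[1]
    return entry.strip()

rows = ["153 ADRB1", "3486 IGFBP3", "9531 BAG3", "9612 NCOR2"]
names = [strip_leading_id(r) for r in rows]
print(names)  # ['ADRB1', 'IGFBP3', 'BAG3', 'NCOR2']
```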

Answer 3

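You can split the CSV file on spaces when loading it, so that the dataset is read into separate columns, and then take the second column, like this: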

import pandas as pd

datatemp = pd.read_csv("genes_person1.csv", sep=' ')
data = datatemp.iloc[:, 1]  # second column holds the gene names
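
To try this without the actual file, the sketch below reads the sample rows from an in-memory string. It assumes the data has no header row and names the two columns itself; "gene_id" and "gene" are illustrative names, not taken from the question. If the real file does start with a header line, skiprows=1 could be added to read_csv.

```python
import io
import pandas as pd

# Stand-in for genes_person1.csv, assuming no header row (an assumption).
raw = "153 ADRB1\n3486 IGFBP3\n9531 BAG3\n9612 NCOR2\n"

# sep=r"\s+" splits on any run of whitespace; names= labels the two columns.
table = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None,
                    names=["gene_id", "gene"])

genes = table["gene"].tolist()
print(genes)  # ['ADRB1', 'IGFBP3', 'BAG3', 'NCOR2']
```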

3 answers

solution1
3 ACCEPTED 2019-03-18 11:33:18

solution2
2 2019-03-18 11:31:59

solution3
0 2019-03-18 11:45:38