How can I extract a substring between two characters for every row of a column in a CSV file and copy those values into a new column in Python?

Question

I have a column of unique ID numbers, called "UnitID", whose values are structured like this:

ABC2_DEFGH12-01_X1_Y1

The segment DEFGH12-01 is the ID of the specific batch of units. I need a new column, called "BatchID", that holds this batch ID for every row. In other words, I want to extract the value between the first and second "_" (here "DEFGH12-01"), but I haven't been able to figure out how.

I would want to just leave "UnitID" as is, and simply add the new "BatchID" column before it.

I've tried several approaches but haven't managed to get this working.

Answer 1

Use str.split("_").str[1], which splits each value on "_" and takes the second piece.

Example:

import pandas as pd

df = pd.DataFrame({"UnitID": ["ABC2_DEFGH12-01_X1_Y1"]})
df["BatchID"] = df["UnitID"].str.split("_").str[1]
print(df)

Output:

                  UnitID     BatchID
0  ABC2_DEFGH12-01_X1_Y1  DEFGH12-01
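
The question also asks for the data to come from a CSV file and for "BatchID" to sit before "UnitID". A minimal sketch of that, assuming the file has a "UnitID" header; the CSV contents, the extra "Status" column, and the file name in the comment are made up for illustration:

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for the real file.
csv_text = (
    "UnitID,Status\n"
    "ABC2_DEFGH12-01_X1_Y1,ok\n"
    "ABC3_JKLMN34-02_X2_Y7,ok\n"
)

# With a real file this would be something like pd.read_csv("units.csv").
df = pd.read_csv(io.StringIO(csv_text))

# Second underscore-separated field of every UnitID.
batch = df["UnitID"].str.split("_").str[1]

# Insert BatchID at the position UnitID currently occupies,
# which shifts UnitID (left unchanged) one column to the right.
df.insert(df.columns.get_loc("UnitID"), "BatchID", batch)

# Write the result back out as CSV text (use a file path for a real file).
out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())
```

df.insert modifies the frame in place, so "UnitID" stays as it was and only the column order changes.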

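If you need a regex, use str.extract(r"(?<=_)(.*?)(?=_)", expand=False), which captures the text between the first two underscores. Passing expand=False makes it return a Series rather than a one-column DataFrame.

df["BatchID"] = df["UnitID"].str.extract(r"(?<=_)(.*?)(?=_)", expand=False)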
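
The two approaches can differ on IDs that do not have the expected shape. The sketch below compares them on made-up values; the pattern r"_([^_]*)_" is a simpler alternative I am assuming here, not the one from the answer, and it should pick out the same first underscore-delimited segment:

```python
import pandas as pd

# Made-up IDs: one well-formed, one with no underscore, one with a single underscore.
s = pd.Series(["ABC2_DEFGH12-01_X1_Y1", "NOUNDERSCORE", "A_B"])

# split: second piece if it exists, NaN otherwise.
by_split = s.str.split("_").str[1]

# regex: requires an underscore on BOTH sides, so "A_B" does not match.
by_regex = s.str.extract(r"_([^_]*)_", expand=False)

print(pd.DataFrame({"UnitID": s, "split": by_split, "regex": by_regex}))
```

For well-formed IDs both give the same result; the regex is stricter when the closing underscore is missing.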
