
How can I extract a substring between two characters for every row of a column in a CSV file and copy those values into a new column in Python?

I have a column of unique ID numbers, called "UnitID", whose values are organised like this:

ABC2_DEFGH12-01_X1_Y1

The segment DEFGH12-01 hypothetically refers to the ID of the specific batch of units. I need a new column that specifies this batch, so I want to extract the "DEFGH12-01" values (that is, the text between the first and second "_", which I haven't been able to figure out how to do) into a new column called "BatchID".

I want to leave "UnitID" as is and simply add the new "BatchID" column before it.

I've tried everything but I haven't really managed to do this.

Use str.split("_").str[1] to take the second underscore-separated piece:

Ex:

import pandas as pd

df = pd.DataFrame({"UnitID": ["ABC2_DEFGH12-01_X1_Y1"]})
# Split each ID on "_" and keep the second piece (index 1) as the batch ID.
df["BatchID"] = df["UnitID"].str.split("_").str[1]
print(df)

Output:

                  UnitID     BatchID
0  ABC2_DEFGH12-01_X1_Y1  DEFGH12-01
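
The example above appends "BatchID" after "UnitID", while the question asks for it to sit before "UnitID" in a CSV file. A minimal sketch of one way to do that follows, assuming the data is loaded with pd.read_csv. The in-memory CSV text, the extra "Value" column, and the second sample row are made up for illustration; with a real file you would pass its path to read_csv and to_csv instead of the StringIO objects.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file on disk.
csv_text = (
    "UnitID,Value\n"
    "ABC2_DEFGH12-01_X1_Y1,10\n"
    "ABC3_IJKLM34-02_X2_Y2,20\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Text between the first and second "_" of every UnitID.
batch = df["UnitID"].str.split("_").str[1]

# Insert the new column at the position currently held by "UnitID",
# which pushes "UnitID" one place to the right and leaves it unchanged.
df.insert(df.columns.get_loc("UnitID"), "BatchID", batch)

# Write the result back out as CSV (to a buffer here, to a path in practice).
out = io.StringIO()
df.to_csv(out, index=False)
result_csv = out.getvalue()
```

DataFrame.insert modifies the frame in place, so there is no need to reassign df.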

If you need a regex, use str.extract(r"(?<=_)(.*?)(?=_)"), which captures the text between the first pair of underscores:

df["BatchID"] = df["UnitID"].str.extract(r"(?<=_)(.*?)(?=_)")
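
As a variation on the regex approach, the sketch below uses an anchored pattern and passes expand=False so that str.extract returns a Series rather than a one-column DataFrame. The pattern and the second sample value (an ID with no underscores) are illustrative assumptions, not part of the original answer; the point is that rows which do not match come back as NaN instead of raising an error.

```python
import pandas as pd

df = pd.DataFrame({"UnitID": ["ABC2_DEFGH12-01_X1_Y1", "NOUNDERSCORE"]})

# ^[^_]*_   : everything up to and including the first "_"
# ([^_]*)   : capture the run of non-"_" characters that follows
# _         : require a second "_" to close the segment
pattern = r"^[^_]*_([^_]*)_"

# expand=False returns a Series for a single capture group.
df["BatchID"] = df["UnitID"].str.extract(pattern, expand=False)
```

Checking df["BatchID"].isna() afterwards is a quick way to find IDs that do not follow the expected layout.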
