How to split one column into multiple with uneven delimiters

I have a column in a dataset that holds school, city, and state, all separated by commas. I'm trying to split it into three new columns for analysis, named school, city, and state.

The original dataset. They won't let me put pictures in here yet so here's a link.

The problem I'm running into is that commas are sometimes used inside the school name. That throws the split off and forces the creation of extra columns, so the result doesn't work.

This is the code I used to split the column:

undergrad_colleges_supplying_50_med_students_test = undergrad_colleges_supplying_50_med_students.join(undergrad_colleges_supplying_50_med_students['undergraduate_institution'].str.split(',', expand=True).fillna(np.nan))

This is what that outputs. It gives me multiple new columns, and I'm not sure why. It is also splitting on the commas present in some of the school names.

Processed DataFrame image. There should only be three columns, but I ended up with 6, and the states and cities aren't lining up.

I hope I explained this clearly. Any help is greatly appreciated!

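If you know the city and state are always the last two items, you can split from the right with str.rsplit and set the n arg to 2. That makes at most two splits, so any extra commas stay inside the school name. (A left-to-right str.split with n=3 would still break on commas in the school name and can produce four columns.)

undergrad_colleges_supplying_50_med_students.join(
    undergrad_colleges_supplying_50_med_students['undergraduate_institution']
    .str.rsplit(',', n=2, expand=True)
    .fillna(np.nan)
)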
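
Here is a minimal sketch of the right-hand split on made-up rows, assuming the city and state are always the last two comma-separated items. The sample values and the names df, parts, and result are hypothetical, not taken from the original dataset. It also names the new columns and strips the space left after each comma.

```python
import pandas as pd

# Hypothetical sample rows; the real values come from the asker's dataset.
df = pd.DataFrame({
    "undergraduate_institution": [
        "Harvard University, Cambridge, MA",
        "University of California, Los Angeles, Los Angeles, CA",
        "Rutgers, The State University of New Jersey, New Brunswick, NJ",
    ]
})

# Split from the right at most twice, so extra commas stay in the school name.
parts = df["undergraduate_institution"].str.rsplit(",", n=2, expand=True)
parts.columns = ["school", "city", "state"]

# Remove the whitespace left over after each comma.
parts = parts.apply(lambda col: col.str.strip())

result = df.join(parts)
print(result[["school", "city", "state"]])
```

If some rows have fewer than two commas, rsplit returns fewer pieces and the missing positions are filled with missing values, so it is worth checking those rows separately.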
