
How to split a string based on the length of each comma-separated value in a pandas DataFrame?

I have a string like "H33, H431, H450" and want to split it into 2 strings based on the length of each comma-separated value. In this example the values have lengths 3, 4 and 4, so I hope to get the 2 strings "H33" and "H431, H450".

The data comes from a dataframe named icd, whose 4th column holds strings like the one above: comma-separated codes of differing lengths. My goal is to split this column into 2 columns, each containing only the codes of one particular length. I tried a for loop, but it didn't give me what I need, and I'm not sure it is the best way. I think apply() might be better, but I'm not sure how to use it.

for i in icd.itertuples():
    for substr in i[4].split(','):
        if len(substr.strip()) == 3:
            print(substr.strip())
        if len(substr.strip()) == 4:
            print(substr.strip())
import pandas as pd


def split(x, length):
    # Split on commas, then keep only the codes whose stripped length matches.
    codes = [substr.strip() for substr in x.split(',')]
    return ', '.join(code for code in codes if len(code) == length)


df = pd.DataFrame({'a': ["H33, H431, H450", "H21, H11, H521"]})

# Create one column per code length, named after the length itself.
for length in [3, 4]:
    df[length] = df['a'].apply(lambda x: split(x, length))
>>> df.drop(['a'], axis=1)
          3           4
0       H33  H431, H450
1  H21, H11        H521

I'm not sure whether there is a more elegant way to do this without the for loop.
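
One possible alternative, as a rough sketch rather than a definitive answer, is to let pandas' string methods do the per-row work with a regular expression instead of apply(). The helper name codes_of_length and the result frame are made up for illustration, and the pattern assumes every code consists only of letters and digits (a code containing a dot, such as "H33.1", would not be handled correctly).

```python
import pandas as pd

df = pd.DataFrame({"a": ["H33, H431, H450", "H21, H11, H521"]})


def codes_of_length(series, length):
    # Hypothetical helper: \b\w{n}\b matches a run of exactly n word
    # characters, so only codes of that exact length are picked up.
    pattern = rf"\b\w{{{length}}}\b"
    # findall returns a list of matches per row; join turns it back into a string.
    return series.str.findall(pattern).str.join(", ")


# One column per code length, named after the length itself.
result = pd.DataFrame({length: codes_of_length(df["a"], length) for length in (3, 4)})
print(result)
```

This still iterates over the two lengths, but not over the rows or over the individual codes within a row.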
