
Python Pandas replace string based on format

Is there a way to replace "x-y" with "x,x+1,x+2,...,y" in every row of a data frame (where x and y are integers)? For example, I want to replace rows like this:

"1-3,7" with "1,2,3,7"
"1,4,6-9,11-13,5" with "1,4,6,7,8,9,11,12,13,5", etc.

I know this can be done by looping over the lines and using a regular expression, but the table is quite big and that takes a long time, so I think using pandas might be faster.

Thanks a lot.

In pandas you can use apply to apply any function to either the rows or the columns of a DataFrame. The function can be passed as a lambda or defined separately.

(Side remark: your example does not make entirely clear whether you have a 2-D DataFrame or just a 1-D Series. Either way, apply can be used.)
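
As a minimal sketch of how apply is called, here it is on a Series (one column) and on a whole DataFrame. The column name "ranges" and the helper count_items are made up for illustration; they are not from the question.

```python
import pandas as pd

def count_items(text):
    """Count the comma-separated items in a string."""
    return len(text.split(","))

# Hypothetical sample data; the column name "ranges" is an assumption.
df = pd.DataFrame({"ranges": ["1-3,7", "1,4,6-9,11-13,5"]})

# Series.apply calls the function once per element of the column.
df["n_items"] = df["ranges"].apply(count_items)

# The same computation written as a lambda.
n_items_lambda = df["ranges"].apply(lambda text: len(text.split(",")))

# DataFrame.apply with axis=1 passes each row to the function instead.
n_items_rowwise = df.apply(lambda row: count_items(row["ranges"]), axis=1)
```

Note that apply still calls the Python function once per element, so it is mostly a convenience over an explicit loop rather than a guaranteed speed-up.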

The next step is to find the right function. Here's a rough version (without regular expressions):

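def make_list(text):
    # Expand a string such as "1-3,7" into the list [1, 2, 3, 7].
    newlst = []
    for item in text.split(','):
        if "-" in item:
            # range() excludes its stop value, so add 1 to include y.
            start, end = (int(j) for j in item.split("-"))
            newlst.extend(range(start, end + 1))
        else:
            newlst.append(int(item))
    return newlst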
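
The function above returns a list of integers. If the column should hold strings such as "1,2,3,7" instead, one possible variant uses a regular expression with a replacement function. This is only a sketch: the names RANGE_PATTERN, expand_match and expand_ranges are made up here, and it assumes every range is written as two non-negative integers with x <= y.

```python
import re
import pandas as pd

# Matches a range such as "6-9" and captures both endpoints.
RANGE_PATTERN = r"(\d+)-(\d+)"

def expand_match(match):
    """Turn one matched range, e.g. "6-9", into "6,7,8,9"."""
    start, end = int(match.group(1)), int(match.group(2))
    # range() excludes its stop value, hence end + 1.
    return ",".join(str(n) for n in range(start, end + 1))

def expand_ranges(text):
    """Expand every "x-y" range in a comma-separated string."""
    return re.sub(RANGE_PATTERN, expand_match, text)

# Sample data taken from the question.
s = pd.Series(["1-3,7", "1,4,6-9,11-13,5"])

# Option 1: apply the plain Python function to each element.
expanded = s.apply(expand_ranges)

# Option 2: Series.str.replace accepts a callable when regex=True.
expanded_str = s.str.replace(RANGE_PATTERN, expand_match, regex=True)
```

Both options give the same strings for the two examples in the question. For a DataFrame, the same call can be made on a single column, e.g. df["col"].apply(expand_ranges), where "col" stands for whatever the column is actually called.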
