
Is there a "cleaner" way to write this code?

So I'm just messing around with Pandas for the first time, and I'm curious about the variables in my code specifically: does it make sense to keep numbering them ("df2", "df3", ...), or should I just keep reassigning "df"? Or is there a more elegant way that I'm missing?

import pandas as pd

def func(csvfile):
    df = pd.read_csv(csvfile)
    df.columns = df.columns.str.replace(" ", "_")
    df2 = df.assign(column3=df.column3.str.split(",")).explode(
        "column3"
    )
    df3 = df2.assign(column2=df2.column2.str.split("; ")).explode("column2")
    df3["column2"] = df3["column2"].str.replace(r"\(\d+\)", "", regex=True)
    df4 = df3[df3["column2"].str.contains("value2") == False]
    print(df4)
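To see what the two separate explodes above actually do, here is a tiny hypothetical frame (made up for illustration, not the asker's data) run through both approaches. Exploding one column at a time yields every combination of the two lists, while passing both columns to a single `explode` call (pandas 1.3 or newer) pairs elements by position:

```python
import pandas as pd

# Hypothetical single-row frame standing in for the unseen CSV.
df = pd.DataFrame({"column2": ["a; b"], "column3": ["x,y"]})

split = df.assign(
    column2=df["column2"].str.split("; "),
    column3=df["column3"].str.split(","),
)

# One column at a time (as in the question): every column2 value is
# paired with every column3 value, so 2 x 2 = 4 rows.
sequential = split.explode("column3").explode("column2")

# Both columns at once (pandas >= 1.3): elements are paired by position,
# so 2 rows. Rows whose lists differ in length raise a ValueError.
together = split.explode(["column2", "column3"])

print(len(sequential), len(together))  # 4 2
```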

This is a complete shot in the dark since there's no sample data to work with, but I'd bet that this does the same thing:

import pandas as pd

def func(csvfile):
    df = pd.read_csv(csvfile)
    df.columns = df.columns.str.replace(" ", "_")
    df.column2 = df.column2.str.split("; ")
    df.column3 = df.column3.str.split(",")
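    # Exploding both at once (pandas >= 1.3) pairs elements by position and
    # requires equal list lengths per row. The question explodes them one at
    # a time, which yields every combination instead; if that's what you
    # want, use df.explode('column3').explode('column2').
    df = df.explode(['column2', 'column3'])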
    df.column2 = df.column2.str.replace(r"\(\d+\)", "", regex=True)
    df = df[~df.column2.str.contains("value2")]
    return df

df = func(csvfile)
print(df)
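If you'd rather not name intermediate frames at all, method chaining is the usual alternative. Below is a sketch, assuming the same column names as the question and keeping the two separate explodes so it matches the original's behaviour; the CSV text is made up for illustration:

```python
import io

import pandas as pd


def func(csvfile):
    """Same steps as the question, written as a single method chain."""
    return (
        pd.read_csv(csvfile)
        .rename(columns=lambda c: c.replace(" ", "_"))
        .assign(
            column2=lambda d: d["column2"].str.split("; "),
            column3=lambda d: d["column3"].str.split(","),
        )
        # Two separate explodes reproduce the question's cross product.
        .explode("column3")
        .explode("column2")
        # Strip "(123)"-style suffixes from the exploded values.
        .assign(column2=lambda d: d["column2"].str.replace(r"\(\d+\)", "", regex=True))
        # Keep only rows whose column2 does not mention "value2".
        .loc[lambda d: ~d["column2"].str.contains("value2")]
    )


# Hypothetical CSV text standing in for the real file.
sample = io.StringIO('column2,column3\n"value1(3); value2(7)","x,y"\n')
print(func(sample))
```

Reassigning a single `df`, as in the function above, is also perfectly fine. Numbered names like `df2`/`df3` mostly help when you want to inspect intermediate steps; otherwise they just add names you never use again.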
