
Give a pandas DataFrame object the same name as its CSV file

I have a folder with several CSV files and also some compressed files in gz format. Each gz file, once unzipped, also contains one CSV file. I want to extract all of them and create a dataframe for each one, with the same name as the CSV file (without the extension).

For example, if I have the following files:

train.csv
test.csv
validation.csv.gz

I want to have 3 dataframe objects whose names are exactly: train, test and validation.

I've tried this code:

import os
import pandas as pd
import gzip

extension = ".gz"

for item in os.listdir():
    if item.endswith(extension):
        with gzip.open(item) as f:
            item.split('.', 1)[0] = pd.read_csv(f)  # Split on the first occurrence of '.' and give this name to my dataframe
    else:
        item.split('.', 1)[0] = pd.read_csv(item)

This code doesn't work: when I try to access the variables afterwards, Python can't find them.

Any help would be appreciated!

Use a dictionary for a variable number of variables.

While it's possible to name variables via strings, it is strongly discouraged. A dictionary is performant and allows you to maintain a collection of objects in a structured way.

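import gzip
import os

import pandas as pd

d = {}

for item in os.listdir():
    # Split only on the first '.', so 'validation.csv.gz' -> 'validation'
    fn = item.split('.', 1)[0]
    if item.endswith('.gz'):
        with gzip.open(item) as f:
            d[fn] = pd.read_csv(f)
    elif item.endswith('.csv'):
        # Other files in the folder are skipped
        d[fn] = pd.read_csv(item)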

Then access the dataframes via d['train'], d['test'], etc.
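The dictionary approach can also be wrapped in a small function. The following is only a sketch under a few assumptions: the helper name load_csv_folder is made up for illustration, the files are matched by their .csv and .csv.gz suffixes, and pandas is left to infer gzip compression from the file name instead of calling gzip.open explicitly.

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(folder="."):
    """Return {base name: DataFrame} for each .csv / .csv.gz file in folder.

    Hypothetical helper, not part of pandas.
    """
    frames = {}
    for path in sorted(Path(folder).iterdir()):
        if path.name.endswith((".csv", ".csv.gz")):
            # Split on the first '.' only: 'validation.csv.gz' -> 'validation'
            key = path.name.split(".", 1)[0]
            # compression="infer" is the default; it is spelled out for clarity
            frames[key] = pd.read_csv(path, compression="infer")
    return frames
```

With the files from the question, load_csv_folder() would return a dictionary with the keys 'test', 'train' and 'validation'. If the folder held both train.csv and train.csv.gz, the later one would overwrite the earlier one under the same key.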

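Your code does not work because item.split('.', 1)[0] is an element of a temporary list, not a variable name. The assignment replaces that element, and the list is then discarded, so no variable called train, test or validation is ever created.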
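To see why the original assignment runs without error but leaves nothing behind, here is a small illustration that uses a plain string as a stand-in for the dataframe:

```python
item = "validation.csv.gz"

# split() builds a new list; the assignment replaces element 0 of that
# temporary list, which is thrown away once the statement has finished.
item.split(".", 1)[0] = "some dataframe"

parts = item.split(".", 1)
print(parts)  # ['validation', 'csv.gz']
print(item)   # validation.csv.gz
```

The string item is unchanged, and no variable named validation exists afterwards.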

Strings are immutable, and a string is a value, not a variable name you can assign to. If you want to bind an object to a name that is held in a string, you can make use of exec.

In Python 3, exec is a built-in function that supports dynamic execution of Python code. Its first argument must be either a string or a code object.

import os
import pandas as pd
import gzip

extension = ".gz"

for item in os.listdir():
    if item.endswith(extension):
        with gzip.open(item) as f:
            exec(item.split('.', 1)[0] + "=" + "pd.read_csv(f)")  # Split on the first occurrence of '.' and give this name to my dataframe
    else:
        exec(item.split('.', 1)[0] + "=" + "pd.read_csv('" + item + "')")
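If module-level names really are wanted, one alternative to building source strings for exec is to write into the dictionary returned by globals(). This is a different technique from the answer above, shown only as a sketch: bind_names is a hypothetical helper, and the lists stand in for dataframes. It also rejects names that could not be used as variables, such as a file called my-data.csv.

```python
import keyword


def bind_names(frames, namespace):
    """Bind each key of frames as a name in namespace (hypothetical helper)."""
    for name, value in frames.items():
        # Reject strings that are not legal variable names
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError(f"{name!r} is not a usable variable name")
        namespace[name] = value


frames = {"train": [1, 2, 3], "test": [4, 5]}  # stand-ins for DataFrames
bind_names(frames, globals())
print(train)  # [1, 2, 3]
```

Because the names only come into existence at run time, editors and linters cannot see them, which is one reason a dictionary is usually easier to work with.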
