
Incorrect ndarray being written to csv

I'm trying to generate an audio dataset for a project. To do this, I'm looping through my audio files (45-second mp3 clips) with Librosa and writing three pieces of data to a csv file: a label for each clip (a string), the audio as a floating-point time series, and the sampling rate. I build a dictionary of the three and write it to csv. The floating-point time series is an ndarray. When I print its length, I get 992250. When it is written to the file, only 7 values appear (the first 3 and the last 3, with a '...' element in the middle). I verified this by reading the file into a dataframe in another function. Could I get help solving this? Thank you.

I should add that I first tried to create a data frame and used df.to_csv() before this version. Neither works; both have the same issue. I also looked up other options online, and it looks like a NumPy array can be written directly to csv? But I also need each row to have the label ('ragam' below) and the 'sr'.

import csv
import os

import librosa

# headers, flst, folderpath and total are assumed to be defined earlier (not shown)
with open('audio_data.csv', 'w') as f:
    writer = csv.DictWriter(f, fieldnames=headers)
    writer.writeheader()

    for i, file in enumerate(flst):
        if file.endswith(".mp3"):
            audio, sr = librosa.core.load(os.getcwd() + folderpath + "/" + file)
            print(type(audio))
            print(str(len(audio)))
            ragam = file.split(sep='-')[0]
            elem = {
                'ragam': ragam,
                'audio': audio,  # ndarray; this is the field that gets written truncated
                'sr': sr
            }

            writer.writerow(elem)
            print("Completed: " + str(i + 1) + " of " + str(total) + " ...")

I settled on appending incrementally to the csv because it is a large dataset and I would like to try retaining any progress made in case something fails midway. Here's an example of the output.

Loading audio data ...
/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
<class 'numpy.ndarray'>
992250
Completed: 1 of 5 ...

CSV output from the code

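The writerow method converts every non-string object in the dictionary elem to its string representation, as str() would. For a NumPy array with more than 1000 elements, that representation is abbreviated by default to the first 3 and last 3 values with '...' in between, which is why you get this unexpected output.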
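The truncation can be reproduced without any audio files. The sketch below is only an illustration: it uses an array of zeros as a hypothetical stand-in for the decoded clip and shows that the string form of a large array is a short summary rather than the full data.

```python
import numpy as np

# Hypothetical stand-in for the decoded audio: 992250 float32 samples.
audio = np.zeros(992250, dtype=np.float32)

# The csv writer stores str(audio) for this field, not the samples themselves.
text = str(audio)

print(len(audio))     # number of samples in the array
print(len(text))      # number of characters that would reach the file
print("..." in text)  # the summary marker seen in the csv
```

The summary is controlled by NumPy's print options, so changing them would also change what lands in the file, but converting the array explicitly is less fragile than relying on print settings.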

A simple workaround is to convert the array to a list, i.e. using

elem = {
    'ragam': ragam,
    'audio': audio.tolist(),
    'sr': sr
}

With this correction the output contains every sample, but the list is still saved as a string, so the array has to be parsed back out of that string when the csv is read.
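One possible way to make the round trip explicit is sketched here. It is not part of the original answer: serializing the list with json.dumps (instead of relying on str()), the in-memory buffer, the sample values and the 'example' label are all assumptions made for illustration.

```python
import csv
import io
import json

import numpy as np

# Hypothetical small array standing in for the real audio samples.
audio = np.array([0.0, 0.25, -0.5, 1.0], dtype=np.float32)

buf = io.StringIO()  # in-memory stand-in for audio_data.csv
writer = csv.DictWriter(buf, fieldnames=["ragam", "audio", "sr"])
writer.writeheader()
# Store the samples as a JSON string so they can be parsed back reliably.
writer.writerow({"ragam": "example", "audio": json.dumps(audio.tolist()), "sr": 22050})

# A full clip produces a field far larger than the csv module's default
# read limit, so raise the limit before reading.
csv.field_size_limit(2**31 - 1)

buf.seek(0)
row = next(csv.DictReader(buf))
restored = np.array(json.loads(row["audio"]), dtype=np.float32)
print(np.array_equal(restored, audio))
```

Note that every value read back from a csv file is a string, so 'sr' also needs an int() conversion on the reading side.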

An alternative approach is to save everything as JSON, because JSON supports lists and dictionaries natively.
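A minimal sketch of the JSON route follows, assuming one JSON object per line so that records can still be appended incrementally as the question requires. The file name, the labels and the tiny arrays are hypothetical placeholders for the output of the loading loop.

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical stand-ins for the (ragam, audio, sr) values from the loading loop.
clips = [
    ("ragamA", np.array([0.0, 0.5], dtype=np.float32), 22050),
    ("ragamB", np.array([-1.0, 1.0, 0.25], dtype=np.float32), 22050),
]

# Hypothetical output file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "audio_data.jsonl")

for ragam, audio, sr in clips:
    record = {"ragam": ragam, "audio": audio.tolist(), "sr": sr}
    # Append mode keeps the records already written if a later clip fails.
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Reading back: one record per line, lists converted to arrays again.
with open(path) as f:
    loaded = [json.loads(line) for line in f]
restored = [np.array(r["audio"], dtype=np.float32) for r in loaded]
print(len(loaded), loaded[0]["ragam"], loaded[0]["sr"])
```

Unlike the csv version, the sampling rate comes back as a number and the samples as a list, so no extra string parsing is needed.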
