Create scatter plot with data from multiple txt files

Question

I'm trying to create a scatter plot from several txt files. All files have the same structure: two columns of data separated by a comma:
54.1,12
65.7,11
122.2,18
etc
For a small number of files I have this code:

import numpy as np
import matplotlib.pyplot as plt
import csv

# Create data
g1=np.loadtxt('214.txt',delimiter=',', unpack=True)
g2=np.loadtxt('228.txt',delimiter=',', unpack=True)
g3=np.loadtxt('491.txt',delimiter=',', unpack=True)
g4=np.loadtxt('647.txt',delimiter=',', unpack=True)
data = (g1, g2, g3,g4)
colors = ("red", "green", "blue", "black")
groups = ("214", "228", "491", "647") 

# Create plot
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

for data, color, group in zip(data, colors, groups):
    y, x = data
    ax.scatter(x, y, alpha=0.8, c=color, edgecolors='none', s=30, label=group)

#Plot settings 
plt.title('Matplot scatter plot')
plt.legend(loc=4)
axes = plt.gca()
axes.set_xlim([2,30])
axes.set_ylim([0,3000])
plt.gca().invert_yaxis()
plt.show()

Please advise how to modify it to read multiple (up to 50-100) txt files in a folder when the number of files is different every time.

Answer 1

I would search for all files in your current directory and identify which you want to extract data from. This can be done with something like:

from os import listdir, path

files = [f for f in listdir('.') if path.isfile(f)]
file_names = [file for file in files if file.startswith('file_name_identifier')]

This gives you a list of the file names that contain the data you want to extract. You can then load them one by one in a for loop, using the same loading technique you used above:

data = []
for file in file_names:
    data.append(np.loadtxt(file, delimiter=',', unpack=True))

You could flatten this to a list comprehension too:

data = [np.loadtxt(file, delimiter=',', unpack=True) for file in file_names]

If your files don't start with something that can be used to identify them, you can check some other way instead: change the startswith(...) condition to something else, for instance if file.endswith('.txt') to pick up only .txt files.
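As a rough sketch of the same idea using pathlib instead of os.listdir: the helper name load_txt_files and its folder and pattern parameters are made up for illustration, and it assumes every matching file has the two-column, comma-separated layout from the question.

```python
from pathlib import Path

import numpy as np


def load_txt_files(folder=".", pattern="*.txt"):
    """Load every file in `folder` matching `pattern`.

    Returns a dict mapping the file name without extension to a
    (first_column, second_column) tuple of NumPy arrays.
    """
    datasets = {}
    # sorted() makes the order (and therefore the legend order) repeatable
    for path in sorted(Path(folder).glob(pattern)):
        first, second = np.loadtxt(path, delimiter=",", unpack=True)
        datasets[path.stem] = (first, second)
    return datasets
```

Calling load_txt_files() with no arguments scans the current directory, so the number of files can change between runs without touching the code.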

Answer 2

You can get a list of all files in a directory using the method described in this post.

And then do something like this:

data = []
for file in filenames:
    data.append(np.loadtxt(file, delimiter=',', unpack=True))

# And do everything else you did with data

Though if your dataset is larger than the available system memory, I would consider adding the data points to the plot as you read the files:

colors = ["red", "green", "blue", "black"]
for i, file in enumerate(filenames):
    data = np.loadtxt(file, delimiter=',', unpack=True)
    group = file.split('.')[0]  # file name without extension, used as the legend label
    color = colors[i % len(colors)]  # reuse colors when there are more files than colors
    # data[0] is the first column (y) and data[1] the second (x), as in the question
    ax.scatter(data[1], data[0], alpha=0.8, c=color, edgecolors='none', s=30, label=group)
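With 50 to 100 files, a four-color list repeats many times. One possible alternative, shown here only as a sketch, is to sample one color per file from a Matplotlib colormap. The helper names distinct_colors and scatter_files and the choice of the "nipy_spectral" colormap are my own assumptions, and neighbouring colors will still look similar when there are many files.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np


def distinct_colors(n, cmap_name="nipy_spectral"):
    """Return n RGBA tuples spread evenly across a Matplotlib colormap."""
    cmap = plt.get_cmap(cmap_name)
    # Stay slightly inside [0, 1] to skip the extreme ends of the colormap
    return [cmap(v) for v in np.linspace(0.05, 0.95, n)]


def scatter_files(ax, filenames, cmap_name="nipy_spectral"):
    """Scatter each file onto ax, one color per file, loading one file at a time."""
    colors = distinct_colors(len(filenames), cmap_name)
    for filename, color in zip(filenames, colors):
        # First column is y, second column is x, as in the question
        y, x = np.loadtxt(filename, delimiter=",", unpack=True)
        # color= (rather than c=) avoids ambiguity when passing one RGBA tuple
        ax.scatter(x, y, alpha=0.8, color=color, edgecolors="none", s=30,
                   label=Path(filename).stem)
```

You would create fig and ax as in the question, call scatter_files(ax, filenames), and then apply the same plot settings as before.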

Answer 3

Thanks for the help. Here is what worked for me:

import numpy as np
import matplotlib.pyplot as plt
from os import listdir, path
import random

# Get files with extension ".txt"
files = [f for f in listdir('.') if path.isfile(f)]
file_names = [file for file in files if file.endswith('.txt')]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# Create plot
for file in file_names:
    data = np.loadtxt(file, delimiter=",", unpack=True)
    # Random hex color such as "#3FA2C7", wrapped in a list for the c argument
    color = ["#" + "".join(random.choice("0123456789ABCDEF") for _ in range(6))]
    ax.scatter(data[1], data[0], alpha=0.8, c=color, edgecolors="none", s=30, label=file)

#Plot settings 
plt.title('Matplot scatter plot')
plt.legend(loc=4)
plt.gca().invert_yaxis()
plt.show()
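One thing to watch with this many files: a single-column legend with 50 to 100 entries can cover most of the plot. A possible workaround, sketched below with a hypothetical helper add_compact_legend, is a small multi-column legend placed under the axes. The column count and offset are guesses that will likely need tuning for your figure size.

```python
import matplotlib.pyplot as plt


def add_compact_legend(ax, ncol=5):
    """Draw a small multi-column legend centered below the axes."""
    return ax.legend(
        loc="upper center",           # anchor point on the legend box
        bbox_to_anchor=(0.5, -0.08),  # just below the axes, in axes coordinates
        ncol=ncol,                    # spread entries over several columns
        fontsize="x-small",
        frameon=False,
    )
```

You could call add_compact_legend(ax) in place of plt.legend(loc=4). If the legend gets cut off, enlarging the bottom margin with fig.subplots_adjust(bottom=...) may help.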

3 answers

solution1
0 2019-01-24 14:54:58

solution2
0 2019-01-24 15:00:59

solution3
0 2019-01-27 09:09:18
