I have a large file (2 GB) and I want to plot the data in it as a scatter plot. The data has the following format in the file:
day block1 block2 block3 .....
1 34.89 88.90 67.89 .....
2 77.890 33.56 76.98 .....
3 67.12 67.89 55.89 .....
... ..... ..... ..... .....
pltData will be the list of the averages of each column, that is:
pltData = [avg_block1, avg_block2, avg_block3, .....]
pltX and pltY are finite lists. To plot the data from the list named pltData, I'm using the following code:
from matplotlib.colors import ListedColormap, LinearSegmentedColormap
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from pandas import DataFrame  # needed for the DataFrame(...) call below
FIGURE = plt.Figure()
SUBPLOT1 = FIGURE.add_subplot(121)
SUBPLOT1.set_xlabel('x distance')
SUBPLOT1.set_ylabel('y distance')
data1 = {'x-distance': pltX, 'y_distance': pltY}
df3 = DataFrame(data1, columns=['x-distance', 'y_distance'])
# Colour each point by its column average from pltData
plot1 = SUBPLOT1.scatter(df3['x-distance'], df3['y_distance'], marker='o', s=15, linewidths=0.1, c=pltData, cmap='rainbow', vmin=min(pltData), vmax=max(pltData))
FIGURE.colorbar(plot1, ax=SUBPLOT1)
However, for a file as large as 2 GB, creating the list pltData is impossible because the number of rows and columns is large. Can someone guide me to a way to plot the data?
You can use the pandas package to load your data and then call mean() to get the average of each column. Something like this:
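import pandas as pd
# Assumes the whitespace-separated layout shown in the question; 'day' becomes the index
pltData = pd.read_csv('my2gbdata.csv', sep=r'\s+', index_col='day')
# Column-wise mean: one average per block column
pltData = pltData.mean()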
This gives you the average of each column; from there you can use it however you need to.
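If loading the whole 2 GB file at once is the problem, the averages can also be accumulated in a single pass so that only one line is held in memory at a time. The following is a minimal sketch using only the standard library, not a definitive implementation. It assumes the file is whitespace-separated with one header row, that the first column is the day number, and that every row has the same number of fields; the function name streaming_column_means is hypothetical.

```python
def streaming_column_means(path, skip_first_column=True):
    """Return (column_names, means) from one pass over a whitespace-separated file."""
    start = 1 if skip_first_column else 0  # assumption: first column is 'day'
    with open(path) as handle:
        # The first line holds the column names
        names = handle.readline().split()[start:]
        sums = [0.0] * len(names)
        count = 0
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            values = fields[start:]
            if len(values) != len(names):
                raise ValueError("row has an unexpected number of fields")
            # Add this row to the running per-column totals
            for i, value in enumerate(values):
                sums[i] += float(value)
            count += 1
    if count == 0:
        raise ValueError("no data rows found")
    return names, [total / count for total in sums]
```

The returned list of means could then be passed as c=pltData in the scatter call. If you prefer to stay with pandas, read_csv also accepts a chunksize argument that returns the file piece by piece, and the same running-sum idea applies to each chunk.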