
Plotting data from large file in Python

I have a large file (2 GB) and I want to plot its data as a scatter plot. The data in the file has the following format:

day block1  block2  block3  .....
1   34.89   88.90   67.89   .....
2   77.890  33.56   76.98   .....
3   67.12   67.89   55.89   .....
... .....   .....   .....   .....

pltData will be the list of the column averages, that is:

pltData = [avg_block1, avg_block2, avg_block3, .....]

pltX and pltY are finite lists. To plot the data from the list pltData, I'm using the following code:

from matplotlib.colors import ListedColormap, LinearSegmentedColormap
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from pandas import DataFrame  # needed for DataFrame(...) below

FIGURE = plt.Figure()
SUBPLOT1 = FIGURE.add_subplot(121)
SUBPLOT1.set_xlabel('x distance')
SUBPLOT1.set_ylabel('y distance')
data1 = {'x-distance': pltX, 'y_distance': pltY}
df3 = DataFrame(data1, columns=['x-distance', 'y_distance'])
# Colour each point by its block average; pltX, pltY and pltData must have the same length
plot1 = SUBPLOT1.scatter(df3['x-distance'], df3['y_distance'], marker='o', s=15, linewidths=0.1, c=pltData, cmap='rainbow', vmin=min(pltData), vmax=max(pltData))
FIGURE.colorbar(plot1, ax=SUBPLOT1)

However, for a file as large as 2 GB, creating the list pltData is impossible because the number of rows and columns is so large. Can someone point me to a way to plot the data?

You can use the pandas package to load your data and then call mean() to get the average of each column. Something like this:

import pandas as pd

pltData = pd.read_csv('my2gbdata.csv')  
pltData = pltData.mean()

This gives you the average of each column as a pandas Series indexed by column name. Note that the day column is averaged too, so drop that entry before using the result as pltData.
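If loading the whole file at once is the problem, the column averages can also be accumulated one row at a time, so only running sums are kept in memory. The following is a minimal sketch using only the standard library, not the answer's method; it assumes the file is whitespace-separated, has a header row, has the day column first, and has no missing values. The function name column_means and the in-memory sample are made up for illustration. pandas' read_csv also accepts a chunksize argument, which would be an alternative way to do the same thing.

```python
import io


def column_means(lines):
    """Stream whitespace-separated rows and return (column names, column means).

    Assumes the first line is a header and the first column ('day') is not averaged.
    """
    rows = iter(lines)
    header = next(rows).split()[1:]  # drop the leading 'day' column name
    sums = [0.0] * len(header)
    count = 0
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Add each block value to the running sum of its column
        for i, value in enumerate(fields[1:len(header) + 1]):
            sums[i] += float(value)
        count += 1
    if count == 0:
        raise ValueError("no data rows found")
    return header, [s / count for s in sums]


# Small in-memory stand-in for the 2 GB file (values taken from the question)
sample = io.StringIO(
    "day block1 block2 block3\n"
    "1 34.89 88.90 67.89\n"
    "2 77.890 33.56 76.98\n"
    "3 67.12 67.89 55.89\n"
)
names, pltData = column_means(sample)
```

With a real file, an open file object can be passed in place of sample, since iterating over it yields one line at a time. The resulting pltData is a plain list with one average per block, which is what the scatter call in the question expects for c.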


 