
Importing a CSV and XLSX file

I would like to know why one version of a data set takes longer to import than another. I'm using pandas and I have the same file in two versions, one XLSX and one CSV. Why is the CSV file faster to load?

See picture

In general, CSV files are much less complicated than .xlsx files. CSV is "raw data", while XLSX also stores formatting information such as fonts, colors, and other cell settings. I'm no expert, but a CSV file should be lighter and therefore faster to read.

CSV is an acronym for "comma separated values." A CSV is literally lines of values separated by a delimiter such as a comma, tab, or semicolon.

person,age,fav_animal
bob,20,cat
mary,16,duck
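
To illustrate how little work a CSV parser has to do, here is a minimal sketch that reads the sample above with Python's standard `csv` module. The names `CSV_TEXT` and `parse_csv` are made up for this example; note that every value comes back as a plain string.

```python
import csv
import io

# The sample data from above, held in memory instead of a file.
CSV_TEXT = """person,age,fav_animal
bob,20,cat
mary,16,duck
"""


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


rows = parse_csv(CSV_TEXT)
for row in rows:
    # Each field is a string; no types or formatting are stored in the file.
    print(row["person"], row["age"], row["fav_animal"])
```

The parser only has to split lines on the delimiter and handle quoting, which is a large part of why reading CSV is cheap.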

XLSX is a complicated format: a ZIP archive of XML documents (Office Open XML) with a specification that is over 1000 pages long. Parsers have to decompress the archive, parse the XML, validate the format, and extract the objects they need.
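
As a rough illustration of the extra steps involved: on disk, an .xlsx file is a ZIP archive whose members are XML documents. The sketch below builds a deliberately simplified stand-in that contains only one worksheet part, then reads it back with the standard library. It is not a complete, valid workbook (a real one also carries content-type, workbook, and relationship parts, and usually a shared-strings table), and the helper names `build_archive` and `read_cells` are invented for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# A tiny worksheet part: two rows, with strings stored inline.
SHEET_XML = """<?xml version="1.0" encoding="UTF-8"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1" t="inlineStr"><is><t>person</t></is></c>
      <c r="B1" t="inlineStr"><is><t>age</t></is></c>
    </row>
    <row r="2">
      <c r="A2" t="inlineStr"><is><t>bob</t></is></c>
      <c r="B2"><v>20</v></c>
    </row>
  </sheetData>
</worksheet>
"""


def build_archive():
    """Return the bytes of a simplified, XLSX-like ZIP archive (assumption: sheet part only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("xl/worksheets/sheet1.xml", SHEET_XML)
    return buffer.getvalue()


def read_cells(data):
    """Decompress the archive, parse the sheet XML, and return cell values row by row."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("xl/worksheets/sheet1.xml"))
    rows = []
    for row in root.findall("m:sheetData/m:row", NS):
        values = []
        for cell in row.findall("m:c", NS):
            if cell.get("t") == "inlineStr":
                values.append(cell.findtext("m:is/m:t", namespaces=NS))
            else:
                values.append(cell.findtext("m:v", namespaces=NS))
        rows.append(values)
    return rows


cells = read_cells(build_archive())
print(cells)
```

Even in this stripped-down form, the reader has to decompress, parse XML, and interpret cell types before it reaches the same values a CSV reader gets by splitting a line.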

Parsing CSV is faster than reading XLSX partly because the format is rudimentary, but that isn't the only reason. Binary formats designed for data in general, or even for specific classes of data, such as HDF or Parquet, are even faster to parse and more space-efficient than CSV. XLSX is designed for spreadsheets and carries the complexity they require.
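
If you want to measure the difference on your own files, a small timing helper along these lines may help. This is only a sketch: `time_reader` is a made-up name, and it assumes pandas is installed.

```python
import time

import pandas as pd


def time_reader(reader, source, repeats=3):
    """Call reader(source) `repeats` times; return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = reader(source)
        best = min(best, time.perf_counter() - start)
    return best, result


# Hypothetical usage; the file names are placeholders for your own files:
#   csv_seconds, df_csv = time_reader(pd.read_csv, "data.csv")
#   xlsx_seconds, df_xlsx = time_reader(pd.read_excel, "data.xlsx")
#   print(f"CSV: {csv_seconds:.3f}s  XLSX: {xlsx_seconds:.3f}s")
```

Pass a file path rather than an open file object, since the reader is called several times. `pd.read_excel` also needs an Excel engine such as openpyxl to be installed for .xlsx files.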
