
Big Data Formats

Updated: Feb 21


Smart companies don't let novices play with their most critical resource, their data.


Different file formats and how to read them in Python

Comma-separated values (CSV)

XLSX

ZIP

Plain Text (txt)

JSON

XML

HTML

Images

Hierarchical Data Format (HDF)

PDF

DOCX

MP3

MP4


What is a file format?


A file format is a standard way of encoding information for storage in a file. First, the format specifies whether the file is binary or text (for example, ASCII). Second, it defines how the information is organized. For example, the comma-separated values (CSV) format stores tabular data as plain text, with one record per line and fields separated by commas.
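
To make the CSV example concrete, here is a minimal sketch using Python's standard csv module. The sample data and the helper name parse_csv_text are made up for illustration; a real file would be opened with open() rather than built from a string.

```python
import csv
import io

# Hypothetical sample: a small table written out as plain CSV text.
sample_text = "name,age\nAda,36\nGrace,45\n"


def parse_csv_text(text):
    """Parse CSV text into a list of rows; every field stays a string."""
    return list(csv.reader(io.StringIO(text)))


rows = parse_csv_text(sample_text)
header, records = rows[0], rows[1:]
print(header)
print(records)
```

Because CSV is plain text, every value comes back as a string; converting "36" to a number is left to the reader of the file.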


To identify a file format, you can usually look at the file extension to get an idea. For example, a file saved with the name “Data” in CSV format will appear as “Data.csv”. The “.csv” extension tells us that it is a CSV file and that its data is stored in a tabular format.
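
The extension check described above can be sketched in Python with pathlib. The mapping and the function name guess_format are assumptions for illustration, not a complete registry. An extension is only a hint, because a file can be renamed without its contents changing.

```python
from pathlib import Path

# Illustrative mapping only; it covers a few of the formats listed in this post.
EXTENSION_TO_FORMAT = {
    ".csv": "CSV",
    ".xlsx": "XLSX",
    ".zip": "ZIP",
    ".txt": "Plain Text",
    ".json": "JSON",
    ".xml": "XML",
    ".html": "HTML",
    ".pdf": "PDF",
    ".docx": "DOCX",
    ".mp3": "MP3",
    ".mp4": "MP4",
}


def guess_format(filename):
    """Guess a format name from the file extension, ignoring letter case."""
    suffix = Path(filename).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix, "unknown")


print(guess_format("Data.csv"))
```

Names without an extension, or with one that is not in the mapping, fall back to "unknown" instead of raising an error.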

©2020 by Arturo Devesa.