Friday, 9 April 2021

What is Apache Spark?

 
APACHE SPARK
General-purpose engine that can combine different types of computations (SQL queries, text processing & ML)
Main factor : speed (Spark keeps intermediate data in memory, avoiding much of the disk I/O of MapReduce)
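A minimal sketch of the in-memory caching behind that speed claim, assuming PySpark is installed and run locally; the app name and dataset size are arbitrary:

from pyspark.sql import SparkSession

# Start a local Spark session (the entry point for the DataFrame/SQL APIs).
spark = SparkSession.builder.appName("speed-demo").getOrCreate()

df = spark.range(0, 10_000_000)           # a simple numeric DataFrame
df.cache()                                # keep it in executor memory
print(df.count())                         # first action materialises the cache
print(df.filter("id % 2 = 0").count())    # second action reuses the in-memory data
spark.stop()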

Spark :
  • integrates closely with other Big Data tools (can run on Hadoop clusters & access any Hadoop data source, as well as Cassandra)
  • offers simple APIs in Python, Java, Scala, R & SQL, plus built-in libraries
  • allows querying data via SQL and HiveQL (HQL)
  • supports many data sources (Hive tables, Parquet) & file formats (JSON, CSV, text, etc.) : see the sketch below
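A minimal sketch of reading a data source and querying it with SQL from PySpark; the file paths (people.json, adults.parquet) and columns (name, age) are assumptions for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-demo").getOrCreate()

# Read a JSON file into a DataFrame (hypothetical path and schema).
people = spark.read.json("people.json")

# Register it as a temporary view and query it with plain SQL.
people.createOrReplaceTempView("people")
adults = spark.sql("SELECT name, age FROM people WHERE age >= 18")
adults.show()

# The result can be written back out in another supported format, e.g. Parquet.
adults.write.mode("overwrite").parquet("adults.parquet")
spark.stop()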
Spark Architecture
  • Executor : separate JVM process running on each worker node; runs the tasks assigned to it by the driver
  • Driver node : executes the main program, creates the SparkContext and schedules tasks on the executors (see the sketch below)
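A minimal sketch of that split in PySpark: this script is the driver program, and the lambda passed to map() is shipped to the executors on the worker nodes; the data and partition count are arbitrary:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("arch-demo").getOrCreate()
sc = spark.sparkContext                         # the driver owns the SparkContext

# The driver defines the computation; the work itself runs in executor tasks.
rdd = sc.parallelize(range(100), numSlices=4)   # 4 partitions -> 4 tasks
squares = rdd.map(lambda x: x * x)              # executed on the executors

# Actions bring results back to the driver.
print(squares.sum())
spark.stop()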

Hadoop vs. Spark
  • Spark : processes data both in batch and as (near) real-time streams; Hadoop MapReduce : batch mode only (see the streaming sketch below)
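A minimal sketch of Spark Structured Streaming (a continuous word count over a TCP socket), illustrating the real-time side; the host and port are assumptions, and the stream can be fed with e.g. `nc -lk 9999`:

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("streaming-demo").getOrCreate()

# Read an unbounded stream of text lines from a TCP socket (host/port assumed).
lines = (spark.readStream.format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Split each line into words and keep a running count per word.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Continuously print the updated counts to the console.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()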