Data Streaming — AWS Kinesis
Sandipan Ghosh · 14 min read · Oct 16, 2021
In my current organisation, we had implemented Kafka from end to end, from acquiring resources, installing Kafka, developing the producer…

Cloud Computations — Quick data analysis with AWS Athena, Glue and Databricks spark
Sandipan Ghosh · 6 min read · Oct 3, 2021
Throughout my career, I have repeatedly had to fix failing production jobs. Most of the time, the debugging involved analysis…

Spark — Write single file per (hive) partitions
Sandipan Ghosh · 6 min read · Jun 25, 2021
I have run into the small-files problem in Hadoop too many times during Spark ETL processes that write data to partitioned Hive…

How to Install Spark 3 on Windows 10
Sandipan Ghosh · 4 min read · Mar 10, 2021
I have been using Spark for a long time. It is an excellent distributed computation framework. I use it regularly at work, and I also…

How to download really big data sets for big data testing
Sandipan Ghosh · 3 min read · Aug 3, 2020
For a long time, I have been working with big data technologies like MapReduce, Spark, and Hive, and very recently I have started working on…