Sandipan Ghosh | Data Streaming — AWS Kinesis | In my current organisation, we had implemented Kafka from end to end, from acquiring resources, installing Kafka, developing the producer… | Oct 16, 2021
Sandipan Ghosh | Cloud Computations — Quick data analysis with AWS Athena, Glue and Databricks spark | Throughout my career, I have often had to fix failing production jobs. Most of the time, the debugging involved analysis… | Oct 3, 2021
Sandipan Ghosh | Spark — Write single file per (hive) partitions | I have faced the small-files problem in Hadoop too many times during Spark ETL processes that write data to partitioned Hive… | Jun 25, 2021
Sandipan Ghosh | How to Install Spark 3 on Windows 10 | I have been using Spark for a long time. It is an excellent distributed computation framework. I use it regularly at work, and I also… | Mar 10, 2021
Sandipan Ghosh | How to download really big data sets for big data testing | For a long time, I have been working with big data technologies such as MapReduce, Spark, and Hive, and very recently I have started working on… | Aug 3, 2020