Posts

Showing posts from 2018

Spark sql with JDBC

Start the Spark master:

    ./sbin/start-master.sh

The master will report the hostname and port it is running on. Copy that URL into the command below:

    ./bin/spark-shell --driver-memory 4G --master spark://master-host:7077 --executor-memory 8G --executor-cores 1 --num-executors 3

Then, in the Spark shell, read from the database over JDBC:

    import java.util.Properties
    val connectionProperties = new Properties()
    connectionProperties.put("user", "actualUsername")
    connectionProperties.put("password", "actualPassword")
    val jdbcUrl = "jdbc:mysql://hostname/dbname"
    val sqlquery = "(select * from t1 limit 10) tmp"
    val df = spark.read.jdbc(url = jdbcUrl, table = sqlquery, properties = connectionProperties)
    df.show

For MS SQL Server, see the connection URL documentation: https://docs.microsoft.com/en-us/sql/connect/jdbc/building-the-connection-url?view=sql-server-2017
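The same read can be sketched from PySpark. The key detail carried over from the Scala version is that Spark's JDBC reader accepts a parenthesised subquery with an alias anywhere it expects a table name, so only the subquery's rows travel over the connection. The hostname, database, and credentials below are the post's placeholders, not real values.

```python
# Sketch of the same JDBC read from PySpark. The spark.read.jdbc call is
# commented out because it needs a running Spark master and a reachable
# MySQL server; hostname/credentials are placeholders from the post.

def as_jdbc_table(subquery: str, alias: str = "tmp") -> str:
    """Wrap a SQL subquery so spark.read.jdbc treats it as a table name."""
    return f"({subquery}) {alias}"

jdbc_url = "jdbc:mysql://hostname/dbname"            # placeholder host/db
table = as_jdbc_table("select * from t1 limit 10")   # "(select ...) tmp"

# from pyspark.sql import SparkSession
# spark = SparkSession.builder.master("spark://master-host:7077").getOrCreate()
# df = spark.read.jdbc(url=jdbc_url, table=table,
#                      properties={"user": "actualUsername",
#                                  "password": "actualPassword"})
# df.show()

print(table)
```

The alias (`tmp` here) is required: without it, the database would reject the bare parenthesised subquery where a table name is expected.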

Hbase Installation

HBase installation on a Linux machine, step by step. Let's install the HBase NoSQL database on a Linux machine where a Hadoop cluster is already installed.

Step 1. Download the HBase tar file hbase-0.94.2.tar.gz and untar it:

    tar -zxvf hbase-0.94.2.tar.gz

Step 2. Go inside hbase-0.94.2/conf/ and edit hbase-env.sh:

    # The java implementation to use. Java 1.6 required.
    export JAVA_HOME=/home/dinesh1/jdk1.7.0_45

Step 3. Go inside hbase-0.94.2/conf/ and edit hbase-site.xml:

    <configuration>
      <property>
        <!-- This is the location in HDFS that HBase will use to store its files -->
        <name>hbase.rootdir</name>
        <value>hdfs://192.168.5.134:54310/hbase</value>
      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>192.168.5.134</value>
      </property>
    </configuration>
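As a quick sanity check on step 3, the hbase-site.xml file can be generated (or regenerated for other hosts) programmatically. This is just a sketch using Python's standard library, with the same property names and values as in the post.

```python
import xml.etree.ElementTree as ET

def hbase_site_xml(props: dict) -> str:
    """Render a dict of Hadoop-style name/value pairs as hbase-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# Same properties as in step 3 above (host/port from the post).
xml = hbase_site_xml({
    "hbase.rootdir": "hdfs://192.168.5.134:54310/hbase",
    "hbase.cluster.distributed": "true",
    "hbase.zookeeper.quorum": "192.168.5.134",
})
print(xml)
```

Using a generator like this avoids the easy-to-miss mistake of an unclosed tag when editing the file by hand.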

What is Google Cloud Platform (GCP)

Google Cloud Platform (GCP) is a set of physical assets, such as computers and hard disk drives, and virtual resources, such as virtual machines, contained in Google's data centers around the globe with 99.99% uptime. Each data center location is in a global region, and each region is a collection of zones that are isolated from each other within the region. By using cloud VMs, you can scale resources up to meet demand and shut them down on the fly when they are no longer needed, which also minimizes cost. GCP is a secure cloud that is easy to use and competitively priced. This distribution of resources provides several benefits, including redundancy in case of failure and reduced latency by locating resources closer to clients. It also introduces some rules about how resources can be used together. About the GCP services: this overview introduces some of the commonly used Google Cloud Platform (GCP) services. Types of services: computing and hosting

Read JSON File in Cassandra

INSERT/Load a JSON data file into a Cassandra (3.0+) table

Requirement: Create a Cassandra table and load JSON data into it. Some of the column names contain spaces (like 'a b'). Load the JSON file into the table.

Challenge: Cassandra supports only CSV file loads into a table (as far as I understand and have found so far), via the 'COPY' command, but not JSON files.

Resolution: As per the Cassandra documentation, Cassandra supports both record-level and file-level inserts for CSV, but for JSON it supports only record-level inserts, using the following command:

    cqlsh> INSERT INTO keyspace1.table1 JSON '{ "id" : "12", "DB" : "Cassandra", "ops" : "Insert", "Project" : "Hadoop" }';

So if we want to insert a whole file into the table, we need to loop over each object in the JSON file and call the insert query every time.
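That per-record loop can be sketched as below. The INSERT ... JSON statement is the one shown above; the file layout (a top-level JSON array of objects) is an assumption, and the cassandra-driver calls are commented out since they need a live cluster.

```python
import json

def json_to_inserts(path: str, keyspace: str, table: str) -> list[str]:
    """Build one INSERT ... JSON statement per object in the file.

    Assumes the file holds a top-level JSON array of objects. Column
    names with spaces (like 'a b') are not a problem here, because the
    whole record travels inside the quoted JSON literal.
    """
    with open(path) as f:
        records = json.load(f)
    return [
        f"INSERT INTO {keyspace}.{table} JSON '{json.dumps(rec)}';"
        for rec in records
    ]

# With a real cluster (DataStax cassandra-driver; host is a placeholder):
# from cassandra.cluster import Cluster
# session = Cluster(["127.0.0.1"]).connect()
# for stmt in json_to_inserts("data.json", "keyspace1", "table1"):
#     session.execute(stmt)
```

Note this sketch does not escape single quotes inside JSON values; for production use, bind the JSON text as a parameter instead of splicing it into the statement string.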