Monday, 8 December 2014

Firing up a Spark node on my Cassandra Dev Cluster


From the previous post, I've now got 2 data nodes in my local datacenter. Next I need to fire up a Spark node in its own virtual DC.

I'm following the DataStax guide from here


Configure these:
  1. mkdirs for spark data in the install dir:
    alteredcarbon:spark3 neil$ more mkDataDir.sh
    #!/bin/bash
    mkdir cassandra-data; cd cassandra-data
    mkdir data saved_caches commitlog
    mkdir spark
    mkdir spark/rdd spark/tmp

  2. Edit the data dirs in resources/spark/conf/spark-env.sh:
    export SPARK_TMP_DIR="/Volumes/BACKUP/DEV/TEMP/spark3/cassandra-data/spark/tmp"

    # Directory where RDDs will be cached
    export SPARK_RDD_DIR="/Volumes/BACKUP/DEV/TEMP/spark3/cassandra-data/spark/rdd"

    # The directory for storing master.log and worker.log files
    export SPARK_LOG_DIR="/Volumes/BACKUP/DEV/TEMP/spark3/cassandra-data/spark"


    export SPARK_WORKER_DIR="/Volumes/BACKUP/DEV/TEMP/spark3/cassandra-data/spark/work"
  3. Spark uses a local Cassandra node, so let's configure that
  4. Allocate the NIC: sudo ifconfig lo0 alias 127.0.0.4 up
  5. Configure JMX in resources/cassandra/conf/cassandra-env.sh:
    JMX_PORT="7188"

  6. Cassandra endpoints: resources/cassandra/conf/cassandra.yaml
    listen_address: 127.0.0.4
    # that rely on node auto-discovery.
    rpc_address: 127.0.0.4

  7. Configure logging: vi resources/cassandra/conf/log4j-server.properties
  8. Configure /etc/hosts:
    127.0.0.4       localhost4 alteredcarbon4 alteredcarbon4.local
  9. Configure the Cassandra data dirs (remember, Spark runs on Cassandra nodes)
    /Volumes/BACKUP/DEV/TEMP/spark3/cassandra-data
    $ vi ../resources/cassandra/conf/cassandra.yaml
    # the configured compaction strategy.
    data_file_directories:
        - /Volumes/BACKUP/DEV/TEMP/spark3/cassandra-data/data
    # commit log
    commitlog_directory: /Volumes/BACKUP/DEV/TEMP/spark3/cassandra-data/commitlog
    # saved caches
    saved_caches_directory: /Volumes/BACKUP/DEV/TEMP/spark3/cassandra-data/saved_caches


  10. Fire up Spark with a Cassandra node: $ bin/dse cassandra -f -k
  11. If all is good then you should see:
     INFO 15:00:04,865 SparkWorker: Starting remoting
     INFO 15:00:05,073 SparkWorker: Remoting started; listening on addresses :[akka.tcp://sparkWorker@127.0.0.4:54942]
     INFO 15:00:05,077 SparkWorker: Remoting now listens on addresses: [akka.tcp://sparkWorker@127.0.0.4:54942]
     INFO 15:00:05,379 SparkWorker: Starting Spark worker 127.0.0.4:54942 with 6 cores, 9.5 GB RAM
     INFO 15:00:05,380 SparkWorker: Spark home: /Volumes/BACKUP/DEV/TEMP/spark3/resources/spark
     INFO 15:00:05,643 SparkWorker: Started Worker web UI at http://192.168.228.1:7081
     INFO 15:00:05,645 SparkWorker: Connecting to master spark://127.0.0.4:7077...
     INFO 15:00:05,950 SparkMaster: Registering worker 127.0.0.4:54942 with 6 cores, 9.5 GB RAM
     INFO 15:00:05,953 SparkMaster: Adding worker 127.0.0.4
     INFO 15:00:06,046 SparkMaster: New Cassandra host /127.0.0.2:9042 added
     INFO 15:00:06,047 SparkMaster: New Cassandra host /127.0.0.1:9042 added
     INFO 15:00:06,047 SparkMaster: Connected to Cassandra cluster: Test Cluster
     INFO 15:00:06,047 SparkMaster: New Cassandra host /127.0.0.4:9042 added
     INFO 15:00:06,105 SparkWorker: Successfully registered with master spark://127.0.0.4:7077
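    The directory layout from steps 1, 2 and 9 can be scripted in one go. A sketch, not the exact script from the post: BASE defaults to a temp dir so it runs anywhere; the post's real base is /Volumes/BACKUP/DEV/TEMP/spark3/cassandra-data.

    ```shell
    #!/bin/bash
    # Sketch of the data-dir layout from steps 1, 2 and 9.
    # BASE defaults to a temp dir for safety; override it with the
    # real install path, e.g. /Volumes/BACKUP/DEV/TEMP/spark3/cassandra-data.
    BASE="${BASE:-$(mktemp -d)/cassandra-data}"

    # Cassandra dirs (step 9: data_file_directories, commitlog_directory,
    # saved_caches_directory)
    mkdir -p "$BASE"/{data,saved_caches,commitlog}

    # Spark dirs (step 2: SPARK_RDD_DIR, SPARK_TMP_DIR, SPARK_WORKER_DIR)
    mkdir -p "$BASE"/spark/{rdd,tmp,work}

    echo "created layout under $BASE"
    ```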


    OpsCenter with 1 Analytics node: BOOM!
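    If you'd rather check from the shell than OpsCenter, grepping the worker log for the registration line in step 11 works too. A sketch: the log dir is the SPARK_LOG_DIR set in step 2, and the search string comes from the log output above; adjust both for your install.

    ```shell
    #!/bin/bash
    # Sketch: confirm the Spark worker registered with the master by
    # grepping worker.log (SPARK_LOG_DIR from step 2; path is this
    # post's environment, adjust for yours).
    LOG_DIR="${SPARK_LOG_DIR:-/Volumes/BACKUP/DEV/TEMP/spark3/cassandra-data/spark}"

    # -q: quiet, exit status only; -s: no error if the log doesn't exist yet
    if grep -qs "Successfully registered with master" "$LOG_DIR/worker.log"; then
      STATUS="registered"
    else
      STATUS="not-registered"
    fi
    echo "worker: $STATUS"
    ```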





