  1. java - cannot be cast to org.apache.spark.serializer.Serializer

    I am trying to solve a Spark serialization issue with HashMaps using Java. I am referring to the link Save Spark Dataframe into Ela
  2. Does spark dataframe.filter(...).select(...) use sequential search or hash algorithms?

    Scenario: I have created a lookup table (input is a JSON file of around 50 MB) and cached it in memory so that it can be looked up while
  3. Where can I find spark.driver.maxResultSize property in Cloudera manager?

    Can anyone provide the location where I can change the value of the environment variable spark.driver.maxResultSize in Cloudera Man
  4. apache spark - Could not Find Implicit parameter in scala typeclass

    I am trying to create a type class based on the type of data to load. Here are the types: trait DataSource case class HDFSSource(pat
  5. scala - How to apply reduceByKey in Spark on some of fields in a Class?

    Here's the scenario: I have a JavaBean class as below: class JB implements Serializable { private String field_a; private Strin
  6. scala-spark Array mapping

    I have a question about mapping an Array in Scala. I have the following Array: Array[(scala.collection.immutable.Set[String], com.tren
  7. Spark Streaming and Kafka: one cluster or several standalone boxes?

    I am about to make a decision about using the Spark Streaming Kafka integration. I have a Kafka topic (I can break it into several top
  8. scala - Proper way to make a Spark Fat Jar using SBT

    I need a Fat Jar with Spark because I'm creating a custom node for Knime. Basically it's a self-contained jar executed inside Knime
  9. Spark LDA consumes too much memory

    I'm trying to use Spark MLlib LDA to summarize my document corpus. My problem setting is as below: about 100,000 documents about
  10. cassandra - Zeppelin spark RDD commands fail yet work in spark-shell

    I have set up a standalone single-node "cluster" running the following: Cassandra 2.2.2; Spark 1.5.1; Compiled fat jar for S