Cassandra - Zeppelin Spark RDD commands fail yet work in spark-shell
I have set up a standalone single-node "cluster" running the following:
- Cassandra 2.2.2
- Spark 1.5.1
- a compiled fat jar of spark-cassandra-connector 1.5.0-M2
- Zeppelin 0.6 snapshot, compiled with: mvn -Pspark-1.5 -Dspark.version=1.5.1 -Dhadoop.version=2.6.0 -Phadoop-2.4 -DskipTests clean package
I can work fine with the Spark shell, retrieving data from Cassandra.
I have altered zeppelin-env.sh as follows:
export MASTER=spark://localhost:7077
export SPARK_HOME=/root/spark-1.5.1-bin-hadoop2.6/
export ZEPPELIN_PORT=8880
export ZEPPELIN_JAVA_OPTS="-Dspark.jars=/opt/sparkconnector/spark-cassandra-connector-assembly-1.5.0-M2-SNAPSHOT.jar -Dspark.cassandra.connection.host=localhost"
export ZEPPELIN_NOTEBOOK_DIR="/root/gowalla-spark-demo/notebooks/zeppelin"
export SPARK_SUBMIT_OPTIONS="--jars /opt/sparkconnector/spark-cassandra-connector-assembly-1.5.0-M2-SNAPSHOT.jar --deploy-mode cluster"
export ZEPPELIN_INTP_JAVA_OPTS=$ZEPPELIN_JAVA_OPTS
I then start adding paragraphs to the notebook and import the following first:
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql._
import com.datastax.spark.connector.rdd.CassandraRDD
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
I am not sure if all of these are necessary. This paragraph runs fine.
Then I do the following:
val checkins = sc.cassandraTable("lbsn", "checkins")
This runs fine and returns:
checkins: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:15
Then, in the next paragraph, the following 2 statements are run. The first succeeds and the second fails:
checkins.count
checkins.first
Result:
res13: Long = 138449
com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
 at [Source: {"id":"4","name":"first"}; line: 1, column: 1]
    at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
    at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
    at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
    at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
    at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
    at org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:82)
    at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1582)
    at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1582)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.RDD.<init>(RDD.scala:1582)
    at com.datastax.spark.connector.rdd.CassandraRDD.<init>(CassandraRDD.scala:15)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.<init>(CassandraTableScanRDD.scala:59)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.copy(CassandraTableScanRDD.scala:92)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.copy(CassandraTableScanRDD.scala:59)
    at com.datastax.spark.connector.rdd.CassandraRDD.limit(CassandraRDD.scala:103)
    at com.datastax.spark.connector.rdd.CassandraRDD.take(CassandraRDD.scala:122)
    at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1312)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1311)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
    at $iwC$$iwC$$iwC.<init>(<console>:57)
    at $iwC$$iwC.<init>(<console>:59)
    at $iwC.<init>(<console>:61)
    at <init>(<console>:63)
    at .<init>(<console>:67)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:655)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:620)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:613)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:276)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Why would the call to first fail? Calls such as sc.fromTextFile also fail.
The following works:
checkins.where("year = 2010 and month=2 and day>12 and day<15").count()
But this does not:
checkins.where("year = 2010 and month=2 and day>12 and day<15").first()
Please assist, as this is driving me insane, especially since the Spark shell works but Zeppelin does not, or at least seems partially broken.
Thanks
com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope) at [Source: {"id":"4","name":"first"}; line: 1, column: 1]
This exception occurs when there are 2 or more versions of the Jackson library on the classpath.
Make sure the Spark interpreter process has a single version of the Jackson library on its classpath.
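One way to check for duplicate Jackson copies is to ask the class loader for every location that provides the ObjectMapper class. The following is a minimal sketch to run in a Zeppelin paragraph or in spark-shell; the helper name copiesOf is hypothetical (it is not part of Spark, Zeppelin, or Jackson), and it assumes the thread context class loader is the one the interpreter uses to load Jackson.

```scala
import java.net.URL
import scala.collection.mutable.ListBuffer

// Hypothetical helper (not a Spark/Zeppelin API): returns every classpath
// location visible to the current class loader that provides `className`.
def copiesOf(className: String): List[URL] = {
  val resource = className.replace('.', '/') + ".class"
  // Assumption: the context class loader is the relevant one; fall back to
  // the system class loader if none is set on this thread.
  val loader = Option(Thread.currentThread.getContextClassLoader)
    .getOrElse(ClassLoader.getSystemClassLoader)
  val found = loader.getResources(resource)
  val urls = ListBuffer.empty[URL]
  while (found.hasMoreElements) urls += found.nextElement()
  urls.toList
}

// Prints one URL per copy found; more than one suggests conflicting jars.
copiesOf("com.fasterxml.jackson.databind.ObjectMapper").foreach(println)
```

If more than one URL is printed, the jar paths in the output should indicate which dependencies bundle their own Jackson. Running the same paragraph in spark-shell and comparing the output may show where the two environments differ.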