How does Spark read Hadoop data?

2024-12-25 16:55:16
Recommended answer (1)
Answer 1:

How to read HBase data with Spark/Scala

You must start spark-shell with the parameter shown below; otherwise, when you iterate over the RDD you will hit the following exception:

    java.io.NotSerializableException: org.apache.hadoop.hbase.io.ImmutableBytesWritable

    spark-shell --conf spark.serializer=org.apache.spark.serializer.KryoSerializer

The following code was tested and verified against MapR-DB:

    import org.apache.spark._
    import org.apache.spark.rdd.NewHadoopRDD
    import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
    import org.apache.hadoop.hbase.client.HBaseAdmin
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hbase.HColumnDescriptor
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.client.HTable

    val tableName = "/app/SubscriptionBillingPlatform/TRANSAC_ID"
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, tableName)

    // create the RDD
    // (the original answer is cut off after "classOf[TableInputForma...";
    // the two remaining type parameters below follow the standard
    // newAPIHadoopRDD signature for TableInputFormat)
    val hBaseRDD = sc.newAPIHadoopRDD(conf,
      classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result])
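The RDD created above contains (ImmutableBytesWritable, Result) pairs, one per HBase row. A minimal sketch of consuming it, assuming a hypothetical column family "cf" with qualifier "col" (neither appears in the original answer; substitute the columns that actually exist in your table):

    import org.apache.hadoop.hbase.util.Bytes

    // count the rows in the table
    println("row count: " + hBaseRDD.count())

    // pull one column value out of each Result; rows that lack the
    // column yield an empty string instead of a NullPointerException
    val values = hBaseRDD.map { case (_, result) =>
      Option(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col")))
        .map(Bytes.toString)
        .getOrElse("")
    }
    values.take(10).foreach(println)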
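If you run this from a compiled application rather than spark-shell, the same serializer setting can be supplied in code instead of on the command line. A minimal sketch, assuming a standalone driver program (the application name is hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    // equivalent of passing --conf spark.serializer=... to spark-shell
    val sparkConf = new SparkConf()
      .setAppName("HBaseRead") // hypothetical application name
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(sparkConf)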