foundev.github.io

Connection to Oracle From Spark

For some silly reason there has been a fair amount of difficulty reading from and writing to Oracle from Spark when using DataFrames.

SPARK-10648 — Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.

This issue shows up when your schema has NUMBER columns declared with no precision or scale and you try to read them. The fix added an Oracle-specific JDBC dialect to Spark.
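If you are stuck on a release without that fix, a user-registered dialect along the same lines can work around it. The sketch below assumes Spark's `JdbcDialect` developer API with `spark-sql` on the classpath. The name `OracleNumberDialect` and the fallback `DecimalType(38, 10)` are my own assumptions for illustration, not the upstream code. It relies on the Oracle driver reporting an unqualified `NUMBER` column with a precision of 0.

```scala
import java.sql.Types

import org.apache.spark.sql.jdbc.JdbcDialect
import org.apache.spark.sql.types.{DataType, DecimalType, MetadataBuilder}

// Hypothetical dialect: a sketch of the SPARK-10648 idea, not Spark's own code.
object OracleNumberDialect extends JdbcDialect {

  // Only claim Oracle JDBC URLs, e.g. jdbc:oracle:thin:@//host:1521/service
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  // An unqualified NUMBER column comes back from the Oracle driver with
  // precision 0, which Spark cannot turn into a usable DecimalType.
  // In that case only, substitute an assumed precision/scale.
  override def getCatalystType(
      sqlType: Int,
      typeName: String,
      size: Int,
      md: MetadataBuilder): Option[DataType] = {
    if (sqlType == Types.NUMERIC && size == 0) {
      // 38 is Oracle's maximum NUMBER precision; a scale of 10 is an assumption.
      Some(DecimalType(38, 10))
    } else {
      None // keep Spark's default mapping for every other column
    }
  }
}
```

Register it once on the driver with `JdbcDialects.registerDialect(OracleNumberDialect)` (from `org.apache.spark.sql.jdbc`) before the first JDBC read. A fixed scale is a compromise: Oracle lets each value in an unqualified NUMBER column carry its own precision and scale, so values with more than 10 fractional digits may lose precision.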

This fixes the issue in Spark 1.4.2, 1.5.3 and 1.6.0 (and DataStax Enterprise 4.8.3). More recently, however, another issue has turned up.

SPARK-12941 — Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype

This led me to a Stack Overflow question. Unfortunately, one of the answers has a fix its author says only works on 1.5.x; however, I had no trouble porting it to 1.4.1. My solution was essentially a straight port of that SO answer. (This is not under warranty and it may destroy your server, but it should allow you to write to Oracle.)
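A minimal sketch of that kind of workaround, as a user-registered dialect, is below. Spark's generic JDBC mapping creates `StringType` columns as `TEXT` (and `BooleanType` as `BIT(1)`), neither of which Oracle accepts in a `CREATE TABLE`, so the dialect overrides `getJDBCType` for just those types. This is an illustration rather than the exact code from the SO answer; the name `OracleWriteDialect`, the `VARCHAR2(255)` length and the `NUMBER(1)` boolean encoding are assumptions to adjust for your data.

```scala
import java.sql.Types

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcType}
import org.apache.spark.sql.types.{BooleanType, DataType, StringType}

// Hypothetical dialect: a sketch of the SPARK-12941 workaround idea.
object OracleWriteDialect extends JdbcDialect {

  // Only claim Oracle JDBC URLs
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  // Override only the types Oracle rejects; returning None lets Spark
  // fall back to its generic mapping for everything else.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType  => Some(JdbcType("VARCHAR2(255)", Types.VARCHAR)) // assumed max length
    case BooleanType => Some(JdbcType("NUMBER(1)", Types.NUMERIC))     // 0/1 encoding
    case _           => None
  }
}
```

Register it with `JdbcDialects.registerDialect(OracleWriteDialect)` before calling `df.write.jdbc(url, table, connectionProperties)`. Keep in mind that a `VARCHAR2(255)` column will reject longer strings, so size it for your data.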

Keep an eye on SPARK-12941 for official support; once it lands you can forget the hacky workaround above.