How does one call packages from Spark to be used for data operations with R?
For example, I am trying to access my test.csv in HDFS as below,
but am getting the error below:
I tried loading the CSV package with the option below,
but get the error below while loading the sqlContext.
Any help will be highly appreciated.
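For reference, the kind of call being attempted usually looks something like this (a sketch, assuming a Spark 1.4-era SparkR setup and the spark-csv data source; the HDFS path, master URL, and package version are illustrative and not from the question):

```r
library(SparkR)

# Pull in the spark-csv package at launch time. The trailing "sparkr-shell"
# matters: setting SPARKR_SUBMIT_ARGS overrides the default value, which is
# "sparkr-shell", so it has to be appended back.
Sys.setenv(SPARKR_SUBMIT_ARGS =
  "--packages com.databricks:spark-csv_2.10:1.0.3 sparkr-shell")

sc <- sparkR.init(master = "local[*]")   # Spark 1.x API (deprecated in 2.x)
sqlContext <- sparkRSQL.init(sc)

# Read the CSV from HDFS using the spark-csv data source.
# The path below is illustrative.
df <- read.df(sqlContext,
              "hdfs:///user/me/test.csv",
              source = "com.databricks.spark.csv",
              header = "true",
              inferSchema = "true")
head(df)
```

This needs a local Spark installation to run, so treat it as a template rather than something to paste verbatim.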
san71
1 Answer
So it looks like by setting
SPARKR_SUBMIT_ARGS
you are overriding the default value, which is sparkr-shell.
You could probably do the same thing and just append sparkr-shell to the end of your SPARKR_SUBMIT_ARGS. This seems unnecessarily complex compared to depending on jars, so I've created a JIRA to track this issue (and I'll try a fix if the SparkR people agree with me): https://issues.apache.org/jira/browse/SPARK-8506 . Note: another option would be using the sparkr command with
--packages com.databricks:spark-csv_2.10:1.0.3
since that should work.
Holden
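Concretely, the two options above can be sketched as shell commands (a sketch; the spark-csv version matches the answer, but the Spark home path is illustrative):

```shell
# Option 1: keep using SPARKR_SUBMIT_ARGS, but make sure it still ends with
# "sparkr-shell", since setting the variable overrides that default value.
export SPARKR_SUBMIT_ARGS="--packages com.databricks:spark-csv_2.10:1.0.3 sparkr-shell"

# Option 2: skip the environment variable and pass --packages directly to
# the sparkR launcher script (path illustrative):
# $SPARK_HOME/bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3

# Sanity check: the last token of the variable should be sparkr-shell.
echo "${SPARKR_SUBMIT_ARGS##* }"
```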