- Your main application file is present in blob storage
- You have your training data in blob storage
- You want to write your output to blob storage
- Add the packages required to interact with blob storage. For example, with AWS S3 you could set the Spark config property `spark.jars.packages` to `org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.262` and Spark will download the packages on its own. Please choose the versions as per your requirements; see the first sketch after this list.
- Either use a Kubernetes service account that has access to the bucket/container you want to read from or write to, or add your credentials as environment variables and use them in your application. It is recommended to use secrets to expose the credentials as environment variables; see the second sketch after this list.
- Use the corresponding file URI. For example, for AWS S3 you would use something like `s3a://my-bucket-name/path/to/file`. The prefix varies with the blob store; see the read/write sketch after this list.
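
Below is a minimal PySpark sketch of the package step, assuming the session is built inside your driver code; the app name is a placeholder, and you should pin `hadoop-aws` to the Hadoop version your Spark build uses:

```python
from pyspark.sql import SparkSession

# Pull in the S3A connector and the AWS SDK bundle at startup; Spark
# resolves and downloads these Maven coordinates on its own.
spark = (
    SparkSession.builder
    .appName("blob-storage-example")  # placeholder name
    .config(
        "spark.jars.packages",
        "org.apache.hadoop:hadoop-aws:3.3.4,"
        "com.amazonaws:aws-java-sdk-bundle:1.12.262",
    )
    .getOrCreate()
)
```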
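
For the credentials step, here is a sketch that continues with the `spark` session above. It assumes `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` were injected into the pod from a Kubernetes secret and forwards them to the S3A connector; if you use a service account with bucket access instead, this block is unnecessary:

```python
import os

# Forward secret-backed environment variables to the S3A connector.
# `_jsc` is PySpark's internal handle to the JVM SparkContext.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", os.environ["AWS_ACCESS_KEY_ID"])
hadoop_conf.set("fs.s3a.secret.key", os.environ["AWS_SECRET_ACCESS_KEY"])
```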
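
Finally, a sketch of reading and writing with the blob-store URI scheme, again continuing from the session above; the bucket name, paths, and the `groupBy` are placeholders standing in for your actual training job:

```python
# Read training data from the bucket and write results back to it.
df = spark.read.parquet("s3a://my-bucket-name/path/to/training-data")
result = df.groupBy("label").count()  # stand-in transformation
result.write.mode("overwrite").parquet("s3a://my-bucket-name/path/to/output")
```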