AWS Certified Machine Learning – Specialty Set 4
Author: CloudVikas
Published Date: 19 March 2020

1. Your company currently has a large on-prem Hadoop cluster that contains data you would like to use for a training job. Your cluster is equipped with Mahout, Flume, Hive, Spark, and Ganglia. How might you most efficiently use this data?
   - Using EMR, create a Scala script to export the data to an HDFS volume. Copy that data over to an EBS volume where it can be read by the SageMaker training containers.
   - Use Mahout on the Hadoop cluster to preprocess the data into a format that is compatible with SageMaker. Export the data with Flume to the local storage of the training container and launch the training job.
   - Ensure that Spark is supported on your Hadoop cluster and leverage the SageMaker Spark library.

2. When you issue a CreateModel API call using a built-in algorithm, which of the following actions would be next?
   - SageMaker launches an appropriate training container from the algorithm selected from the regional container repository.
   - SageMaker provisions an EC2 instance using the appropriate AMI for the algorithm selected from the global container registry.
   - SageMaker launches an appropriate inference container for the algorithm selected from the regional container repository.

3. You have launched a training job but it fails after a few minutes. What is the first thing you should do for troubleshooting?
   - Go to CloudWatch logs and try to identify the error in the logs for your job.
   - Submit the job with AWS X-Ray enabled for additional debug information.
   - Go to CloudTrail logs and try to identify the error in the logs for your job.

4. You are consulting for a mountain climbing gear manufacturer and have been asked to design a machine learning approach for predicting the strength of a new line of climbing ropes.
Which approach might you choose?
   - You would choose a simulation-based reinforcement learning approach.
   - You would recommend they do not use a machine learning model.
   - You would approach the problem as a linear regression problem to predict the tensile strength of the rope based on other ropes.

5. Which of the following means that our algorithm predicted false but the real outcome was true?
   - True Negative
   - False Positive
   - False Negative

6. You are working on a model that tries to predict the future revenue of select companies based on 50 years of historic data from public financial filings. What might be a strategy to determine if the model is reasonably accurate?
   - Use Random Cut Forest to remove any outliers and rerun the algorithm on the last 20% of the data.
   - Use a set of the historic data as testing data to back-test the model and compare results to actual historical results.
   - Use a softmax function to invert the historical data, then run the validation job from most recent to earliest history.

7. You have been provided with a cleansed CSV dataset you will be using for a linear regression model. Of these tasks, which might you do next? (Choose 2)
   - Split the data into testing and training datasets.
   - Run a randomization process on the data.
   - Perform one-hot encoding on the softmax results.
   - Run a Peterman distribution on the data to sort it properly for linear regression.

8. We are using a k-fold method of cross-validation for our linear regression model. What outcome will indicate that our training data is not biased?
   - K-fold is not appropriate for use with linear regression problems.
   - Bias is not a concern with linear regression problems as the error function resolves this.
   - All k-fold validation rounds have roughly the same error rate.

9. We are running a training job over and over again using slightly different, very large datasets as an experiment. Training is taking a very long time with your I/O-bound training algorithm and you want to improve training performance. What might you consider?
(Choose 2)
   - Convert the data format to an Integer32 tensor.
   - Make use of file mode to stream data directly from S3.
   - Make use of pipe mode to stream data directly from S3.
   - Convert the data format to protobuf recordIO format.

10. You want to be sure to use the most stable version of a training container. How do you ensure this?
   - Use the :1 tag when specifying the ECR container path.
   - Use the ECR repository located in US-EAST-2.
   - Use the path to the global container repository.

11. We are designing a binary classification model that tries to predict whether a customer is likely to respond to a direct mailing of our catalog. Because it is expensive to print and mail our catalog, we want to only send to customers where we have a high degree of certainty they will buy something. When considering if the customer will buy something, what outcome would we want to minimize in a confusion matrix?
   - True Negative
   - False Positive
   - False Affirmative

12. We are using a CSV dataset for unsupervised learning that does not include a target value. How should we indicate this for training data as it sits on S3?
   - Enable pipe mode when we initiate the training run.
   - Include label_size=0 appended to the Content-Type key.
   - CSV data format should not be used for unsupervised learning algorithms.
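
Questions 5 and 11 both depend on the four cells of a binary confusion matrix. As a study aid, here is a minimal Python sketch of how those cells are counted. The function name `confusion_counts` and the sample labels are hypothetical, made up for illustration; they do not come from the quiz or from any AWS library.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count the four confusion-matrix cells for boolean labels."""
    counts = Counter()
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            counts["TP"] += 1  # predicted true, actually true
        elif guess and not truth:
            counts["FP"] += 1  # predicted true, actually false
        elif not guess and truth:
            counts["FN"] += 1  # predicted false, actually true
        else:
            counts["TN"] += 1  # predicted false, actually false
    return counts


# Hypothetical labels: did the customer buy (actual) vs. what the model said.
actual = [True, False, True, False, True, False]
predicted = [True, True, False, False, True, False]

counts = confusion_counts(actual, predicted)

# Precision: of everyone predicted to buy, the share who really did.
precision = counts["TP"] / (counts["TP"] + counts["FP"])
```

With these sample labels the counts are TP=2, FP=1, FN=1, TN=2, so precision is 2/3. In the catalog scenario, each catalog mailed to a customer who does not buy counts against precision.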