AWS Certified Machine Learning – Specialty Set 4

1. You are consulting for a mountain climbing gear manufacturer and have been asked to design a machine learning approach for predicting the strength of a new line of climbing ropes. Which approach might you choose?
A. You would recommend they do not use a machine learning model.
B. You would approach the problem as a linear regression problem, predicting the tensile strength of the new rope based on data from other ropes.
C. You would choose a simulation-based reinforcement learning approach.

2. Your company currently has a large on-prem Hadoop cluster that contains data you would like to use for a training job. Your cluster is equipped with Mahout, Flume, Hive, Spark, and Ganglia. How might you most efficiently use this data?
A. Ensure that Spark is supported on your Hadoop cluster and leverage the SageMaker Spark library.
B. Using EMR, create a Scala script to export the data to an HDFS volume, then copy that data over to an EBS volume where it can be read by the SageMaker training containers.
C. Use Mahout on the Hadoop cluster to preprocess the data into a format that is compatible with SageMaker.
D. Export the data with Flume to the local storage of the training container and launch the training job.

3. You have launched a training job but it fails after a few minutes. What is the first thing you should do to troubleshoot?
A. Go to the CloudWatch logs and try to identify the error in the logs for your job.
B. Resubmit the job with AWS X-Ray enabled for additional debug information.
C. Go to the CloudTrail logs and try to identify the error in the logs for your job.

4. You are working on a model that tries to predict the future revenue of select companies based on 50 years of historic data from public financial filings. What might be a strategy to determine whether the model is reasonably accurate?
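The linear-regression option in question 1 can be sketched with ordinary least squares. The rope measurements below (diameter and strand count as features, tensile strength as the target) are invented for illustration; real data would come from the manufacturer's tensile tests on existing ropes.

```python
import numpy as np

# Hypothetical rope data: diameter (mm) and core strand count as features,
# measured tensile strength (kN) as the target. Values are made up.
X = np.array([
    [9.0, 3], [9.5, 3], [10.0, 4], [10.5, 4], [11.0, 5],
], dtype=float)
y = np.array([22.1, 23.4, 26.0, 27.2, 29.8])

# Add an intercept column and solve the least-squares problem directly.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the strength of a new rope design (features + intercept term).
new_rope = np.array([10.0, 4, 1.0])
predicted_strength = float(new_rope @ coef)
```

The same fit could of course be done with scikit-learn's `LinearRegression`; plain least squares keeps the idea visible.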
A. Use Random Cut Forest to remove any outliers and rerun the algorithm on the last 20% of the data.
B. Use a subset of the historic data as testing data to back-test the model and compare its results to actual historical results.
C. Use a softmax function to invert the historical data, then run the validation job from the most recent history to the earliest.

5. We are running a training job over and over again using slightly different, very large datasets as an experiment. Training is taking a very long time with your I/O-bound training algorithm and you want to improve training performance. What might you consider? (Choose 2)
A. Make use of file mode to stream data directly from S3.
B. Convert the data format to protobuf recordIO format.
C. Convert the data format to an Integer32 tensor.
D. Make use of pipe mode to stream data directly from S3.

6. You have been provided with a cleansed CSV dataset you will be using for a linear regression model. Of these tasks, which might you do next? (Choose 2)
A. Split the data into testing and training datasets.
B. Run a Peterman distribution on the data to sort it properly for linear regression.
C. Run a randomization process on the data.
D. Perform one-hot encoding on the softmax results.

7. We are designing a binary classification model that tries to predict whether a customer is likely to respond to a direct mailing of our catalog. Because it is expensive to print and mail our catalog, we want to send it only to customers we are highly certain will buy something. When considering whether the customer will buy something, which outcome would we want to minimize in a confusion matrix?
A. True Negative
B. False Affirmative
C. False Positive

8. Which of the following means that our algorithm predicted false but the real outcome was true?
A. False Negative
B. True Negative
C. False Positive

9. We are using a CSV dataset for unsupervised learning that does not include a target value. How should we indicate this for the training data as it sits on S3?
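The confusion-matrix terms in questions 7 and 8 can be made concrete with a few lines of Python; the buy/no-buy labels below are made up for illustration.

```python
# actual = did the customer buy; predicted = model said they would buy.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 1, 1, 0]

# Count the four confusion-matrix cells.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # mailed, didn't buy
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # predicted false, was true
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)

# Minimizing false positives maximizes precision: of everyone we mail,
# what fraction actually buys?
precision = tp / (tp + fp)
```

A false positive here is a wasted catalog mailing, which is why the expensive-mailing scenario cares about precision; a false negative is the "predicted false but actually true" case asked about in question 8.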
A. Include label_size=0 appended to the Content-Type key.
B. Enable pipe mode when we initiate the training run.
C. CSV data format should not be used for unsupervised learning algorithms.

10. You want to be sure to use the most stable version of a training container. How do you ensure this?
A. Use the :1 tag when specifying the ECR container path.
B. Use the ECR repository located in US-EAST-2.
C. Use the path to the global container repository.

11. When you issue a CreateModel API call using a built-in algorithm, which of the following actions would happen next?
A. SageMaker launches an appropriate training container for the selected algorithm from the regional container repository.
B. SageMaker launches an appropriate inference container for the selected algorithm from the regional container repository.
C. SageMaker provisions an EC2 instance using the appropriate AMI for the selected algorithm from the global container registry.

12. We are using a k-fold method of cross-validation for our linear regression model. What outcome will indicate that our training data is not biased?
A. Bias is not a concern with linear regression problems, as the error function resolves this.
B. K-fold is not appropriate for use with linear regression problems.
C. All k-fold validation rounds have roughly the same error rate.
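The k-fold idea behind question 12 can be demonstrated with a manual cross-validation loop: if the folds are well shuffled, every round should report roughly the same error. The synthetic dataset and fold count below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear data: y = 3x + noise (invented example, not from the quiz).
x = rng.uniform(0, 10, 100)
y = 3 * x + rng.normal(0, 0.5, 100)

k = 5
indices = rng.permutation(len(x))   # shuffle before folding
folds = np.array_split(indices, k)

errors = []
for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    # Fit slope and intercept on the training folds only.
    slope, intercept = np.polyfit(x[train_idx], y[train_idx], 1)
    pred = slope * x[test_idx] + intercept
    errors.append(float(np.mean((pred - y[test_idx]) ** 2)))

# Roughly equal error across the k rounds suggests no single slice
# of the data is biased relative to the rest.
spread = max(errors) - min(errors)
```

scikit-learn's `KFold` and `cross_val_score` wrap this same loop; the manual version shows what "each validation round" means.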