AWS Certified Machine Learning – Specialty Set 7
Author: CloudVikas
Published Date: 19 March 2020

Welcome to AWS Certified Machine Learning – Specialty Set 7.

1. Database selection is very important for every cloud-based project. Alex has joined a team whose manager has asked for help selecting a new database. The database must provide high performance and scalability. The data will be structured and persistent, and the database must support complex queries using SQL and BI tools. Which AWS service should Alex select?
- RedShift
- DynamoDB
- RDS

2. You are preparing for a first training run using a custom algorithm that you have packaged in a Docker container. What should you do to ensure that the training metrics are visible in CloudWatch? (See the sketch after question 7.)
- When defining the training job, ensure that the metric_definitions section is populated with relevant metrics from the stdout and stderr streams in the container.
- Enable Kinesis Streams to capture the log stream emitted by the custom algorithm containers.
- Enable CloudTrail for the respective container to capture the relevant training metrics from the custom algorithm.

3. John is working on a document-sharing application that needs a storage layer. As a security measure, the storage should provide automatic versioning so that users can easily roll back to a previous version or recover a deleted document. Which AWS service meets these requirements?
- Amazon S3
- Amazon EFS

4. A cloud-based company, cloudvikas.com, is generating large datasets with millions of rows that must be summarized by column. Which storage service meets this requirement?
- Amazon RedShift
- Amazon ElastiCache
- Amazon DynamoDB

5. We want to perform automatic model tuning on our Linear Learner model. We have chosen the tunable hyperparameter we want to use. What is our next step?
- Decide what hyperparameter we want SageMaker to tune in the tuning process.
- Choose a target objective metric we want SageMaker to use in the tuning process.
- Choose a range of values which SageMaker will sweep through during the tuning process.

6. Your manager has assigned you a new task: architecting a workload that requires a highly available and scalable shared block file storage system that must be consumed by multiple Linux applications. Which service meets this requirement?
- Assign an IAM role to the Lambda function with permissions to list all Amazon RDS instances.
- Create an IAM access and secret key, and store it in the Lambda function.

7. A colleague is preparing for their very first training job using the XGBoost algorithm. They ask you how they can ensure that training metrics are captured during the training job. How do you direct them?
- Do nothing. SageMaker's built-in algorithms are already configured to send training metrics to CloudTrail.
- Do nothing. SageMaker's built-in algorithms are already configured to send training metrics to CloudWatch.
- Do nothing. Use SageMaker's built-in logging feature and view the logs using QuickSight.
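For question 2 above: when you bring your own container, SageMaker only publishes training metrics to CloudWatch if the training job's metric_definitions tell it which patterns to scrape from the container's stdout/stderr. The sketch below uses the SageMaker Python SDK; the ECR image URI, IAM role, S3 prefix, and log format (lines such as train_loss=0.42) are placeholder assumptions, not values from this quiz.

```python
# Minimal sketch: surfacing custom-container training metrics to CloudWatch.
# The image URI, IAM role, and S3 prefix below are hypothetical placeholders.
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",  # hypothetical ECR image
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",             # hypothetical role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    # SageMaker scrapes the container's stdout/stderr with these regexes and
    # publishes the captured values as CloudWatch metrics for the training job.
    metric_definitions=[
        {"Name": "train:loss", "Regex": "train_loss=([0-9\\.]+)"},
        {"Name": "validation:accuracy", "Regex": "val_acc=([0-9\\.]+)"},
    ],
)

estimator.fit({"train": "s3://my-bucket/train/"})  # hypothetical S3 prefix
```

By contrast, the built-in algorithms in question 7 ship with their metric definitions preconfigured, so their training metrics flow to CloudWatch without any extra setup.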
8. John is provisioning EC2 instances for a cloud project and shares instance details with his team. He is deploying an application on Amazon EC2 that must call AWS APIs. Which method of securely passing credentials to the application should he use?
- Assign IAM roles to the EC2 instances
- Store API credentials as an object in Amazon S3
- Store the API credentials on the instance using instance metadata

9. You are designing a testing plan for an update release of your company's mission-critical loan approval model. For regulatory compliance, it is critical that the updates are not used in production until regression testing has shown that the updates perform at least as well as the existing model. Which validation strategies would you choose? (Choose 2)
- Make use of backtesting with historic data.
- Use an A/B test to expose the updates to real-world traffic.
- Use a rolling upgrade to determine if the model is ready for production.
- Use a K-Fold validation method.

10. We have just completed a validation job for a multi-class classification model that attempts to classify books into one of five genres. In reviewing the validation metrics, we observe a macro average F1 score of 0.28, with one genre, historic fiction, having an F1 score of 0.9. What can we conclude from this? (A worked example follows question 14.)
- Our training data might be biased toward historic fiction and lacking in examples of other genres.
- Our model is very poor at predicting historic fiction but quite good at the other genres given the macro F1 score.
- We might try a linear regression model instead of multi-class classification.

11. After training and validation sessions, we notice that the error rate is higher than we want for both sessions. Visualization of the data indicates that we don't seem to have any outliers. What else might we do? (Choose 3)
- Add more variables to the dataset.
- Reduce the dimensions of the data.
- Run training for a longer period of time.
- Gather more data for our training process.
- Run a random cut forest algorithm on the data.

12. In your first training job of a binary classification problem, you observe an F1 score of 0.996. You make some adjustments and rerun the training job, which results in an F1 score of 0.034. What can you conclude from this? (Choose 2)
- The adjustments drastically improved our model.
- Nothing can be concluded from an F1 score by itself.
- The adjustments drastically worsened our model.
- Our accuracy has decreased.

13. After training and validation sessions, we notice that the accuracy rate for training is acceptable but the accuracy rate for validation is very poor. What might we do? (Choose 3)
- Increase the learning rate.
- Reduce dimensionality.
- Add an early stop.
- Encode the data using Laminar Flow Step-up.
- Gather more data for our training process.

14. Charles is a Solutions Architect working on an application that uses Amazon DynamoDB to stage its product catalog, which is 1 GB in size. A product entry consists of 100 KB of data on average, and the average traffic is about 250 requests per second, so the database administrator has provisioned 3,000 RCUs of read capacity throughput. However, some products are very popular and users are experiencing delays or timeouts due to throttling. What improvement offers a long-term solution to this problem?
- Change the partition key to consist of a hash of product key and product type, instead of just the product key.
- Augment Amazon DynamoDB by storing only the key product attributes, with the details stored on Amazon S3.
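For question 10 above: the gap between one strong per-class F1 score and a weak macro average is easy to reproduce with scikit-learn, since the macro average is just the unweighted mean of the per-class scores. The genres, labels, and predictions below are made-up illustrative data chosen so the numbers land near the question's values.

```python
# Minimal sketch: per-class vs. macro-averaged F1 for a multi-class classifier.
# The genres and predictions are made-up illustrative data, not a real dataset.
from sklearn.metrics import f1_score

genres = ["historic_fiction", "romance", "sci_fi", "mystery", "biography"]

# 20 books: the model nails historic fiction but lumps everything else into "romance".
y_true = (["historic_fiction"] * 8 + ["romance"] * 3 + ["sci_fi"] * 3
          + ["mystery"] * 3 + ["biography"] * 3)
y_pred = ["historic_fiction"] * 8 + ["romance"] * 12

per_class = f1_score(y_true, y_pred, labels=genres, average=None, zero_division=0)
macro = f1_score(y_true, y_pred, labels=genres, average="macro", zero_division=0)

for genre, score in zip(genres, per_class):
    print(f"{genre:>16}: F1 = {score:.2f}")
print(f"   macro average: F1 = {macro:.2f}")  # one strong class cannot rescue the macro average
```

Here the dominant class scores 1.0 while the macro average is only 0.28, which is exactly the pattern a class-imbalanced training set tends to produce.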
15. Tom has joined an MNC where he works on designing web applications. You are putting together a design for a three-tier web application. The application tier requires a minimum of 6 EC2 instances to be running at all times. You need to provide fault tolerance to ensure that the failure of a single Availability Zone (AZ) will not affect application performance. Which of the options below is the optimum solution to fulfil these requirements?
- Create an ASG with 18 instances spread across 3 AZs behind an ELB
- Create an ASG with 6 instances spread across 3 AZs behind an ELB
- Create an ASG with 9 instances spread across 3 AZs behind an ELB

16. In your first training job of a regression problem, you observe an RMSE of 3.4. You make some adjustments and run the training job again, which results in an RMSE of 2.2. What can you conclude from this?
- The adjustments improved your model accuracy.
- The adjustments made your model recall worse.
- The adjustments had no effect on your model accuracy.

17. In a binary classification problem, you observe that precision is poor. Which of the following contributes most to poor precision?
- Type I Error
- Type III Error
- Type IV Error

18. After multiple training runs, you notice that the loss function settles on different but similar values. You believe that there is potential to improve the model by adjusting hyperparameters. What might you try next?
- Change to another algorithm.
- Decrease the learning rate.
- Change from a CPU instance to a GPU instance.

19. In a regression problem, if we plot the residuals in a histogram and observe a distribution heavily skewed to the right of zero, indicating mostly positive residuals, what does this mean?
- Our model is consistently overestimating.
- Our model is consistently underestimating.
- Our model is sufficient with regard to aggregate residual.

20. Which of the following metrics are recommended for tuning a Linear Learner model so that we can help avoid overfitting? (Choose 3)
- test:recall
- validation:objective_loss
- validation:recall
- validation:precision
- test:precision
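Questions 5 and 20 both come down to how automatic model tuning is configured: you pick a target objective metric (for Linear Learner, a validation-set metric such as validation:objective_loss helps guard against overfitting) and a range of values for each tunable hyperparameter that SageMaker will sweep through. A minimal sketch with the SageMaker Python SDK follows; the region, role, S3 prefixes, and the specific hyperparameter ranges are placeholder assumptions.

```python
# Minimal sketch: automatic model tuning for the built-in Linear Learner algorithm.
# The region, IAM role, and S3 prefixes are hypothetical; the ranges are illustrative.
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

region = "us-east-1"                                                # hypothetical region
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"      # hypothetical role

# Resolve the built-in Linear Learner container image for the region.
image_uri = image_uris.retrieve("linear-learner", region)

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
)
estimator.set_hyperparameters(predictor_type="binary_classifier", mini_batch_size=200)

tuner = HyperparameterTuner(
    estimator=estimator,
    # A validation-set metric (rather than a training-set metric) helps avoid overfitting.
    objective_metric_name="validation:objective_loss",
    objective_type="Minimize",
    # Ranges that SageMaker sweeps through during the tuning job.
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(0.0001, 0.1),
        "l1": ContinuousParameter(0.0, 1.0),
    },
    max_jobs=20,
    max_parallel_jobs=2,
)

tuner.fit({
    "train": "s3://my-bucket/train/",           # hypothetical S3 prefixes
    "validation": "s3://my-bucket/validation/",
})
```

Bayesian search is the tuner's default strategy; max_jobs and max_parallel_jobs simply bound how many training jobs the sweep launches.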