AWS Certified Machine Learning – Specialty Set 1

Welcome to AWS Certified Machine Learning – Specialty Set 1.


1. You have been tasked with converting multiple JSON files in an S3 bucket to Apache Parquet format. Which AWS service can you use to achieve this with the LEAST amount of effort?
2. You are an ML specialist at a large organization that helps job seekers find both technical and non-technical jobs. You've collected data from an engineering company's data warehouse to determine which skills qualify job seekers for different positions. After reviewing the data, you realise it is biased. Why?
3. You are trying to set up a crawler in AWS Glue that crawls your input data in S3. For some reason, after the crawler finishes running, it cannot determine the schema of your data, and no tables are created in your AWS Glue Data Catalog. What is the reason for these results?
4. When you train your model in SageMaker, where does your training dataset come from?
5. You are an ML specialist working with data stored in a distributed EMR cluster on AWS. Currently, your machine learning applications are compatible with the Apache Hive Metastore tables on EMR. You have been tasked with configuring Hive to use the AWS Glue Data Catalog as its metastore. Before you can do this, you need to transfer the Apache Hive metastore tables into an AWS Glue Data Catalog. What steps do you need to take to achieve this with the LEAST amount of effort?
6. You have been tasked with setting up crawlers in AWS Glue to crawl different data stores and populate your organization's AWS Glue Data Catalogs. Which of the following input data stores is NOT an option when creating a crawler?
7. You are an ML specialist at a large organization who needs to run SQL queries and analytics on thousands of Apache log files stored in S3. Which set of tools can help you achieve this with the LEAST amount of effort?
8. You have been tasked with collecting thousands of PDFs to build a large corpus dataset. What type of data would the data in this dataset be considered?
9. Your organization has given you several different sets of key-value pair JSON files that need to be used for a machine learning project within AWS. What type of data is this classified as and where is the best place to load this data into?
10. You are an ML specialist at a large organization who needs to run SQL queries and analytics on thousands of Apache log files stored in S3. Your organization already uses Amazon Redshift as its data warehousing solution. Which tool can help you achieve this with the LEAST amount of effort?
11. In general, what is the minimum number of observations your dataset should have relative to the number of features?
12. You are an ML specialist setting up an ML pipeline. You have a massive amount of data that needs to be set up and managed on a distributed system so that you can run processing and analytics on it efficiently. You also plan to use tools like Apache Spark to get your data ready for the ML pipeline. Which setup and services can most easily help you achieve this?
13. Which Amazon service allows you to build a high-quality labeled training dataset for your machine learning models? The labeling workforce can include human workers, vendor companies that you choose, or an internal, private workforce.
14. An organization needs to store a massive amount of data in AWS. The data has a key-value access pattern, developers need to run complex SQL queries and transactions, and the data has a fixed schema. Which type of data store meets all of their needs?

