AWS Certified Big Data Specialty Set4

1. True or False: Redshift is recommended for transactional processing.
- True
- False

2. Which of the following are characteristics of supervised learning? (Choose 2)
- Labeled data
- Known desired output
- Data lacks categorization
- Only a small amount of data is required to process

3. How many concurrent queries can you run on a Redshift cluster?
- 500
- 150
- 50

4. What are some of the benefits and use cases of columnar databases? (Choose 2)
- They store binary objects quite well.
- They are ideal for "needle in a haystack" queries.
- Compression, which helps with performance and provides a lower total cost of ownership.
- They are ideal for Online Analytical Processing (OLAP).

5. True or False: When you use the UNLOAD command in Redshift to write data to S3, it automatically creates files using Amazon S3 server-side encryption with AWS-managed encryption keys (SSE-S3).
- False
- True

6. True or False: Defining primary keys and foreign keys is an important part of Redshift design because it helps maintain data integrity.
- True
- False

7. Which two types of machine learning are routinely encountered? (Choose 2)
- Hypervised Learning
- Unsupervised Learning
- Supervised Learning
- Transcoded Learning

8. Your analytics team runs large, long-running queries in an automated fashion throughout the day. The results of these large queries are then used to make business decisions. However, the analytics team also runs small queries manually on an ad-hoc basis. How can you ensure that the large queries do not take up all the resources, preventing the smaller ad-hoc queries from running?
- Do nothing, because Redshift handles this automatically.
- Create a query user group for small queries based on the analysts' Redshift user IDs, and create a second query group for the large, long-running queries.
- Set up node affinity and assign large queries and small queries to run on specific nodes.

9. Which of the following AWS services directly integrate with Redshift using the COPY command? (Choose 3)
- DynamoDB
- Machine Learning
- EMR/EC2 instances
- S3
- Kinesis Streams

10. In your current data warehouse, BI analysts consistently join two tables: the customer table and the orders table. The column they JOIN on (common to both tables) is called customer_id. Both tables are very large, at over 1 billion rows. Besides being in charge of migrating the data, you are also responsible for designing the tables in Redshift. Which distribution style would you choose to achieve the best performance when the BI analysts run queries that JOIN the customer table and orders table on customer_id?
- EVEN
- DEFAULT
- KEY

11. What is the most effective way to merge data into an existing table?
- UNLOAD data from Redshift into S3, use EMR to "merge" new data files with the unloaded data files, and copy the data into Redshift.
- Use a staging table to replace existing rows or update specific rows.
- Connect the source table and the target Redshift table via a replication tool and run direct INSERTs and UPDATEs against the target Redshift table.

12. You are trying to predict a numeric value from inventory/retail data that your company has. Which machine learning model would you use to do this?
- Numeric Prediction Model
- Regression Model
- Multiclass Classification Model

13. What does the F1 score represent?
- The quality of the model
- S3 record import success
- The accuracy of the input data

14. What is a fast way to load data into Redshift?
- By restoring backup data files into Redshift.
- By using the COPY command.
- By using multi-line INSERTs.

15. You have a table in your Redshift cluster, and the data in this table changes infrequently. The table has fewer than 15 million rows and does not JOIN any other tables. Which distribution style would you select for this table?
- DEFAULT
- ALL
- KEY
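The staging-table merge asked about in question 11 and the COPY load from question 14 can be sketched together. This is a minimal sketch of the delete-and-insert merge pattern; the table names (sales, sales_staging), key column, S3 path, and IAM role are hypothetical placeholders, not values from the quiz.

```python
# Sketch of the Redshift staging-table merge pattern (question 11),
# using COPY for the fast parallel load (question 14).
# All table/column names, the S3 path, and the IAM role are hypothetical.
def merge_statements(target, staging, key, s3_path, iam_role):
    """Return the SQL steps for a delete-and-insert merge via a staging table."""
    return [
        # Stage the incoming batch in a temp table shaped like the target.
        f"CREATE TEMP TABLE {staging} (LIKE {target});",
        # COPY loads from S3 in parallel, far faster than multi-line INSERTs.
        f"COPY {staging} FROM '{s3_path}' IAM_ROLE '{iam_role}' FORMAT AS CSV;",
        "BEGIN TRANSACTION;",
        # Remove target rows that reappear in the new batch, then insert all.
        f"DELETE FROM {target} USING {staging} "
        f"WHERE {target}.{key} = {staging}.{key};",
        f"INSERT INTO {target} SELECT * FROM {staging};",
        "END TRANSACTION;",
    ]

steps = merge_statements(
    "sales", "sales_staging", "sale_id",
    "s3://my-bucket/new-sales/",
    "arn:aws:iam::123456789012:role/RedshiftCopy",  # placeholder role ARN
)
for statement in steps:
    print(statement)
```

Wrapping the DELETE and INSERT in one transaction keeps readers from ever seeing the table with the old rows removed but the new rows not yet inserted.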
16. You are trying to predict whether a customer will buy your product. Which machine learning model would help you make this prediction?
- Numeric Prediction Model
- Multiclass Classification Model
- Binary Classification Model

17. Which of the following is not a function of the Redshift manifest?
- To load only the required files
- To load files that have a different prefix
- To automatically check files in S3 for data issues

18. An Area Under the Curve (AUC) score is shown to be 0.5. What does this signify? (Choose 2)
- The model is no more accurate than flipping a coin.
- The AUC provides no value.
- Lower AUC numbers would increase confidence.
- There is little confidence beyond a guess.
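The Redshift manifest from question 17 is a JSON file that pins the exact S3 objects a COPY should load, which is how it loads only the required files and files under different prefixes. A minimal sketch of generating one follows; the bucket and key names are hypothetical placeholders.

```python
import json

# Sketch of a Redshift COPY manifest (question 17). The manifest lists each
# S3 object explicitly, so COPY loads only those files, even when they sit
# under different prefixes or buckets. Bucket/key names are hypothetical.
def build_manifest(urls, mandatory=True):
    """Return manifest JSON listing each S3 URL as an entry."""
    return json.dumps(
        {"entries": [{"url": u, "mandatory": mandatory} for u in urls]},
        indent=2,
    )

manifest = build_manifest([
    "s3://my-bucket/2024/orders-part1.csv",    # one prefix...
    "s3://my-other-bucket/legacy/orders.csv",  # ...and a different one
])
print(manifest)
```

Setting "mandatory" to true makes the COPY fail if a listed file is missing; the manifest does not inspect file contents, which is why the third option in question 17 is not one of its functions.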