AWS Certified Big Data Specialty Set4
Author: CloudVikas
Published Date: 17 March 2020

1. Which of the following is not a function of the Redshift manifest?
   - To load required files only.
   - To load files that have a different prefix.
   - To automatically check files in S3 for data issues.

2. What is a fast way to load data into Redshift?
   - By restoring backup data files into Redshift.
   - By using the COPY command.
   - By using multi-line INSERTS.

3. What are some of the benefits and use cases of columnar databases? (Choose 2)
   - They store binary objects quite well.
   - They’re ideal for 'needle in a haystack' queries.
   - Compression, as it helps with performance and provides a lower total cost of ownership.
   - They are ideal for Online Analytical Processing (OLAP).

4. You are trying to predict a numeric value from inventory/retail data that your company has. Which machine learning model would you use to do this?
   - Numeric Prediction Model
   - Regression Model
   - Multiclass Classification Model

5. Which of the following AWS services directly integrate with Redshift using the COPY command? (Choose 3)
   - DynamoDB
   - Machine Learning
   - EMR/EC2 instances
   - S3
   - Kinesis Streams

6. Name two types of machine learning that are routinely encountered. (Choose 2)
   - Hypervised Learning
   - Unsupervised Learning
   - Supervised Learning
   - Transcoded Learning

7. How many concurrent queries can you run on a Redshift cluster?
   - 500
   - 150
   - 50

8. Your analytics team runs large, long-running queries in an automated fashion throughout the day. The results of these large queries are then used to make business decisions. However, the analytics team also runs small queries manually on an ad-hoc basis.
How can you ensure that the large queries do not take up all the resources, preventing the smaller ad-hoc queries from running?
   - Do nothing, because Redshift handles this automatically.
   - Create a query user group for small queries based on the analysts’ Redshift user IDs, and create a second query group for the large, long-running queries.
   - Set up node affinity and assign large queries and small queries to run on specific nodes.

9. You have a table in your Redshift cluster, and the data in this table changes infrequently. The table has fewer than 15 million rows and does not JOIN any other tables. Which distribution style would you select for this table?
   - DEFAULT
   - ALL
   - KEY

10. In your current data warehouse, BI analysts consistently join two tables: the customer table and the orders table. The column they JOIN on (common to both tables) is called customer_id. Both tables are very large, with over 1 billion rows. Besides being in charge of migrating the data, you are also responsible for designing the tables in Redshift. Which distribution style would you choose to achieve the best performance when the BI analysts run queries that JOIN the customer table and the orders table using customer_id?
   - EVEN
   - DEFAULT
   - KEY

11. What does the F1 score represent?
   - The quality of the model
   - S3 record import success
   - The accuracy of the input data

12. What is the most effective way to merge data into an existing table?
   - UNLOAD data from Redshift into S3, use EMR to 'merge' new data files with the unloaded data files, and copy the data into Redshift.
   - Use a staging table to replace existing rows or update specific rows.
   - Connect the source table and the target Redshift table via a replication tool and run direct INSERTS and UPDATES into the target Redshift table.

13. You are trying to predict whether a customer will buy your product. Which machine learning model would help you make this prediction?
   - Numeric Prediction Model
   - Multiclass Classification Model
   - Binary Classification Model

14. An Area Under Curve (AUC) is shown to be 0.5.
What does this signify? (Choose 2)
   - The model is no more accurate than flipping a coin.
   - The AUC provides no value.
   - Lower AUC numbers would increase confidence.
   - There is little confidence beyond a guess.

15. Which of the following are characteristics of Supervised Learning? (Choose 2)
   - Labeled data
   - Known desired output
   - Data lacks categorization
   - Small amount of data is required to process

16. True or False: When you use the UNLOAD command in Redshift to write data to S3, it automatically creates files using Amazon S3 server-side encryption with AWS-managed encryption keys (SSE-S3).
   - False
   - True

17. True or False: Redshift is recommended for transactional processing.
   - True
   - False

18. True or False: Defining primary keys and foreign keys is an important part of Redshift design because it helps maintain data integrity.
   - True
   - False
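
Study note for question 11: the F1 score combines precision and recall into a single number (their harmonic mean). The sketch below shows the arithmetic on made-up counts; the function name `f1_score` and the example numbers are assumptions for illustration only and do not come from any AWS service or from the quiz itself.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score: the harmonic mean of precision and recall."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    # With no predicted or no actual positives, precision or recall is
    # undefined; this sketch reports 0.0 in that case (a common convention).
    if predicted_positive == 0 or actual_positive == 0:
        return 0.0
    precision = true_positives / predicted_positive
    recall = true_positives / actual_positive
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a "will the customer buy?" classifier.
score = f1_score(true_positives=80, false_positives=20, false_negatives=40)
print(round(score, 3))  # 0.727
```

Here precision is 80/100 = 0.8 and recall is 80/120 ≈ 0.667, so the F1 score lands between the two, closer to the lower value.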