AWS Data Analytics – Specialty Quiz-4

31) In your project, an EMR cluster requires high I/O performance at a low cost. In terms of storage, which of the following is your best option?
EBS volumes with PIOPS
Instance store volumes
EMRFS with consistent view

Correct Answer: Instance store volumes

32) You have just joined a company that has a petabyte of data stored in multiple data sources. Queries are processed in memory, avoiding high I/O and latency. Which open-source tool do you think your new colleagues are using?
Big Data Query Engine
Presto
Hive
Correct Answer: Presto

33) You plan to use EMR to process a large amount of data that will eventually be stored in S3. The data is currently on premises and will be migrated to AWS using the Snowball service. Over the next 8 months, your company will migrate over 4 PB of data to S3, and costs are a concern. Which compression algorithm helps you minimize costs?
Snappy
bzip2
LZO
Correct Answer: bzip2

34) When should you not use Spark?
For batch processing
For interactive analytics
For ETL workloads
In multi-user environments with high concurrency
Correct Answer: For batch processing. In multi-user environments with high concurrency.

35) How are EMR task nodes different from core nodes?
Task nodes run the NodeManager daemon.
They are used for extra capacity when additional CPU and RAM are needed.
Task nodes are optional.
Task nodes do not include HDFS.
Task nodes run the Resource Manager.
Correct Answer: They are used for extra capacity when additional CPU and RAM are needed. Task nodes are optional. Task nodes do not include HDFS.

36) What is a fast way to load data into Redshift?
By restoring backup data files into Redshift.
By using the COPY command.
By using multi-row INSERT statements.
Correct Answer: By using the COPY command.
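
As a rough illustration of the COPY approach, the sketch below assembles a COPY statement that bulk-loads gzip-compressed CSV files from S3. The helper `build_copy_statement`, the table name, the S3 path, and the IAM role ARN are all hypothetical placeholders; the statement would still have to be run through whatever database connection you use.

```python
def build_copy_statement(table: str, s3_path: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement for gzip-compressed CSV files in S3.

    Inputs are interpolated directly, so this is only suitable for trusted,
    hard-coded values, not for user-supplied strings.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "GZIP;"
    )


# Hypothetical values for illustration only.
statement = build_copy_statement(
    table="sales",
    s3_path="s3://example-bucket/sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

Pointing COPY at an S3 prefix that holds several files lets Redshift load them in parallel across slices, which is the main reason it tends to outperform row-by-row inserts.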

37) An Area Under Curve (AUC) is shown to be 0.5. What does this signify?
The model is no more accurate than flipping a coin.
The AUC provides no value.
Lower AUC numbers would increase confidence.
There is little confidence beyond a guess.
Correct Answer: The model is no more accurate than flipping a coin. There is little confidence beyond a guess.
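
To make the coin-flip interpretation concrete: AUC can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, so 0.5 corresponds to guessing and values near 1.0 indicate strong separation. The sketch below computes it by brute force; the helper `auc` and the toy labels and scores are illustrative assumptions, not part of any AWS service.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 0]
uninformative = auc(labels, [0.5, 0.5, 0.5, 0.5])  # every score tied: no signal
perfect = auc(labels, [0.9, 0.8, 0.2, 0.1])        # positives always ranked higher
print(uninformative, perfect)  # 0.5 1.0
```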

38) You are trying to predict a numeric value from inventory/retail data that your company has. Which machine learning model would you use to do this?
Numeric Prediction Model
Regression Model
Multiclass Classification Model
Correct Answer: Regression Model
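
For intuition, a regression model maps input features to a continuous number. The sketch below fits a one-variable ordinary least squares line in plain Python; the helper `fit_line` and the weekly sales figures are made up for illustration and do not represent how any managed ML service trains its models.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: units sold in weeks 1..5, following y = 10x + 5 exactly.
weeks = [1, 2, 3, 4, 5]
units = [15, 25, 35, 45, 55]
slope, intercept = fit_line(weeks, units)
forecast = slope * 6 + intercept  # numeric prediction for week 6
print(slope, intercept, forecast)
```

The output is a number on a continuous scale rather than a class label, which is what separates regression from the classification options.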

39) What is the most effective way to merge data into an existing table?
UNLOAD data from Redshift into S3, use EMR to ‘merge’ new data files with the unloaded data files, and copy the data into Redshift.
Use a staging table to replace existing rows or update specific rows.
Connect the source table and the target Redshift table via a replication tool and run direct INSERTs and UPDATEs into the target Redshift table.
Correct Answer: Use a staging table to replace existing rows or update specific rows.
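
The staging-table pattern can be sketched as follows. SQLite (from the Python standard library) stands in for Redshift here purely so the example is runnable; the table names `target` and `staging` and the sample rows are hypothetical, and Redshift SQL syntax may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target  (id INTEGER PRIMARY KEY, qty INTEGER);
    CREATE TABLE staging (id INTEGER PRIMARY KEY, qty INTEGER);
    INSERT INTO target  VALUES (1, 10), (2, 20);
    INSERT INTO staging VALUES (2, 25), (3, 30);  -- id 2 is an update, id 3 is new
""")

# Merge inside one transaction: delete the rows being replaced,
# insert everything from staging, then empty the staging table.
with conn:
    conn.execute("DELETE FROM target WHERE id IN (SELECT id FROM staging)")
    conn.execute("INSERT INTO target SELECT id, qty FROM staging")
    conn.execute("DELETE FROM staging")

rows = conn.execute("SELECT id, qty FROM target ORDER BY id").fetchall()
print(rows)  # [(1, 10), (2, 25), (3, 30)]
```

Running the delete and insert in a single transaction means readers never see the target table with the old rows removed but the new ones not yet loaded.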

40) True or False: Redshift is recommended for transactional processing.
True
False
Correct Answer: False