AWS Certified Big Data Specialty Set3

Welcome to AWS Certified Big Data Specialty Set3.

1. You have just joined a new company as an AWS Big Data Architect, replacing an architect who left for a different company. As a data-driven company, your company has started using several of AWS's Big Data services in the last 6 months. Your new manager is concerned that the AWS charges are too high and has asked you to review the monthly bills. After review, you determine that the EMR costs are unnecessarily high: the company uses EMR only to process new data in a roughly 6-hour nightly window that starts at midnight and ends between 5 AM and 7 AM, depending on the amount of data that needs to be processed. The data to be processed is already in S3. However, the EMR cluster that processes the data is running 24 hours a day, 7 days a week. What type of cluster should your predecessor have configured to keep costs low and avoid wasting resources?
2. True or False: EBS volumes used with EMR persist after the cluster is terminated.
3. Which of the following does Spark Streaming use to consume data from a Kinesis Stream?
4. True or False: Presto is a database.
5. Which of the following are the 4 modules (libraries) of Spark? (Choose 4)
6. Which open-source web interface provides you with an easy way to run scripts, manage the Hive metastore, and view HDFS?
7. Your EMR cluster requires high I/O performance at a low cost. In terms of storage, which of the following is your best option?
8. You have just joined a company that has a petabyte of data stored in multiple data sources. The data sources include Hive, Cassandra, Redis, and MongoDB. The company has hundreds of employees, all querying the data at a high concurrency rate. These queries take anywhere from under a second to several minutes to run. The queries are processed in memory, avoiding high I/O and latency. Many of your new colleagues are also happy that they did not have to learn a new language to query the multiple data sources. Which open-source tool do you think your new colleagues are using?
9. You plan to use EMR to process a large amount of data that will eventually be stored in S3. The data is currently on premises and will be migrated to AWS using the Snowball service. The file sizes range from 300 MB to 500 MB. Over the next 6 months, your company will migrate over 2 PB of data to S3, and costs are a concern. Which compression algorithm provides you with the highest compression ratio, allowing you to both maximize performance and minimize costs?
10. When should you not use Spark? (Choose 2)
11. How are EMR task nodes different from core nodes? (Choose 3)
