Google Certified Professional Data Engineer Set 1

Welcome to Google Certified Professional Data Engineer Set 1. Click the Next button to proceed.

1. What is a pipeline in the context of Cloud Dataflow?
2. Your company stores data in BigQuery for long-term and short-term analytics queries. Most of the jobs only need to study the last 7 days of data. Over time, the cost of queries keeps going up. How can you redesign the database to lower the cost of the most frequent queries?
3. What method could you use to help compute averages when dealing with unbounded/streaming data?
4. If you have 2 replicating clusters in your Bigtable instance, how can you ensure that your application will be guaranteed strong consistency for its transactions?
5. How does Bigtable manage rows, tables and tablets?
6. Bigtable is compatible with which open source project's client library for Java?
7. For what types of data workload would Bigtable *not* be a good fit? (Choose 2)
8. What is the compute model of Cloud Dataflow?
9. You are required to load a large volume of data into BigQuery that does contain some duplication. What should you do to ensure the best query performance once the data is loaded?
10. A customer has 90TB of archive data to move to Google Cloud Storage, but has restrictions on connecting their private network to the public Internet. How could you facilitate this?
11. Which native output connectors are supported by Dataproc? (Choose 3)
12. True or False: Preemptible workers in a Dataproc cluster cannot store HDFS data.
13. A customer wants to run Spark jobs on a low-cost ephemeral Dataproc cluster, utilizing preemptible workers wherever possible, but needs to store the results of Dataproc jobs persistently. What would you recommend?
14. What are some ways to help control costs in BigQuery? (Choose 2)
15. Which GCP product implements the Apache Beam SDK and is sometimes recommended as an alternative to Dataproc, particularly for streaming data?
16. When you run a query in BigQuery, what happens to the results?
17. A customer needs a JSON document store NoSQL database to help them develop a new application. Which GCP product should they choose?
18. Your customer would like to use Dataproc, but the standard image does not contain some additional Spark components required to run their jobs. What would you recommend?
19. What is the best way to grant someone temporary access to a file if they do not have a Google account?
20. You are required to share a subset of data from a BigQuery dataset with a 3rd party analytics team. The data may change over time, and you should not grant unnecessary project permissions to this team if you can avoid it. How should you proceed?
21. How does Bigtable manage the storage of tablets?
22. What is a federated data source?
23. What is the name given to a dataset that can be acted upon within a Cloud Dataflow pipeline?
24. Which primary Apache services does Dataproc run? (Choose 2)
25. Using Cloud IAM, what is the most granular level for which you can configure access control for Pub/Sub?

