Google Certified Professional Data Engineer Set 1
Author: CloudVikas
Published Date: 18 June 2021

Welcome to Google Certified Professional Data Engineer Set 1.

1. What is a federated data source?
- A BigQuery data set that belongs to a different GCP project, used in an SQL statement.
- An external data source that can be queried directly even though the data is not stored in BigQuery.
- A BigQuery table that belongs to a different data set, used in an SQL statement.

2. What storage class would be the most cost-effective for data that is only accessed once per month?
- Nearline
- Low Availability
- Regional

3. High Availability for Cloud SQL PostgreSQL instances works because:
- Instances share a regional replicated persistent disk
- Instances replicate data directly between themselves for automatic failover
- Google SREs are on call and can quickly bring up a new instance in the event of a failure

4. What is the best way to grant someone temporary access to a file if they do not have a Google account?
- Use a signed URL with a specific expiry
- Make the file publicly accessible but ask the user not to share the link
- Use a long and complex name for the bucket and object

5. Which native output connectors are supported by Dataproc? (Choose 3)
- BigQuery
- Cloud Bigtable
- Cloud Storage
- Cloud Firestore
- Cloud SQL

6. How can you ensure that table modifications in BigQuery are ACID compliant?
- Add a nominal wait time to any application queries following an update to allow for eventual consistency.
- Maintain a separate table that records transactions themselves so changes can be re-applied if any are lost.
- No special accommodations are required; BigQuery table modifications are ACID compliant by design.

7. You are required to share a subset of data from a BigQuery data set with a 3rd-party analytics team. The data may change over time, and you should not grant unnecessary project permissions to this team if you can avoid it. How should you proceed?
- Create an authorized view based on a specific query for the subset of data, and provide access for the team only to that view.
- Create an export of the data subset to a Cloud Storage bucket. Provide a signed URL for the team to download the data from the bucket.
- Add the team to your GCP project and assign them the BigQuery Data Viewer IAM role for the data set.

8. What kind of database is Bigtable?
- Managed SQL-like relational database
- Unmanaged SQL-like relational database
- NoSQL wide-column tabular database

9. What is the purpose of a trigger in Cloud Dataflow?
- Triggers determine when to initiate the processing of a pipeline.
- Triggers determine when to emit output data, and behave differently for bounded and unbounded data.
- Triggers determine when to emit output data, but only apply to unbounded/streaming input data.

10. What action should you take if you observe that Bigtable performance is poor on a subset of queries?
- Use the Key Visualizer tool to identify hot spots and consider changing how the row key distributes rows
- Add debugging steps to your application to identify the problematic queries
- Add an additional cluster to the instance to increase the read and write throughput

11. What is the compute model of Cloud Dataflow?
- Dataflow fully manages Compute Engine services to run jobs.
- Dataflow manages Cloud Storage but requires a pre-built Kubernetes Engine cluster to run jobs.
- Dataflow requires a pre-built Cloud Dataproc cluster to run jobs.

12. What is a pipeline in the context of Cloud Dataflow?
- A pipeline represents the entire series of steps involved in ingesting data, transforming that data and writing output.
- A pipeline could be any of these.
- A pipeline represents a collection of data being prepared for transformation.

13. How does Bigtable manage the storage of tablets?
- Tablets are stored on cluster nodes, but storage can dynamically grow as part of the managed service
- Tablets are stored on cluster nodes, which must be sized accordingly for their storage needs
- Tablets are stored in Google Colossus, but a cluster node has a limit on how much storage it can process

14. If Cloud Spanner has to break one of the three properties of the CAP Theorem, which one will it be?
- Cloud Spanner never breaks any of these properties
- Consistency
- Availability

15. You are required to load a large volume of data into BigQuery that contains some duplication. What should you do to ensure the best query performance once the data is loaded?
- Denormalize the data by using nested or repeated fields.
- Normalize the data to eliminate duplicates from being stored, and reformat the data as JSON.
- Normalize the data to eliminate duplicates from being stored, and reformat the data as Avro.

16. You are about to deploy the new version of your application, but you are concerned that it may need to be rolled back, and you want to preserve any Pub/Sub messages it may process in the meantime. What do you do?
- Create an extra subscription to the Topic to store a backup of all messages.
- Create a test Topic and roll out application changes against that Topic.
- Create a snapshot on the Subscription so that you can seek back to this point in time if you need to roll back the application changes.

17. What is the maximum total size permitted for a Publish request, including metadata and payload?
- 1MB
- 10MB
- 50MB

18. You have multiple systems that all need to be notified of orders being processed. How should you configure Pub/Sub?
- Create a new Topic for each individual order. Create multiple Subscriptions for each Topic, one for every system that needs to be notified.
- Create a Topic for orders. Create multiple Subscriptions for this Topic, one for every system that needs to be notified.
- Create a new Topic for each individual order. Create a Subscription for each Topic that can be shared by every system that needs to be notified.

19. What steps do you need to take to set up BigQuery before use?
- You must create processing nodes in Compute Engine.
- BigQuery is a serverless product and all compute and storage resources are managed for you.
- You must create storage buckets for BigQuery to use.

20. A customer needs a JSON document store NoSQL database to help them develop a new application. Which GCP product should they choose?
- Cloud SQL
- Cloud Firestore
- Cloud Bigtable

21. When do BigQuery jobs run?
- Jobs run for all BigQuery actions, including loading, exporting, querying or copying data.
- Jobs run for all scheduled actions, but not for interactive queries in the BigQuery UI.
- Jobs run for most actions, including those in the BigQuery UI, but only if they are submitted with DML.

22. Which features are not compatible with Dataproc autoscaling? (Choose 3)
- Spark Structured Streaming
- Preemptible workers
- High-availability clusters
- MapReduce tasks
- HDFS Storage

23. What is the maximum number of clusters per Bigtable instance?
- 4
- 3
- There is no limit, providing you are within the Compute Engine quotas of your GCP project

24. Which transformation can be used to process collections of key/value pairs, in a similar fashion to the shuffle phase of a map/shuffle/reduce-style algorithm?
- GroupByKey
- Partition
- CoGroupByKey

25. Pub/Sub is designed for what main purpose?
- To decouple services and distribute workloads
- To enable asynchronous workflows
- To receive, buffer and distribute events
- All of the above

26. Which primary Apache services does Dataproc run? (Choose 2)
- Dataflow
- Cassandra
- Spark
- Hadoop

27. Cloud Memorystore is essentially a managed service based on which open-source project?
- MongoDB
- Redis
- Memcached

28. What is the maximum retention duration for a message in Pub/Sub?
- 30 days
- 7 days
- No maximum retention duration

29. What is the name given to a dataset that can be acted upon within a Cloud Dataflow pipeline?
- Bucket
- PCollection
- Aggregation

30. Your application requires access to a Cloud Storage bucket. What is the best way to achieve this?
- Create a custom service account with only the required permissions for the application
- Make the application prompt a user for their Google credentials to authenticate with Cloud Storage
- Include a user's Google credentials in the application code so it can authenticate with Cloud Storage
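The signed-URL approach in question 4 works because the URL itself carries an expiry time and a cryptographic signature, so the recipient needs no Google account. The sketch below is a minimal, illustrative version of that idea using a stdlib HMAC; it is not the actual Cloud Storage V4 signing algorithm, which derives the signature from a service account key over a canonical request (in Python, `google.cloud.storage.Blob.generate_signed_url` does this for you). The key, host, and helper names here are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

# Hypothetical secret standing in for a service account's private signing key.
SIGNING_KEY = b"demo-signing-key"

def make_signed_url(bucket, obj, expires_in_seconds, now=None):
    """Build an expiring, HMAC-signed URL for a bucket object.

    The URL embeds an expiry timestamp plus a signature over the
    resource path and expiry, so neither can be altered without
    invalidating the signature.
    """
    now = int(time.time()) if now is None else now
    expires = now + expires_in_seconds
    resource = f"/{bucket}/{obj}"
    signature = hmac.new(
        SIGNING_KEY, f"{resource}:{expires}".encode(), hashlib.sha256
    ).hexdigest()
    return f"https://storage.example.com{resource}?" + urlencode(
        {"Expires": expires, "Signature": signature}
    )

def is_url_valid(bucket, obj, expires, signature, now):
    """Server-side check: recompute the signature and test the expiry."""
    expected = hmac.new(
        SIGNING_KEY, f"/{bucket}/{obj}:{expires}".encode(), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature) and now < expires
```

Because the signature covers both the object path and the expiry, tampering with either (or reusing the link after the deadline) fails verification, which is what makes "ask the user not to share the link" unnecessary.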
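The row-key advice in question 10 follows from Bigtable sorting rows lexicographically into contiguous key ranges (tablets): a monotonically increasing key prefix, such as a timestamp, sends every new write into the same range. A common fix is field promotion, leading the key with a high-cardinality field. The sketch below is a toy model of range ownership, with hypothetical sensor IDs and split points, to show how the two key designs spread writes differently.

```python
def naive_key(ts):
    """Timestamp-first key: sequential writes sort adjacently,
    so they all land in one key range (a hot spot)."""
    return f"{ts:012d}"

def promoted_key(sensor_id, ts):
    """Field promotion: lead with a high-cardinality field so writes
    from different sensors land in different key ranges."""
    return f"{sensor_id}#{ts:012d}"

def owning_tablet(key, split_points):
    """Toy stand-in for tablet assignment: Bigtable shards rows into
    contiguous ranges; split_points are hypothetical range boundaries."""
    for i, split in enumerate(split_points):
        if key < split:
            return i
    return len(split_points)
```

With splits at `["a", "b", "c"]`, every `naive_key` (all digits, sorting before `"a"`) maps to tablet 0, while promoted keys for sensors `a`, `b`, `c` fan out across tablets; this is the pattern the Key Visualizer tool helps you spot in a real instance.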
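Question 15's recommended answer, denormalizing with nested and repeated fields, trades a little storage for query speed: BigQuery's columnar engine can read a nested structure in one table scan instead of performing a join (in BigQuery schemas this is a `STRUCT`/`ARRAY` column). The sketch below models the shape difference with hypothetical customer/order data in plain Python.

```python
# Normalized: two tables, joined at query time.
customers = [{"customer_id": 1, "name": "Ada"}]
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 25},
    {"order_id": 101, "customer_id": 1, "amount": 40},
]

def total_by_customer_joined(customers, orders):
    """Join-style aggregation over the normalized tables."""
    return {
        c["name"]: sum(
            o["amount"] for o in orders if o["customer_id"] == c["customer_id"]
        )
        for c in customers
    }

# Denormalized: orders nested as a repeated field, so no join is needed.
customers_nested = [
    {"customer_id": 1, "name": "Ada",
     "orders": [{"order_id": 100, "amount": 25},
                {"order_id": 101, "amount": 40}]},
]

def total_by_customer_nested(rows):
    """Aggregate directly over the repeated field in each row."""
    return {r["name"]: sum(o["amount"] for o in r["orders"]) for r in rows}
```

Both functions produce the same totals; the nested form simply keeps each customer's orders physically together, which is what makes the denormalized layout faster to query at scale.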
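Question 24's GroupByKey can be pictured without any Beam code: given a collection of (key, value) pairs, it gathers all values sharing a key, much like the shuffle phase of MapReduce, while CoGroupByKey does the same across two keyed collections. A minimal plain-Python model of those semantics (in Apache Beam the real transforms are `beam.GroupByKey` and `beam.CoGroupByKey` applied to PCollections):

```python
from collections import defaultdict

def group_by_key(pairs):
    """Gather all values that share a key, as GroupByKey does
    for a collection of (key, value) pairs."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def co_group_by_key(left, right):
    """Join two keyed collections by key, as CoGroupByKey does;
    every key from either side appears in the result."""
    keys = {k for k, _ in left} | {k for k, _ in right}
    return {
        k: {
            "left": [v for key, v in left if key == k],
            "right": [v for key, v in right if key == k],
        }
        for k in keys
    }
```

In a real pipeline the grouping happens in parallel across workers, but the input/output contract is the same: keyed pairs in, one iterable of values per key out.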