Google Cloud Bigtable indexes a single value in each row. This value is called the _______.


A) primary key
B) unique key
C) row key
D) master key

E) B) and D)
F) C) and D)

Correct Answer

verified
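
For context, the row key is the single indexed value in a Bigtable row, so reads are addressed by it. A minimal sketch with the google-cloud-bigtable Python client; the project, instance, table, and key format are hypothetical placeholders, not values from the question.

```python
# Minimal sketch: look up a single row by its row key.
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("my-table")

# The row key is the only value Bigtable indexes per row.
row = table.read_row(b"user#20240101#0001")
if row is not None:
    for family, columns in row.cells.items():
        for qualifier, cells in columns.items():
            print(family, qualifier.decode(), cells[0].value)
```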

To give a user read permission for only the first three columns of a table, which access control method would you use?


A) Primitive role
B) Predefined role
C) Authorized view
D) It's not possible to give access to only the first three columns of a table.

E) None of the above
F) B) and C)

Correct Answer

verified
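
For illustration, an authorized view is a view placed in a separate dataset that selects only the permitted columns and is then granted access to the source dataset. A rough sketch with the BigQuery Python client; every project, dataset, table, and column name below is hypothetical.

```python
# Minimal sketch of a column-restricted authorized view, assuming datasets
# "source_data" and "shared_views" already exist.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# 1. Create a view that exposes only the first three columns.
view = bigquery.Table("my-project.shared_views.patients_first3")
view.view_query = "SELECT col_a, col_b, col_c FROM `my-project.source_data.patients`"
view = client.create_table(view)

# 2. Authorize the view on the source dataset so it can read the underlying table.
source = client.get_dataset("my-project.source_data")
source.access_entries = list(source.access_entries) + [
    bigquery.AccessEntry(None, "view", {
        "projectId": "my-project",
        "datasetId": "shared_views",
        "tableId": "patients_first3",
    })
]
client.update_dataset(source, ["access_entries"])
```

Users are then granted viewer access on the dataset that holds the view, never on the source table itself.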

Your software uses a simple JSON format for all messages. These messages are published to Google Cloud Pub/Sub, then processed with Google Cloud Dataflow to create a real-time dashboard for the CFO. During testing, you notice that some messages are missing in the dashboard. You check the logs, and all messages are being published to Cloud Pub/Sub successfully. What should you do next?


A) Check the dashboard application to see if it is not displaying correctly.
B) Run a fixed dataset through the Cloud Dataflow pipeline and analyze the output.
C) Use Google Stackdriver Monitoring on Cloud Pub/Sub to find the missing messages.
D) Switch Cloud Dataflow to pull messages from Cloud Pub/Sub instead of Cloud Pub/Sub pushing messages to Cloud Dataflow.

E) C) and D)
F) B) and C)

Correct Answer

verified
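
For context, "running a fixed dataset through the pipeline" usually means exercising the same transforms in a local Apache Beam test. A minimal sketch; parse_message is a hypothetical stand-in for the pipeline's real JSON-parsing transform.

```python
# Minimal sketch: feed a known input through the transform and assert on the output,
# so missing or mangled records point at the pipeline rather than Pub/Sub or the dashboard.
import json
import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to

def parse_message(raw):
    # Stand-in for the pipeline's real parsing step.
    return json.loads(raw)

fixed_messages = ['{"order_id": 1, "amount": 10}', '{"order_id": 2, "amount": 25}']

with TestPipeline() as p:
    parsed = p | beam.Create(fixed_messages) | beam.Map(parse_message)
    assert_that(parsed, equal_to([
        {"order_id": 1, "amount": 10},
        {"order_id": 2, "amount": 25},
    ]))
```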

You are a retailer that wants to integrate your online sales capabilities with different in-home assistants, such as Google Home. You need to interpret customer voice commands and issue an order to the backend systems. Which solutions should you choose?


A) Cloud Speech-to-Text API
B) Cloud Natural Language API
C) Dialogflow Enterprise Edition
D) Cloud AutoML Natural Language

E) A) and B)
F) A) and D)

Correct Answer

verified

You want to automate execution of a multi-step data pipeline running on Google Cloud. The pipeline includes Cloud Dataproc and Cloud Dataflow jobs that have multiple dependencies on each other. You want to use managed services where possible, and the pipeline will run every day. Which tool should you use?


A) cron
B) Cloud Composer
C) Cloud Scheduler
D) Workflow Templates on Cloud Dataproc

E) A) and B)
F) None of the above

Correct Answer

verified
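
For illustration, orchestrating dependent Dataproc and Dataflow jobs on a daily schedule is typically expressed as an Airflow DAG running in Cloud Composer. A rough sketch; the project, region, cluster, jar, and template paths are placeholders.

```python
# Minimal daily DAG chaining a Dataproc Spark job into a Dataflow template job.
from datetime import datetime
from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator
from airflow.providers.google.cloud.operators.dataflow import DataflowTemplatedJobStartOperator

with DAG("daily_pipeline", start_date=datetime(2024, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:

    prepare = DataprocSubmitJobOperator(
        task_id="dataproc_prepare",
        project_id="my-project",
        region="us-central1",
        job={
            "placement": {"cluster_name": "etl-cluster"},
            "spark_job": {
                "main_class": "com.example.Prepare",
                "jar_file_uris": ["gs://my-bucket/jobs/prepare.jar"],
            },
        },
    )

    load = DataflowTemplatedJobStartOperator(
        task_id="dataflow_load",
        project_id="my-project",
        location="us-central1",
        job_name="daily-load",
        template="gs://my-bucket/templates/load_template",
    )

    prepare >> load  # the Dataflow job runs only after the Dataproc step succeeds
```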

You want to build a managed Hadoop system as your data lake. The data transformation process is composed of a series of Hadoop jobs executed in sequence. To accomplish the design of separating storage from compute, you decided to use the Cloud Storage connector to store all input data, output data, and intermediary data. However, you noticed that one Hadoop job runs very slowly with Cloud Dataproc when compared with the on-premises bare-metal Hadoop environment (8-core nodes with 100 GB of RAM). Analysis shows that this particular Hadoop job is disk I/O intensive. You want to resolve the issue. What should you do?


A) Allocate sufficient memory to the Hadoop cluster, so that the intermediary data of that particular Hadoop job can be held in memory
B) Allocate sufficient persistent disk space to the Hadoop cluster, and store the intermediate data of that particular Hadoop job on native HDFS
C) Allocate more CPU cores of the virtual machine instances of the Hadoop cluster so that the networking bandwidth for each instance can scale up
D) Allocate an additional network interface card (NIC), and configure link aggregation in the operating system to use the combined throughput when working with Cloud Storage

E) A) and B)
F) A) and C)

Correct Answer

verified
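
For context, keeping disk-I/O-heavy intermediate data off the network generally translates into provisioning faster local storage (for example, local SSDs backing HDFS) on the Dataproc workers. A rough sketch with the Dataproc Python client; the machine type, disk sizes, and names are placeholders for illustration, not a recommendation for this scenario.

```python
# Minimal sketch: create a Dataproc cluster whose workers have local SSDs,
# so shuffle/intermediate data stays on fast local disks (cluster HDFS).
from google.cloud import dataproc_v1

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": "my-project",
    "cluster_name": "io-heavy-cluster",
    "config": {
        "worker_config": {
            "num_instances": 4,
            "machine_type_uri": "n1-highmem-8",
            "disk_config": {
                "boot_disk_type": "pd-ssd",
                "boot_disk_size_gb": 1000,
                "num_local_ssds": 2,  # local SSDs back HDFS for intermediate data
            },
        },
    },
}

operation = client.create_cluster(
    request={"project_id": "my-project", "region": "us-central1", "cluster": cluster}
)
operation.result()
```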

You use BigQuery as your centralized analytics platform. New data is loaded every day, and an ETL pipeline modifies the original data and prepares it for the final users. This ETL pipeline is regularly modified and can generate errors, but sometimes the errors are detected only after 2 weeks. You need to provide a method to recover from these errors, and your backups should be optimized for storage costs. How should you organize your data in BigQuery and store your backups?


A) Organize your data in a single table, then export, compress, and store the BigQuery data in Cloud Storage.
B) Organize your data in separate tables for each month, and export, compress, and store the data in Cloud Storage.
C) Organize your data in separate tables for each month, and duplicate your data on a separate dataset in BigQuery.
D) Organize your data in separate tables for each month, and use snapshot decorators to restore the table to a time prior to the corruption.

E) B) and C)
F) A) and D)

Correct Answer

verified
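
For reference, snapshot decorators are the legacy-SQL way to read a table as of an earlier time; the standard-SQL analogue is FOR SYSTEM_TIME AS OF, and both are limited to BigQuery's time-travel window (at most seven days), which matters when errors surface later than that. A minimal sketch with hypothetical table names and timestamp.

```python
# Minimal sketch: restore a monthly table to a point before a bad ETL run
# using BigQuery time travel (standard-SQL analogue of snapshot decorators).
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

restore_sql = """
CREATE OR REPLACE TABLE `my-project.warehouse.sales_2024_05_restored` AS
SELECT *
FROM `my-project.warehouse.sales_2024_05`
  FOR SYSTEM_TIME AS OF TIMESTAMP('2024-05-20 00:00:00+00')
"""
client.query(restore_sql).result()
```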

You are migrating your data warehouse to BigQuery. You have migrated all of your data into tables in a dataset. Multiple users from your organization will be using the data. They should only see certain tables based on their team membership. How should you set user permissions?


A) Assign the users/groups data viewer access at the table level for each table
B) Create SQL views for each team in the same dataset in which the data resides, and assign the users/groups data viewer access to the SQL views
C) Create authorized views for each team in the same dataset in which the data resides, and assign the users/groups data viewer access to the authorized views
D) Create authorized views for each team in datasets created for each team. Assign the authorized views data viewer access to the dataset in which the data resides. Assign the users/groups data viewer access to the datasets in which the authorized views reside

E) A) and D)
F) B) and C)

Correct Answer

verified
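
For illustration, the per-team pattern combines two grants: each team's group gets viewer access on the dataset that holds its views, and each view is authorized on the dataset that holds the data. A rough sketch with the BigQuery Python client; the dataset, view, and group names are hypothetical.

```python
# Minimal sketch of per-team authorized views living in a team dataset.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Grant the team group read access to the dataset holding its views.
team_ds = client.get_dataset("my-project.team_a_views")
team_ds.access_entries = list(team_ds.access_entries) + [
    bigquery.AccessEntry("READER", "groupByEmail", "team-a@example.com")
]
client.update_dataset(team_ds, ["access_entries"])

# Authorize the team's view against the dataset holding the data.
data_ds = client.get_dataset("my-project.warehouse")
data_ds.access_entries = list(data_ds.access_entries) + [
    bigquery.AccessEntry(None, "view", {
        "projectId": "my-project",
        "datasetId": "team_a_views",
        "tableId": "orders_view",
    })
]
client.update_dataset(data_ds, ["access_entries"])
```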

You work for an advertising company, and you've developed a Spark ML model to predict click-through rates at advertisement blocks. You've been developing everything at your on-premises data center, and now your company is migrating to Google Cloud. Your data warehouse will be migrated to BigQuery. You periodically retrain your Spark ML models, so you need to migrate existing training pipelines to Google Cloud. What should you do?


A) Use Cloud ML Engine for training existing Spark ML models
B) Rewrite your models on TensorFlow, and start using Cloud ML Engine
C) Use Cloud Dataproc for training existing Spark ML models, but start reading data directly from BigQuery
D) Spin up a Spark cluster on Compute Engine, and train Spark ML models on the data exported from BigQuery

E) A) and D)
F) B) and C)

Correct Answer

verified
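
For context, existing Spark ML code can run unchanged on Cloud Dataproc, and the spark-bigquery connector lets it read training data directly from BigQuery. A rough PySpark sketch; the table, columns, and bucket names are placeholders.

```python
# Minimal sketch: train an existing Spark ML model on Dataproc, reading from BigQuery.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("ctr-training").getOrCreate()

# Requires the spark-bigquery connector (preinstalled on recent Dataproc images,
# otherwise added via --jars).
raw = (
    spark.read.format("bigquery")
    .option("table", "my-project.ads.ctr_training")
    .load()
)

features = VectorAssembler(
    inputCols=["impressions", "position"], outputCol="features"
).transform(raw)

model = LogisticRegression(featuresCol="features", labelCol="clicked").fit(features)
model.write().overwrite().save("gs://my-bucket/models/ctr")
```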

When you store data in Cloud Bigtable, what is the recommended minimum amount of stored data?


A) 500 TB
B) 1 GB
C) 1 TB
D) 500 GB

E) A) and B)
F) A) and C)

Correct Answer

verified

Your company's on-premises Apache Hadoop servers are approaching end-of-life, and IT has decided to migrate the cluster to Google Cloud Dataproc. A like-for-like migration of the cluster would require 50 TB of Google Persistent Disk per node. The CIO is concerned about the cost of using that much block storage. You want to minimize the storage cost of the migration. What should you do?


A) Put the data into Google Cloud Storage.
B) Use preemptible virtual machines (VMs) for the Cloud Dataproc cluster.
C) Tune the Cloud Dataproc cluster so that there is just enough disk for all data.
D) Migrate some of the cold data into Google Cloud Storage, and keep only the hot data in Persistent Disk.

E) All of the above
F) None of the above

Correct Answer

verified

Your company is in the process of migrating its on-premises data warehousing solutions to BigQuery. The existing data warehouse uses trigger-based change data capture (CDC) to apply updates from multiple transactional database sources on a daily basis. With BigQuery, your company hopes to improve its handling of CDC so that changes to the source systems are available to query in BigQuery in near-real time using log-based CDC streams, while also optimizing for the performance of applying changes to the data warehouse. Which two steps should they take to ensure that changes are available in the BigQuery reporting table with minimal latency while reducing compute overhead? (Choose two.)


A) Perform a DML INSERT, UPDATE, or DELETE to replicate each individual CDC record in real time directly on the reporting table.
B) Insert each new CDC record and corresponding operation type to a staging table in real time.
C) Periodically DELETE outdated records from the reporting table.
D) Periodically use a DML MERGE to perform several DML INSERT, UPDATE, and DELETE operations at the same time on the reporting table.
E) Insert each new CDC record and corresponding operation type in real time to the reporting table, and use a materialized view to expose only the newest version of each unique record.

F) All of the above
G) B) and E)

Correct Answer

verified
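
For illustration, the staging-plus-merge pattern streams every CDC record into a staging table and periodically folds the newest change per key into the reporting table with a single DML MERGE. A rough sketch run through the BigQuery Python client; the table, column, and operation-code names are hypothetical.

```python
# Minimal sketch: periodically apply staged CDC records to the reporting table.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

merge_sql = """
MERGE `my-project.dw.customers` AS target
USING (
  -- newest CDC record per key from the streaming staging table
  SELECT * EXCEPT(rn) FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY change_ts DESC) AS rn
    FROM `my-project.dw.customers_cdc_staging`
  ) WHERE rn = 1
) AS source
ON target.customer_id = source.customer_id
WHEN MATCHED AND source.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET name = source.name, email = source.email
WHEN NOT MATCHED AND source.op != 'D' THEN
  INSERT (customer_id, name, email) VALUES (source.customer_id, source.name, source.email)
"""
client.query(merge_sql).result()  # run periodically, e.g. from a scheduled job
```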

You are selecting services to write and transform JSON messages from Cloud Pub/Sub to BigQuery for a data pipeline on Google Cloud. You want to minimize service costs. You also want to monitor and accommodate input data volume that will vary in size with minimal manual intervention. What should you do?


A) Use Cloud Dataproc to run your transformations. Monitor CPU utilization for the cluster. Resize the number of worker nodes in your cluster via the command line.
B) Use Cloud Dataproc to run your transformations. Use the diagnose command to generate an operational output archive. Locate the bottleneck and adjust cluster resources.
C) Use Cloud Dataflow to run your transformations. Monitor the job system lag with Stackdriver. Use the default autoscaling setting for worker instances.
D) Use Cloud Dataflow to run your transformations. Monitor the total execution time for a sampling of jobs. Configure the job to use non-default Compute Engine machine types when needed.

E) All of the above
F) B) and C)

Correct Answer

verified
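
For context, a streaming Dataflow job can be left to scale its own workers with throughput-based autoscaling while system lag is watched in monitoring. A rough Apache Beam sketch; the topic, table, bucket, and worker limit are placeholders, and the BigQuery table is assumed to already exist.

```python
# Minimal sketch: Pub/Sub -> transform -> BigQuery with Dataflow autoscaling.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    [],
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    streaming=True,
    autoscaling_algorithm="THROUGHPUT_BASED",  # worker count tracks input volume
    max_num_workers=20,
)

with beam.Pipeline(options=options) as p:
    (
        p
        | beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | beam.Map(lambda raw: json.loads(raw))
        | beam.io.WriteToBigQuery(
            "my-project:analytics.events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```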

You designed a database for patient records as a pilot project to cover a few hundred patients in three clinics. Your design used a single database table to represent all patients and their visits, and you used self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the project has expanded. The database must now store 100 times more patient records. You can no longer run the reports, because they either take too long or they encounter errors with insufficient compute resources. How should you adjust the database design?


A) Add capacity (memory and disk space) to the database server by the order of 200.
B) Shard the tables into smaller ones based on date ranges, and only generate reports with prespecified date ranges.
C) Normalize the master patient-record table into the patient table and the visits table, and create other necessary tables to avoid self-join.
D) Partition the table into smaller tables, with one for each clinic. Run queries against the smaller table pairs, and use unions for consolidated reports.

E) None of the above
F) All of the above

Correct Answer

verified

You want to archive data in Cloud Storage. Because some data is very sensitive, you want to use the "Trust No One" (TNO) approach to encrypt your data to prevent the cloud provider staff from decrypting your data. What should you do?


A) Use gcloud kms keys create to create a symmetric key. Then use gcloud kms encrypt to encrypt each archival file with the key and unique additional authenticated data (AAD). Use gsutil cp to upload each encrypted file to the Cloud Storage bucket, and keep the AAD outside of Google Cloud.
B) Use gcloud kms keys create to create a symmetric key. Then use gcloud kms encrypt to encrypt each archival file with the key. Use gsutil cp to upload each encrypted file to the Cloud Storage bucket. Manually destroy the key previously used for encryption, and rotate the key once.
C) Specify a customer-supplied encryption key (CSEK) in the .boto configuration file. Use gsutil cp to upload each archival file to the Cloud Storage bucket. Save the CSEK in Cloud Memorystore as permanent storage of the secret.
D) Specify a customer-supplied encryption key (CSEK) in the .boto configuration file. Use gsutil cp to upload each archival file to the Cloud Storage bucket. Save the CSEK in a different project that only the security team can access.

E) B) and D)
F) None of the above

Correct Answer

verified
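
For context, "trust no one" means the cleartext key never reaches the cloud provider. One minimal way to sketch that idea is client-side encryption with a locally held key before upload (distinct from the gcloud kms and CSEK flows the options describe); the file and bucket names below are hypothetical.

```python
# Minimal TNO sketch: encrypt locally with a key kept outside Google Cloud,
# then upload only the ciphertext to Cloud Storage.
from cryptography.fernet import Fernet
from google.cloud import storage

key = Fernet.generate_key()  # store this key outside Google Cloud
fernet = Fernet(key)

with open("archive-2024-05.tar", "rb") as f:
    ciphertext = fernet.encrypt(f.read())

bucket = storage.Client(project="my-project").bucket("my-archive-bucket")
bucket.blob("archive-2024-05.tar.enc").upload_from_string(ciphertext)
```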

You are updating the code for a subscriber to a Pub/Sub feed. You are concerned that upon deployment the subscriber may erroneously acknowledge messages, leading to message loss. Your subscriber is not set up to retain acknowledged messages. What should you do to ensure that you can recover from errors after deployment?


A) Set up the Pub/Sub emulator on your local machine. Validate the behavior of your new subscriber logic before deploying it to production.
B) Create a Pub/Sub snapshot before deploying new subscriber code. Use a Seek operation to re-deliver messages that became available after the snapshot was created.
C) Use Cloud Build for your deployment. If an error occurs after deployment, use a Seek operation to locate a timestamp logged by Cloud Build at the start of the deployment.
D) Enable dead-lettering on the Pub/Sub topic to capture messages that aren't successfully acknowledged. If an error occurs after deployment, re-deliver any messages captured by the dead-letter queue.

E) C) and D)
F) All of the above

Correct Answer

verified
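
For illustration, a Pub/Sub snapshot taken before a risky deployment can later be combined with a seek to rewind the subscription. A minimal sketch with the Pub/Sub Python client; the project, subscription, and snapshot names are placeholders.

```python
# Minimal sketch: snapshot a subscription before deploying, seek back if needed.
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("my-project", "orders-sub")
snapshot = subscriber.snapshot_path("my-project", "pre-deploy-snap")

# Before deploying the new subscriber code:
subscriber.create_snapshot(request={"name": snapshot, "subscription": subscription})

# If the new code misbehaves, rewind so messages unacknowledged at snapshot time
# (plus everything published since) are redelivered:
subscriber.seek(request={"subscription": subscription, "snapshot": snapshot})
```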

Cloud Bigtable is Google's ______ Big Data database service.


A) Relational
B) mySQL
C) NoSQL
D) SQL Server

E) None of the above
F) A) and B)

Correct Answer

verified

An online retailer has built their current application on Google App Engine. A new initiative at the company mandates that they extend their application to allow their customers to transact directly via the application. They need to manage their shopping transactions and analyze combined data from multiple datasets using a business intelligence (BI) tool. They want to use only a single database for this purpose. Which Google Cloud database should they choose?


A) BigQuery
B) Cloud SQL
C) Cloud Bigtable
D) Cloud Datastore

E) C) and D)
F) A) and C)

Correct Answer

verified

Each analytics team in your organization is running BigQuery jobs in their own projects. You want to enable each team to monitor slot usage within their projects. What should you do?


A) Create a Stackdriver Monitoring dashboard based on the BigQuery metric query/scanned_bytes
B) Create a Stackdriver Monitoring dashboard based on the BigQuery metric slots/allocated_for_project
C) Create a log export for each project, capture the BigQuery job execution logs, create a custom metric based on the totalSlotMs, and create a Stackdriver Monitoring dashboard based on the custom metric
D) Create an aggregated log export at the organization level, capture the BigQuery job execution logs, create a custom metric based on the totalSlotMs, and create a Stackdriver Monitoring dashboard based on the custom metric

E) A) and B)
F) All of the above

Correct Answer

verified
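
For reference, once a slot metric exists (whether a built-in one such as slots/allocated_for_project named in the options, or a log-based custom metric derived from totalSlotMs), a project-scoped dashboard or query reads it through the Cloud Monitoring API. A rough sketch with the monitoring_v3 Python client and a hypothetical project.

```python
# Minimal sketch: read the last hour of a BigQuery slot metric for one project.
import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

series = client.list_time_series(
    request={
        "name": "projects/my-project",
        "filter": 'metric.type = "bigquery.googleapis.com/slots/allocated_for_project"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for ts in series:
    for point in ts.points:
        print(point.interval.end_time, point.value)
```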

You work for a manufacturing company that sources up to 750 different components, each from a different supplier. You've collected a labeled dataset that has on average 1000 examples for each unique component. Your team wants to implement an app to help warehouse workers recognize incoming components based on a photo of the component. You want to implement the first working version of this app (as a proof of concept) within a few working days. What should you do?


A) Use Cloud Vision AutoML with the existing dataset.
B) Use Cloud Vision AutoML, but reduce your dataset twice.
C) Use Cloud Vision API by providing custom labels as recognition hints.
D) Train your own image recognition model leveraging transfer learning techniques.

E) B) and C)
F) A) and D)

Correct Answer

verified
