Databricks Certified Data Engineer Associate Preparation 2

Test title:
Databricks Certified Data Engineer Associate Preparation 2

Description:
Databricks Certified Data Engineer Associate Preparation 2

Creation date: 2024/05/31

Category: Computing

Number of questions: 26
Questions:

Which of the following benefits of using the Databricks Lakehouse Platform is provided by Delta Lake?
- The ability to manipulate the same data using a variety of languages.
- The ability to collaborate in real time on a single notebook.
- The ability to set up alerts for query failures.
- The ability to support batch and streaming workloads.
- The ability to distribute complex data operations.

Which of the following is hosted completely in the classic Databricks architecture?
- Worker node.
- JDBC data source.
- Databricks web application.
- Databricks Filesystem.
- Driver node.

Which of the following describes a scenario in which a data team will want to utilize cluster pools?
- An automated report needs to be refreshed as quickly as possible.
- An automated report needs to be made reproducible.
- An automated report needs to be tested to identify errors.
- An automated report needs to be version-controlled across multiple collaborators.
- An automated report needs to be runnable by all stakeholders.

A data organization leader is upset about the data analysis team's reports being different from the data engineering team's reports. The leader believes the siloed nature of their organization's data engineering and data analysis architectures is to blame. Which of the following describes how a data lakehouse could alleviate this issue?
- Both teams would autoscale their work as data size evolves.
- Both teams would use the same source of truth for their work.
- Both teams would reorganize to report to the same department.
- Both teams would be able to collaborate on projects in real-time.
- Both teams would respond more quickly to ad-hoc requests.

A new data engineering team has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project. Which of the following commands can be used to grant full permissions on the table to the new data engineering team?
- GRANT ALL PRIVILEGES ON TABLE sales TO team;
- GRANT SELECT CREATE MODIFY ON TABLE sales TO team;
- GRANT SELECT ON TABLE sales TO team;
- GRANT USAGE ON TABLE sales TO team;
- GRANT ALL PRIVILEGES ON TABLE team TO sales;
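
A minimal sketch of running the first option from a Python notebook cell (assuming a Databricks notebook where `spark` is predefined and a principal named `team` already exists in the workspace):

    # Grant full table privileges on sales to the team principal
    spark.sql("GRANT ALL PRIVILEGES ON TABLE sales TO team")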

A data engineer is running code in a Databricks Repo that is cloned from a central Git repository. A colleague of the data engineer informs them that changes have been made and synced to the central Git repository. The data engineer now needs to sync their Databricks Repo to get the changes from the central Git repository. Which of the following Git operations does the data engineer need to run to accomplish this task?
- Merge.
- Push.
- Pull.
- Commit.
- Clone.

A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions. In which of the following locations can the data engineer review their permissions on the table?
- Databricks Filesystem.
- Jobs.
- Dashboards.
- Repos.
- Data Explorer.

A data engineer has been given a new record of data:

id STRING = 'a1'
rank INTEGER = 6
rating FLOAT = 9.4

Which of the following SQL commands can be used to append the new record to an existing Delta table my_table?
- INSERT INTO my_table VALUES ('a1', 6, 9.4)
- my_table UNION VALUES ('a1', 6, 9.4)
- INSERT VALUES ('a1', 6, 9.4) INTO my_table
- UPDATE my_table VALUES ('a1', 6, 9.4)
- UPDATE VALUES ('a1', 6, 9.4) my_table
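
A sketch of the append using the first option (assuming my_table already exists with columns id STRING, rank INT, rating FLOAT, and `spark` is the notebook session):

    # Append one row to the existing Delta table
    spark.sql("INSERT INTO my_table VALUES ('a1', 6, 9.4)")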

A data engineer has realized that the data files associated with a Delta table are incredibly small. They want to compact the small files to form larger files to improve performance. Which of the following keywords can be used to compact the small files?
- REDUCE.
- OPTIMIZE.
- COMPACTION.
- REPARTITION.
- VACUUM.
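
A sketch of compacting small files with OPTIMIZE (my_table is a hypothetical Delta table name):

    # Rewrite many small files into fewer, larger ones to improve scan performance
    spark.sql("OPTIMIZE my_table")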

In which of the following file formats is data from Delta Lake tables primarily stored?
- Delta.
- CSV.
- Parquet.
- JSON.
- A proprietary, optimized format specific to Databricks.

Which of the following is stored in the Databricks customer's cloud account?
- Databricks web application.
- Cluster management metadata.
- Repos.
- Data.
- Notebooks.

Which of the following can be used to simplify and unify siloed data architectures that are specialized for specific use cases?
- None of these.
- Data lake.
- Data warehouse.
- All of these.
- Data lakehouse.

A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to use Python without making any changes to those cells. Which of the following describes how the data engineer can use SQL within a cell of their Python notebook?
- It is not possible to use SQL in a Python notebook.
- They can attach the cell to a SQL endpoint rather than a Databricks cluster.
- They can simply write SQL syntax in the cell.
- They can add %sql to the first line of the cell.
- They can change the default language of the notebook to SQL.
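
A sketch of what such a cell could look like; my_table is a hypothetical table name, and every other cell in the notebook stays Python:

    %sql
    SELECT * FROM my_table LIMIT 10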

Which of the following SQL keywords can be used to convert a table from a long format to a wide format?
- TRANSFORM.
- PIVOT.
- SUM.
- CONVERT.
- WHERE.
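
A sketch of PIVOT turning a long table into a wide one (sales_long, with columns store_id, quarter, and amount, is a hypothetical table invented for illustration):

    # One row per store_id, one column per quarter value
    df = spark.sql("""
        SELECT * FROM sales_long
        PIVOT (SUM(amount) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4'))
    """)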

Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?
- Parquet files can be partitioned.
- CREATE TABLE AS SELECT statements cannot be used on files.
- Parquet files have a well-defined schema.
- Parquet files have the ability to be optimized.
- Parquet files will become Delta tables.
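
A sketch of a CTAS over Parquet files (the table name and path are hypothetical):

    # Parquet carries its own well-defined schema, so none has to be declared
    spark.sql("CREATE TABLE my_copy AS SELECT * FROM parquet.`/mnt/raw/events`")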

A data engineer wants to create a relational object by pulling data from two tables. The relational object does not need to be used by other data engineers in other sessions. In order to save on storage costs, the data engineer wants to avoid copying and storing physical data. Which of the following relational objects should the data engineer create?
- Spark SQL Table.
- View.
- Database.
- Temporary View.
- Delta Table.
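
A sketch of the temporary-view answer (table_a and table_b are hypothetical source tables):

    # Session-scoped; no physical data is copied or stored
    spark.sql("""
        CREATE TEMPORARY VIEW combined AS
        SELECT a.id, a.value, b.label
        FROM table_a a JOIN table_b b ON a.id = b.id
    """)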

A data analyst has developed a query that runs against a Delta table. They want help from the data engineering team to implement a series of tests to ensure the data returned by the query is clean. However, the data engineering team uses Python for its tests rather than SQL. Which of the following operations could the data engineering team use to run the query and operate with the results in PySpark?
- SELECT * FROM sales.
- spark.delta.table.
- spark.sql.
- There is no way to share data between PySpark and SQL.
- spark.table.
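
A sketch of wrapping the analyst's query with spark.sql so the results can be tested in PySpark (the sales table comes from the options; the row-count check is just an illustrative test):

    df = spark.sql("SELECT * FROM sales")
    # Example test: the query should return at least one row
    assert df.count() > 0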

Which of the following commands will return the number of null values in the member_id column?
- SELECT count(member_id) FROM my_table;
- SELECT count(member_id) - count_null(member_id) FROM my_table;
- SELECT count_if(member_id IS NULL) FROM my_table;
- SELECT null(member_id) FROM my_table;
- SELECT count_null(member_id) FROM my_table;
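
A sketch of the count_if option (my_table and member_id are the names from the question):

    # count_if counts the rows where the condition is true, i.e. member_id is NULL
    spark.sql("SELECT count_if(member_id IS NULL) FROM my_table").show()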

A data engineer needs to apply custom logic to identify employees with more than 5 years of experience in array column employees in table stores. The custom logic should create a new column exp_employees that is an array of all of the employees with more than 5 years of experience for each row. In order to apply this custom logic at scale, the data engineer wants to use the FILTER higher-order function. Which of the following code blocks successfully completes this task?
- SELECT store_id, employees, FILTER (employees, i -> i.years_exp > 5) AS exp_employees FROM stores;
- SELECT store_id, employees, FILTER (exp_employees, years_exp > 5) AS exp_employees FROM stores;
- SELECT store_id, employees, FILTER (employees, years_exp > 5) AS exp_employees FROM stores;
- SELECT store_id, employees, CASE WHEN employees.years_exp > 5 THEN employees ELSE NULL END AS exp_employees FROM stores;
- SELECT store_id, employees, FILTER (exp_employees, i -> i.years_exp > 5) AS exp_employees FROM stores;
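
A sketch of the FILTER higher-order function (assuming, per the question, that employees is an array of structs with a years_exp field):

    # Keep only array elements whose years_exp exceeds 5
    df = spark.sql("""
        SELECT store_id, employees,
               FILTER(employees, i -> i.years_exp > 5) AS exp_employees
        FROM stores
    """)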

A data engineer has a Python variable table_name that they would like to use in a SQL query. They want to construct a Python code block that will run the query using table_name. They have the following incomplete code block: _________________(f"SELECT customer_id, spend FROM {table_name}") Which of the following can be used to fill in the blank to successfully complete the task?
- spark.delta.sql.
- spark.delta.table.
- spark.table.
- dbutils.sql.
- spark.sql.
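
A sketch of the completed code block ("customers" is a hypothetical value for table_name):

    table_name = "customers"
    # spark.sql runs the interpolated SQL string and returns a DataFrame
    df = spark.sql(f"SELECT customer_id, spend FROM {table_name}")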

A data engineer has created a new database using the following command: CREATE DATABASE IF NOT EXISTS customer360; In which of the following locations will the customer360 database be located?
- dbfs:/user/hive/database/customer360.
- dbfs:/user/hive/warehouse.
- dbfs:/user/hive/customer360.
- More information is needed to determine the correct response.
- dbfs:/user/hive/database.
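
A sketch for verifying where the database actually landed; with default Hive metastore behavior the reported location should sit under dbfs:/user/hive/warehouse:

    spark.sql("CREATE DATABASE IF NOT EXISTS customer360")
    # The "Location" row of the output shows the storage path
    spark.sql("DESCRIBE DATABASE customer360").show(truncate=False)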

A data engineer is attempting to drop a Spark SQL table my_table and runs the following command: DROP TABLE IF EXISTS my_table; After running this command, the engineer notices that the data files and metadata files have been deleted from the file system. Which of the following describes why all of these files were deleted?
- The table was managed.
- The table's data was smaller than 10 GB.
- The table's data was larger than 10 GB.
- The table was external.
- The table did not have a location.
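
A sketch for checking whether a table is managed before dropping it (DESCRIBE EXTENDED reports the table type):

    # Look for the "Type" row: MANAGED tables lose their data files on DROP,
    # EXTERNAL tables keep them
    spark.sql("DESCRIBE EXTENDED my_table").show(50, truncate=False)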

A data engineer that is new to using Python needs to create a Python function to add two integers together and return the sum. Which of the following code blocks can the data engineer use to complete this task?
- function add_integers(x, y): return x + y
- function add_integers(x, y): x + y
- def add_integers(x, y): print(x + y)
- def add_integers(x, y): return x + y
- def add_integers(x, y): x + y
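
The correct option, written out as a runnable sketch:

    def add_integers(x, y):
        # Return (not print) the sum so callers can use the result
        return x + y

    print(add_integers(2, 3))  # 5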

In which of the following scenarios should a data engineer use the MERGE INTO command instead of the INSERT INTO command?
- When the location of the data needs to be changed.
- When the target table is an external table.
- When the source table can be deleted.
- When the target table cannot contain duplicate records.
- When the source is not a Delta table.
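
A hedged sketch of a MERGE INTO that avoids inserting duplicate records (target and updates are hypothetical Delta tables keyed by id):

    # Matched rows are updated in place; only genuinely new keys are inserted
    spark.sql("""
        MERGE INTO target t
        USING updates s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)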

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table. The code block used by the data engineer is below (the code block itself is not reproduced in this dump; judging by the options, the blank falls on the trigger setting). If the data engineer only wants the query to process all of the available data in as many batches as required, which of the following lines of code should the data engineer use to fill in the blank?
- processingTime(1).
- trigger(availableNow=True).
- trigger(parallelBatch=True).
- trigger(processingTime="once").
- trigger(continuous="once").
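
A sketch of a streaming write using the availableNow trigger (the table names and checkpoint path are hypothetical):

    (spark.readStream.table("source_table")
        .writeStream
        .trigger(availableNow=True)  # process all available data in batches, then stop
        .option("checkpointLocation", "/tmp/_checkpoints/new_table")
        .toTable("new_table"))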

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values. Which of the following describes why Auto Loader inferred all of the columns to be of the string type?
- There was a type mismatch between the specific schema and the inferred schema.
- JSON data is a text-based format.
- Auto Loader only works with string data.
- All of the fields had at least one null value.
- Auto Loader cannot infer the schema of ingested data.
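
A sketch of opting in to type inference with Auto Loader (the paths are hypothetical; without cloudFiles.inferColumnTypes, JSON columns default to strings):

    (spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.inferColumnTypes", "true")  # infer float/boolean instead of string
        .option("cloudFiles.schemaLocation", "/tmp/_schemas/events")
        .load("/mnt/raw/events"))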
