Test title:
Databricks_Data_Engineer_1

Description:
Test Databricks Data_Engineer_1

Author:
David Torres López

Creation date:
02/11/2022

Category:
Computing

Number of questions: 45
Questions:
A data engineer has joined a team which is using Databricks notebooks with Git integration. The data engineer needs to clone the Git repository in Databricks to start collaborating with the team. Which of the following tabs from the left menu bar should the data engineer select to clone the remote repository?
- Repos
- Jobs
- Data
- Git
- VCS

Which of the following is TRUE about a Job cluster in Databricks?
- A Job cluster can be created using the UI, CLI or REST API
- Multiple users can share a Job cluster
- Job clusters can be restarted as per need
- The Job cluster terminates when the Job ends
- Job clusters work only with Python language notebooks

Which of the following is NOT one of the magic commands that can be used in a Databricks notebook?
- %sql
- %java
- %python
- %r
- %scala

As a data engineer, you have seen that a large number of files for the employees table is taking a lot of memory. These files are the versioned history of the employees table. You need to remove the files from the system and keep only the files that are at most 2 days old. Your colleague has written the following query:

    VACUUM employees RETAIN 2 DAYS

What should be corrected to run this query?
- The RESTORE command should be used instead of VACUUM
- RETAIN FOR 2 DAYS should be used
- You cannot delete the old files for just one table; you need to add the database name, as VACUUM is a database-level operation
- VACUUM accepts a value in HOURS and not DAYS, so 2 DAYS should be replaced with 48 HOURS
- DRY RUN should be used at the end of the SQL statement
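
For reference, a minimal sketch of the corrected statement run from a Python cell. Note that retaining less than the default 7 days also requires relaxing Delta's retention safety check (the config key below is the standard Delta setting):

    # VACUUM expects the retention period in HOURS, so 2 days = 48 HOURS.
    # Retention below the 168-hour default is refused unless the
    # retention duration check is disabled first.
    spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")
    spark.sql("VACUUM employees RETAIN 48 HOURS")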

You are working as a data engineer in XYZ company. A Delta table department already exists, but now you need to change the schema. Another data engineer has written a SQL statement to create the table with the latest Databricks Runtime, but it is not working as desired. What can be the error in the following SQL statement?

    CREATE OR REPLACE department (roll_no int, name string);

- USING DELTA should be added at the end of the SQL statement
- The TABLE keyword is missing before department
- The brackets around the schema should be removed
- CREATE TABLE IF NOT EXISTS should be used instead of CREATE OR REPLACE
- The department table should be dropped first using the DROP command, as you cannot overwrite a Delta table
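
A sketch of the statement with the missing keyword restored:

    # The TABLE keyword was missing. Delta is the default table format
    # on recent Databricks Runtimes, so USING DELTA is optional here.
    spark.sql("CREATE OR REPLACE TABLE department (roll_no INT, name STRING)")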

Which of the following is NOT true about a Data Lake?
- A Data Lake can store structured, unstructured, and semi-structured data
- Data in a Data Lake is stored in its original format
- Defining schema on load is necessary in a Data Lake
- A Data Lake can be used to store real-time data
- Databricks Lakehouse combines the features of a Data Lake and a Data Warehouse

Your fellow data engineer is using a Databricks notebook which is defaulted to Python. They need an interactive view of the data on which they can plot a graph. They try to run the following query on an aggregated Gold table avg_scores, but they are not able to see the output data:

    spark.sql("SELECT * FROM avg_scores")

What is the reason that they are not able to view the data?
- Databricks does not support querying a SQL table in a Python cell
- As it is a Python cell, the show() operation should be applied to get the contents
- The cell's language should be changed to SQL and the cell should be executed again
- The spark.sql() function should be passed as an argument to the display() function to view the data
- The argument passed to the spark.sql() function should not be enclosed in quotes
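
A minimal sketch of the display() approach, which renders an interactive, plottable table in Databricks:

    # display() shows the DataFrame returned by spark.sql() as an
    # interactive table with built-in charting, unlike show(), which
    # only prints plain text.
    display(spark.sql("SELECT * FROM avg_scores"))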

Which of the following is false about an external table?
- An external table is also called an unmanaged table
- When an external table is dropped, only the metadata is deleted from the system and the data remains intact
- An external table never specifies LOCATION while creating the table
- Registering an external table to a different database is easy, as no data movement is required
- Databricks manages only the metadata for an external table

The following statement intends to select all the records from version 6 of table testing_logs. Which of the following should replace the blank to achieve the task?

    SELECT * FROM testing_logs __________

- VERSION 6
- ROLLBACK TO 6
- VERSION AS OF 6
- ROLLBACK AS OF 6
- VERSION 6
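
A sketch of Delta time travel with the version-based syntax:

    # Query the table exactly as it existed at version 6.
    display(spark.sql("SELECT * FROM testing_logs VERSION AS OF 6"))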

Which of the following commands fails to return the metadata of the flights table?
- DESC EXTENDED flights
- DESCRIBE DETAIL flights
- DESC flights
- DESCRIBE HISTORY flights
- DESC PRIVACY flights

Which of the following magic commands can be used to run a notebook from another notebook?
- %run_notebook
- %run
- %runNotebook
- %start
- %start_notebook

A junior data engineer from your team wants to insert 5 records in the employees table. They have come up with the following set of SQL queries:

    INSERT INTO employees VALUES (234, 'Erich Heard');
    INSERT INTO employees VALUES (209, 'Paul Fosbury');
    INSERT INTO employees VALUES (141, 'Ricky Matt');
    INSERT INTO employees VALUES (940, 'Jeff Sims');
    INSERT INTO employees VALUES (744, 'Chriss Holmes');

Each of the statements is processed as a separate transaction, and you need to modify the statement to be able to insert all 5 records in one go. Which of the following SQL statements can be used to insert these 5 records in a single transaction?
- INSERT INTO TABLE employees VALUES (234, 'Erich Heard'), (209, 'Paul Fosbury'), (141, 'Ricky Matt'), (940, 'Jeff Sims'), (744, 'Chriss Holmes');
- INSERT INTO employees MULTIPLE VALUES (234, 'Erich Heard'), (209, 'Paul Fosbury'), (141, 'Ricky Matt'), (940, 'Jeff Sims'), (744, 'Chriss Holmes');
- INSERT INTO employees 5 VALUES (234, 'Erich Heard'), (209, 'Paul Fosbury'), (141, 'Ricky Matt'), (940, 'Jeff Sims'), (744, 'Chriss Holmes');
- INSERT INTO employees VALUES (234, 'Erich Heard'), (209, 'Paul Fosbury'), (141, 'Ricky Matt'), (940, 'Jeff Sims'), (744, 'Chriss Holmes');
- INSERT INTO employees (234, 'Erich Heard'), (209, 'Paul Fosbury'), (141, 'Ricky Matt'), (940, 'Jeff Sims'), (744, 'Chriss Holmes');
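
A sketch of the plain multi-row VALUES form, which Delta commits as a single atomic transaction:

    # One INSERT with a comma-separated VALUES list is committed as a
    # single transaction on a Delta table.
    spark.sql("""
        INSERT INTO employees VALUES
            (234, 'Erich Heard'),
            (209, 'Paul Fosbury'),
            (141, 'Ricky Matt'),
            (940, 'Jeff Sims'),
            (744, 'Chriss Holmes')
    """)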

You have created a new managed table members in the company database using the following set of SQL statements:

    CREATE DATABASE IF NOT EXISTS company;
    USE company;
    CREATE OR REPLACE TABLE members (id int, name string);

What will be the location of the newly created table?
- dbfs:/user/warehouse/company.db/
- dbfs:/hive/warehouse/company/
- dbfs:/user/hive/company/
- dbfs:/user/hive/warehouse/company.db/
- dbfs:/user/lakehouse/company.db/
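
If in doubt, the table location can be checked directly; a sketch:

    # DESCRIBE DETAIL returns one row whose 'location' column shows
    # where the managed table's files live; by default this is the
    # Hive warehouse path dbfs:/user/hive/warehouse/<db>.db/<table>.
    display(spark.sql("DESCRIBE DETAIL company.members"))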

Which of the following statements defines a Python function get_column() which accepts a column name and prints the values in that column from the payments table?
- def get_column(column_name): spark.sql(f"SELECT {column_name} from payments")
- define get_column(column_name): display(spark.sql(f"SELECT {column_name} from payments"))
- function get_column(column_name): spark.sql(f"SELECT {column_name} from payments").show()
- def get_column(column_name): spark.sql(f"SELECT {column_name} from payments").show()
- def get_column(column_name) spark.sql("SELECT column_name from payments")
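
The def/f-string/.show() variant written out as a runnable sketch (the column name passed at the end is hypothetical):

    def get_column(column_name):
        # The f-string interpolates the column name into the query;
        # .show() prints the resulting rows.
        spark.sql(f"SELECT {column_name} FROM payments").show()

    get_column("amount")  # hypothetical column name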

You, as a data engineer, want to use a SQL query in a Python function. What approach can you follow?
- The spark.sql() function should be used to run the SQL query
- Change the cell's language to SQL, as a SQL cell allows the usage of Python code as well
- A SQL query cannot be accessed inside Python code
- The pyspark.sql() function should be used to run the SQL query
- Install the Spark SQL driver to run the query

A data analyst has created an empty Delta table named passengers. The data analyst needs to make sure that the age of the new passengers is less than 60. Which of the following statements will ensure that all the incoming records have an age column value less than 60?
- ALTER TABLE passengers ADD check_age CHECK (age < 60)
- ALTER TABLE passengers ADD CONSTRAINT check_age (age < 60)
- ALTER passengers ADD CONSTRAINT check_age CHECK age < 60
- ALTER TABLE passengers ADD CONSTRAINT check_age CHECK (age < 60)
- ALTER passengers ADD CONSTRAINT check_age CHECK (age < 60)
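
A sketch of the valid CHECK constraint syntax for Delta tables:

    # Adds an enforced constraint: any future INSERT or UPDATE with
    # age >= 60 fails with a constraint violation.
    spark.sql("ALTER TABLE passengers ADD CONSTRAINT check_age CHECK (age < 60)")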

A data engineer has created a Global Temp View new_trains. Which of the following SQL statements will show the contents of the new_trains view?
- SELECT * FROM new_trains
- SELECT * FROM global.new_trains
- SELECT * FROM global_temp.new_trains
- SELECT * FROM temp_global.new_trains
- SELECT * FROM temp.new_trains
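
Global temporary views live in the reserved global_temp database; a sketch:

    # A global temp view must be qualified with global_temp; it is
    # visible to every notebook attached to the same cluster.
    display(spark.sql("SELECT * FROM global_temp.new_trains"))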

The following SQL statement intends to delete all the records from the employees table that contain the string 'Ex' in the employee_code column. Find the error in the statement.

    DELETE VALUES FROM employees WHERE employee_code LIKE 'Ex%'

- Wildcards like % cannot be used in a WHERE clause
- IF should be used instead of WHERE
- The TABLE keyword should be used after the table name, i.e. employees
- VALUES should not be used in DELETE statements
- ALL VALUES should be used instead of VALUES in DELETE statements
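
The corrected DELETE as a sketch:

    # DELETE FROM takes no VALUES keyword; LIKE 'Ex%' is the
    # question's pattern (codes beginning with 'Ex').
    spark.sql("DELETE FROM employees WHERE employee_code LIKE 'Ex%'")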

Which of the following Spark SQL functions can be used to convert an array column in a Delta table into multiple rows, with each row containing individual elements of the array?
- SELECT
- FILTER
- TRANSFORM
- EXPLODE
- EXPLODEARRAY
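
A sketch of explode() on a hypothetical table with an array column (orders and items are illustrative names):

    # explode() emits one output row per element of the array column.
    display(spark.sql("SELECT order_id, explode(items) AS item FROM orders"))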

A data analyst needs to replace the contents of table tax_details with the contents of table new_tax_details. Both tables have an identical schema, but table new_tax_details contains an extra column named middle_name. The data analyst has written the following query to execute the overwrite:

    INSERT OVERWRITE tax_details SELECT * FROM new_tax_details;

What will be the outcome of the above query?
- The data will be overwritten without any error
- The data will be overwritten with a schema mismatch warning message
- The query will fail with a schema mismatch error
- The query will fail as INSERT OVERWRITE is not a valid command
- The query will be executed without any errors or warnings, but the data will not be overwritten

Which of the following is not a valid operator which works on two or more tables?
- UNION
- INTERSECT
- MINUS
- PLUS
- EXCEPT

Which of the following describes the advantage of using higher-order functions in Spark SQL?
- Higher-order functions increase the number of clusters for running Spark SQL in Databricks
- Higher-order functions do not exist in Spark SQL
- Higher-order functions help in directly working with complex data types
- Higher-order functions can be used to combine two tables using the UNION operator
- Higher-order functions can be used to speed up an ORDER BY query

A data engineer is working on a SQL UDF which adds the two columns salary and bonus to view the total salary of all the employees. The data engineer has defined the following SQL UDF, which intends to perform the required task, but the last line has been deleted by mistake. What should come in the last line of the function definition?

    CREATE FUNCTION total_salary(salary INT, bonus INT)
    RETURNS INT
    __________

- CONCAT(salary, bonus)
- RETURN CONCAT(salary, bonus)
- salary + bonus
- RETURN salary + bonus
- CONCAT(salary bonus)
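
The completed SQL UDF as a sketch (the salary and bonus columns on employees are assumed for illustration):

    # A SQL UDF body starts with the RETURN keyword.
    spark.sql("""
        CREATE FUNCTION total_salary(salary INT, bonus INT)
        RETURNS INT
        RETURN salary + bonus
    """)
    # The UDF can then be called like a built-in function:
    spark.sql("SELECT total_salary(salary, bonus) FROM employees").show()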

Which of the following statements precisely describes the difference between INSERT INTO and MERGE INTO for Delta tables in Databricks?
- INSERT INTO can be used to insert and update a table, whereas MERGE INTO can be used for insert, update and delete.
- MERGE INTO can be used to insert and update a table, whereas INSERT INTO can be used for insert and delete.
- MERGE INTO can be used to insert, update and delete from/to a table, whereas INSERT INTO can be used only to insert values into a table. (Correct)
- MERGE INTO can be used to insert and delete from/to a table, whereas INSERT INTO can be used to insert, update and delete.
- Both MERGE INTO and INSERT INTO can be used to insert, update and delete from/to a table.

Which of the following about the Multi-hop architecture is true?
- Multi-hop architecture can be used only for batch workloads
- In Multi-hop architecture, the main task of the Bronze table is to apply the schema to the raw data
- Multi-hop architecture can only be performed in SQL
- For multi-hop architecture to perform quickly, SQL endpoints are necessary
- Most multi-hop architectures include Gold-Diamond-Platinum tables

A data engineer is using AutoLoader for ingesting CSV data from an S3 location. What should replace the blank to execute the code correctly?

    dataDF = spark.readStream._____________
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.schemaLocation", schemaLocation)
        .load(source)

- autoloader
- format("autoLoader")
- format("cloudFiles")
- option("autoLoader")
- option("cloudFiles")
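
The completed AutoLoader read as a sketch (schemaLocation and source are variables assumed to be defined earlier):

    # AutoLoader is selected with format("cloudFiles"); the concrete
    # file format goes in the cloudFiles.format option.
    dataDF = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.schemaLocation", schemaLocation)
        .load(source))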

Which of the following users can use the Gold table as a source?
- A user that needs to feed the raw data to a table
- A user that needs to design a dashboard using aggregated data
- A user that needs to add a column to the table
- The Gold table is the end of the multi-hop architecture and is not used by any user
- A user that needs to join static data with streaming data to make the data richer

A data analyst has created a Bronze table containing 20 million records by ingesting raw data. A data engineer wants to test the data by running a query over the table. The table, being part of a production environment, cannot be used by the data engineer directly. The creation of views on the original table has also been restricted. Which of the following approaches can you suggest for the data engineer to quickly test the data?
- The data engineer can create a DEEP CLONE of the table and run the query over the newly created table
- The data engineer can request the data analyst to query over the original table
- The data engineer can create a SHALLOW CLONE of the table and run the query over the newly created table
- The data engineer can request the admin to run the query
- The data engineer can use Python's spark.sql() function to query the data
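
A sketch of the shallow-clone approach (table names hypothetical):

    # A shallow clone copies only the Delta transaction log, not the
    # underlying data files, so it is cheap to create even for a
    # 20-million-record table.
    spark.sql("CREATE TABLE bronze_events_test SHALLOW CLONE bronze_events")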

A team of data analysts is using CTE (Common Table Expression) as part of their SQL queries to be used for one of the Bronze tables in a multi-hop architecture. All of the data analysts try to explain the usage of CTE to a newly joined member. Which of the following statements about CTE is/are correct?

Data Analyst 1: CTE can be created using the WITH command
Data Analyst 2: CTE allows a result to be used multiple times
Data Analyst 3: CTE cannot be nested
Data Analyst 4: CTE can be created using the CTE command

- Data analysts 1, 2, 3 are right but data analyst 4 is wrong
- All the data analysts are right
- Data analysts 2 and 3 are right but data analysts 1 and 4 are wrong
- Data analysts 1 and 2 are right but data analysts 3 and 4 are wrong
- All the data analysts are wrong except data analyst 1
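
A small CTE sketch (table and column names hypothetical):

    # The CTE is introduced with WITH and can be referenced multiple
    # times in the statement that follows it.
    spark.sql("""
        WITH high_scores AS (
            SELECT student_id, score FROM raw_scores WHERE score > 90
        )
        SELECT COUNT(*) FROM high_scores
    """).show()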

A data engineer is working on a project which involves writing a multi-hop architecture using Python. A data analyst also needs to work on the same project but knows only SQL. Which of the following can be done by the data engineer to ensure both of them work in their own layers of the multi-hop architecture?
- Use Python in the cell and run SQL queries
- Change the default language of the notebook to SQL
- Register a UDF
- Contact the Databricks Administrator to change the language
- Create a temporary view using the createOrReplaceTempView() function
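
A sketch of bridging the two languages with a temporary view (bronze_df is a hypothetical DataFrame built earlier in Python):

    # The engineer registers the DataFrame as a temp view from Python;
    # the analyst can then query it with plain SQL.
    bronze_df.createOrReplaceTempView("bronze_events")
    spark.sql("SELECT * FROM bronze_events LIMIT 10").show()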

Which of the following SQL statements counts the number of unique rows from the Silver table routes?
- SELECT count(*) FROM routes;
- SELECT count(DISTINCT *) FROM routes;
- SELECT count_if(* is DISTINCT) FROM routes;
- SELECT count(*) FROM routes WHERE * is DISTINCT;
- SELECT count(UNIQUE(*)) FROM routes;

In which of the following layers of the multi-hop architecture is aggregation the most common operation?
- Gold
- Silver
- Bronze
- Raw
- Diamond

Which of the following can be a source for the Bronze table in an incremental multi-hop architecture?

1. Kafka stream
2. Silver table
3. Gold table
4. JSON data
5. Raw data

- 1, 2, 3, 4
- 1, 2, 3
- 1, 3, 4, 5
- 1, 4, 5
- 2, 3, 5

A data engineer is using AutoLoader for a streaming ETL process, and a new file has arrived in the source directory with a different schema. This newly added file has an extra column named last_name. What could be the possible outcome when AutoLoader processes this file?
- The AutoLoader process will fail with an error
- AutoLoader will discard that column with no reference to that column
- The process will stop and the user can choose the next step
- The process will continue, with the details of the added column last_name stored in the _rescued_data column
- The new file will be auto-deleted and will not be processed

A junior data engineer has joined a team and needs to create a DLT pipeline. Which of the following describes the flow of actions to create a new pipeline?

1. Select the Jobs pane from the left side menu
2. Select Run Pipeline
3. Select Create Pipeline
4. Select the Compute pane from the left side menu
5. Click on Create DLT Pipeline
6. Click on the Delta Live Tables tab
7. Drag the DLT pane from the left side menu to the notebook cell

- 1 -> 5 -> 3
- 7 -> 6 -> 2
- 1 -> 6 -> 3
- 7 -> 6 -> 3
- 4 -> 7 -> 2

A junior data engineer has joined a team working on a project involving the creation of Delta Live Table pipelines using SQL. They see the following constraint added to the table:

    CONSTRAINT age_in_range EXPECT (age > 0 AND age < 60) ON VIOLATION DROP ROW

What will be the effect of this statement on the working of the DLT pipeline?
- Every time the value of the age column is between 0 and 60, the pipeline will fail
- Every time the value of the age column is between 0 and 60, the row will be dropped
- Every time the value of the age column is not between 0 and 60, the pipeline will fail
- Every time the value of the age column is not between 0 and 60, the row will be dropped
- The statement has no effect
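
The same ON VIOLATION DROP ROW behavior expressed with the DLT Python decorator API, since the sketches in this document use Python throughout (the source table name is hypothetical):

    import dlt

    @dlt.table
    @dlt.expect_or_drop("age_in_range", "age > 0 AND age < 60")
    def passengers_clean():
        # Rows violating the expectation are dropped and counted in
        # the pipeline's event log; the pipeline keeps running.
        return spark.read.table("passengers_raw")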

Which of the following can be combined with a DLT pipeline?
- Medallion architecture
- Bronze, Silver and Gold tables
- AutoLoader
- Constraints on tables
- All of these
- None of these

A team of data analysts is running queries on a SQL endpoint and the queries are running at a decent speed. Now the number of users has increased massively from 1 to 20, and because of that the queries have been running very slowly. The cluster size has already been set to the maximum, but the queries are still on the slower side. What can you suggest to make the queries run faster for all the users?
- Request the Databricks Administrator to increase the speed
- Turn on the Auto-stop feature
- Increase the Max bound of the cluster scaling range
- Turn on the Serverless feature
- Decrease the cluster size and increase the Max bound of the cluster scaling range

Which of the following statements is true for cluster pools in Databricks?
- By using cluster pools, you can reduce the start time for a cluster
- It maintains several clusters in an idle state which can be used when necessary
- By using cluster pools, you can reduce the auto-scaling time for a cluster
- The same cluster pool can be used for a driver node and worker nodes
- All the above statements are true for a cluster pool

A scheduled query is running every 5 seconds to ingest data from a networking system which contains various network-related attributes. The network engineer should be informed through email if the value in the fault column increases to 10 or more. Which of the following is the best approach?
- A manual email can be sent by the data engineer if the value exceeds the threshold
- The Databricks administrator can send the email to the network engineer
- Databricks Alerts can be used to notify the network engineer through email
- Use the dashboard to check the value of the fault column
- An alerting system through email is not yet supported in Databricks

Which of the following is optional while creating a Delta Live Table pipeline?
- Target database
- Pipeline name
- Notebook library
- Minimum and maximum workers
- Pipeline mode

Which of the following has been built to provide fine-grained data governance and security in the Lakehouse?
- AutoLoader
- Unity Catalog
- SQL endpoint
- Cluster
- Data Explorer

Which of the following queries can be used to REVOKE all the permissions from user bob@candes.db on database courses?
- REVOKE ALL PRIVILEGES ON courses DB FROM bob@candes.db
- REVOKE ALL PERMISSIONS ON DATABASE courses FROM bob@candes.db
- REVOKE ALL PRIVILEGES FROM courses SCHEMA TO bob@candes.db
- REVOKE ALL PRIVILEGES ON courses DATABASE FROM USER bob@candes.db
- REVOKE ALL PRIVILEGES ON DATABASE courses FROM bob@candes.db
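
The valid form as a sketch; principals containing special characters such as @ are quoted with backticks:

    # Revokes every privilege the user holds on the courses database.
    spark.sql("REVOKE ALL PRIVILEGES ON DATABASE courses FROM `bob@candes.db`")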

A data engineer is the owner of the organization database. Which of the following permissions cannot be controlled by the Databricks administrator?
- Grant permissions to users on the tables in the organization database
- Revoke permissions from users for accessing the organization database
- View all the grants on the organization database
- Revoke permissions from the owner of the organization database
- Grant permission to other users on the organization database

A Databricks administrator needs to view all the grants to the user abc@def.com for database university. Which of the following commands can be used?
- SHOW GRANTS abc@def.com ON university
- SHOW ALL GRANTS TO abc@def.com ON DATABASE university
- SHOW GRANTS abc@def.com ON DATABASE university
- VIEW GRANTS TO abc@def.com ON DATABASE university
- SHOW ALL GRANTS TO abc@def.com university