7) ML Engineer

Test Title:
7) ML Engineer

Description:
7) ML Engineer

Creation Date: 2024/02/14

Category: Other

Number of Questions: 25

Questions:

While running a model training pipeline on Vertex AI, you discover that the evaluation step is failing because of an out-of-memory error. You are currently using TensorFlow Model Analysis (TFMA) with a standard Evaluator TensorFlow Extended (TFX) pipeline component for the evaluation step. You want to stabilize the pipeline without downgrading the evaluation quality while minimizing infrastructure overhead. What should you do?. A. Include the flag --runner=DataflowRunner in beam_pipeline_args to run the evaluation step on Dataflow. B. Move the evaluation step out of your pipeline and run it on custom Compute Engine VMs with sufficient memory. C. Migrate your pipeline to Kubeflow hosted on Google Kubernetes Engine, and specify the appropriate node parameters for the evaluation step. D. Add tfma.MetricsSpec() to limit the number of metrics in the evaluation step.
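
For reference, options A and D both map to small configuration changes in a TFX pipeline. A minimal sketch, with the project, bucket, and metric choices as placeholders rather than values from the question:

```python
import tensorflow_model_analysis as tfma

# Option D's idea: constrain the Evaluator's work by listing only the metrics you need.
eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key="label")],
    metrics_specs=[
        tfma.MetricsSpec(
            metrics=[
                tfma.MetricConfig(class_name="ExampleCount"),
                tfma.MetricConfig(class_name="BinaryAccuracy"),
            ]
        )
    ],
    slicing_specs=[tfma.SlicingSpec()],  # overall slice only
)

# Option A's idea: pass Beam args so the evaluation runs on Dataflow instead of locally.
beam_pipeline_args = [
    "--runner=DataflowRunner",
    "--project=my-project",                 # placeholder
    "--region=us-central1",                 # placeholder
    "--temp_location=gs://my-bucket/tmp",   # placeholder
]
```

The eval_config is passed to the TFX Evaluator component, and beam_pipeline_args to the TFX pipeline definition.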

You are developing an ML model using a dataset with categorical input variables. You have randomly split half of the data into training and test sets. After applying one-hot encoding on the categorical variables in the training set, you discover that one categorical variable is missing from the test set. What should you do?. A. Use sparse representation in the test set. B. Randomly redistribute the data, with 70% for the training set and 30% for the test set. C. Apply one-hot encoding on the categorical variables in the test data. D. Collect more data representing all categories.
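
For context, the usual pattern is to fit the encoder on the training split only and reuse the fitted encoder on the test split. A minimal sketch with toy data, assuming scikit-learn 1.2+ (which uses the sparse_output argument):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"payment_method": ["card", "cash", "card"]})
test = pd.DataFrame({"payment_method": ["cash", "voucher"]})  # "voucher" never seen in training

# Fit on the training set only, then apply the same fitted encoder to the test set.
# handle_unknown="ignore" encodes unseen categories as all-zeros instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
X_train = encoder.fit_transform(train)
X_test = encoder.transform(test)  # columns match the training encoding exactly
print(encoder.get_feature_names_out(), X_test)
```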

You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?. A. Modify the target variable using the Box-Cox transformation. B. Z-normalize all the numeric features. C. Oversample the fraudulent transactions 10 times. D. Log transform all numeric features.
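
For context, oversampling a rare class simply resamples the minority rows with replacement before training. A minimal sketch on toy data:

```python
import pandas as pd
from sklearn.utils import resample

# Toy dataset with ~1% positive (fraud) labels.
df = pd.DataFrame({"amount": range(1000), "is_fraud": [1] * 10 + [0] * 990})

fraud = df[df["is_fraud"] == 1]
legit = df[df["is_fraud"] == 0]

# Sample the fraudulent rows with replacement to 10x their original count, then recombine and shuffle.
fraud_oversampled = resample(fraud, replace=True, n_samples=len(fraud) * 10, random_state=42)
balanced = pd.concat([legit, fraud_oversampled]).sample(frac=1, random_state=42)
print(balanced["is_fraud"].value_counts())
```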

You are developing a classification model to support predictions for your company’s various products. The dataset you were given for model development has class imbalance. You need to minimize false positives and false negatives. What evaluation metric should you use to properly train the model?. A. F1 score. B. Recall. C. Accuracy. D. Precision.
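
As a reminder of how these metrics relate, F1 is the harmonic mean of precision and recall, so it penalizes both false positives (through precision) and false negatives (through recall). A quick check with scikit-learn on made-up labels:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
# F1 = 2 * P * R / (P + R)
print(p, r, f1_score(y_true, y_pred), 2 * p * r / (p + r))
```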

You are training an object detection machine learning model on a dataset that consists of three million X-ray images, each roughly 2 GB in size. You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32 cores, 128 GB of RAM, and 1 NVIDIA P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance. What should you do?. A. Increase the instance memory to 512 GB, and increase the batch size. B. Replace the NVIDIA P100 GPU with a K80 GPU in the training job. C. Enable early stopping in your Vertex AI Training job. D. Use the tf.distribute.Strategy API and run a distributed training job.
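
For reference, the tf.distribute.Strategy API mentioned in option D wraps model construction so that training is replicated across devices or workers. A minimal sketch with a tiny placeholder network (not an actual detector):

```python
import tensorflow as tf

# MirroredStrategy replicates training across all GPUs on one machine;
# MultiWorkerMirroredStrategy extends the same pattern across several machines.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Variables created inside the scope are mirrored across replicas.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(128, 128, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

# model.fit(train_dataset, epochs=10)  # input pipeline omitted in this sketch
```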

You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model building, training, hyperparameter tuning, and serving. What should you do?. A. Train a TensorFlow model on Vertex AI. B. Train a classification Vertex AutoML model. C. Run a logistic regression job on BigQuery ML. D. Use scikit-learn in Vertex AI Workbench user-managed notebooks with the pandas library.

You recently developed a deep learning model. To test your new model, you trained it for a few epochs on a large dataset. You observe that the training and validation losses barely changed during the training run. You want to quickly debug your model. What should you do first?. A. Verify that your model can obtain a low loss on a small subset of the dataset. B. Add handcrafted features to inject your domain knowledge into the model. C. Use the Vertex AI hyperparameter tuning service to identify a better learning rate. D. Use hardware accelerators and train your model for more epochs.
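
For context, the "can the model overfit a tiny subset?" sanity check from option A is cheap to run: a model with enough capacity should drive the loss near zero on a handful of examples, and if it cannot, the training setup itself is suspect (frozen weights, wrong loss, mis-aligned labels, a learning rate of zero, and so on). A minimal sketch with synthetic placeholder data:

```python
import numpy as np
import tensorflow as tf

x_small = np.random.rand(32, 20).astype("float32")       # toy stand-in for a small data subset
y_small = np.random.randint(0, 2, size=(32, 1))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

history = model.fit(x_small, y_small, epochs=200, verbose=0)
print("final loss on the small subset:", history.history["loss"][-1])
```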

You are a data scientist at an industrial equipment manufacturing company. You are developing a regression model to estimate the power consumption in the company’s manufacturing plants based on sensor data collected from all of the plants. The sensors collect tens of millions of records every day. You need to schedule daily training runs for your model that use all the data collected up to the current date. You want your model to scale smoothly and require minimal development work. What should you do?. A. Develop a custom TensorFlow regression model, and optimize it using Vertex AI Training. B. Develop a regression model using BigQuery ML. C. Develop a custom scikit-learn regression model, and optimize it using Vertex AI Training. D. Develop a custom PyTorch regression model, and optimize it using Vertex AI Training.
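
For reference, option B's approach keeps both the data and the training inside BigQuery, so a scheduled query can retrain daily over everything collected so far. A minimal sketch via the Python client; the project, table, and column names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

query = """
CREATE OR REPLACE MODEL `sensors.power_consumption_model`
OPTIONS (model_type = 'linear_reg', input_label_cols = ['power_kwh']) AS
SELECT
  sensor_1, sensor_2, sensor_3, power_kwh
FROM `sensors.readings`
WHERE DATE(reading_ts) <= CURRENT_DATE()
"""
client.query(query).result()  # blocks until the model finishes training
```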

Your organization manages an online message board. A few months ago, you discovered an increase in toxic language and bullying on the message board. You deployed an automated text classifier that flags certain comments as toxic or harmful. Now some users are reporting that benign comments referencing their religion are being misclassified as abusive. Upon further inspection, you find that your classifier's false positive rate is higher for comments that reference certain underrepresented religious groups. Your team has a limited budget and is already overextended. What should you do?. A. Add synthetic training data where those phrases are used in non-toxic ways. B. Remove the model and replace it with human moderation. C. Replace your model with a different text classifier. D. Raise the threshold for comments to be considered toxic or harmful.

You work for a magazine distributor and need to build a model that predicts which customers will renew their subscriptions for the upcoming year. Using your company’s historical data as your training set, you created a TensorFlow model and deployed it to Vertex AI. You need to determine which customer attribute has the most predictive power for each prediction served by the model. What should you do?. A. Stream prediction results to BigQuery. Use BigQuery’s CORR(X1, X2) function to calculate the Pearson correlation coefficient between each feature and the target variable. B. Use Vertex Explainable AI. Submit each prediction request with the 'explain' keyword to retrieve feature attributions using the sampled Shapley method. C. Use Vertex AI Workbench user-managed notebooks to perform a Lasso regression analysis on your model, which will eliminate features that do not provide a strong signal. D. Use the What-If tool in Google Cloud to determine how your model will perform when individual features are excluded. Rank the feature importance in order of those that caused the most significant performance drop when removed from the model.
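
For reference, option B corresponds to requesting explanations from a deployed endpoint. A minimal sketch with the Vertex AI SDK; the endpoint ID and feature names are placeholders, and explain() only works if the model was uploaded with an explanation spec (for example, sampled Shapley):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholders

endpoint = aiplatform.Endpoint("projects/my-project/locations/us-central1/endpoints/1234567890")

response = endpoint.explain(instances=[{"tenure_months": 14, "num_issues": 2, "plan": "digital"}])
for explanation in response.explanations:
    for attribution in explanation.attributions:
        # Per-feature contribution of each input to this prediction.
        print(attribution.feature_attributions)
```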

You are an ML engineer at a manufacturing company. You are creating a classification model for a predictive maintenance use case. You need to predict whether a crucial machine will fail in the next three days so that the repair crew has enough time to fix the machine before it breaks. Regular maintenance of the machine is relatively inexpensive, but a failure would be very costly. You have trained several binary classifiers to predict whether the machine will fail, where a prediction of 1 means that the ML model predicts a failure. You are now evaluating each model on an evaluation dataset. You want to choose a model that prioritizes detection while ensuring that more than 50% of the maintenance jobs triggered by your model address an imminent machine failure. Which model should you choose?. A. The model with the highest area under the receiver operating characteristic curve (AUC ROC) and precision greater than 0.5. B. The model with the lowest root mean squared error (RMSE) and recall greater than 0.5. C. The model with the highest recall where precision is greater than 0.5. D. The model with the highest precision where recall is greater than 0.5.
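
The selection rule in the question (maximize detection subject to more than half of the triggered maintenance jobs being real failures) can be expressed directly against the evaluation set. A minimal sketch with made-up predictions:

```python
from sklearn.metrics import precision_score, recall_score

# Illustrative evaluation labels and per-model predictions (placeholders).
y_true = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
candidate_preds = {
    "model_a": [1, 0, 0, 0, 1, 0, 1, 0, 0, 0],
    "model_b": [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
}

# Keep models whose precision exceeds 0.5 (most triggered repairs are real failures),
# then pick the highest recall (fewest missed failures) among them.
eligible = {
    name: recall_score(y_true, preds)
    for name, preds in candidate_preds.items()
    if precision_score(y_true, preds) > 0.5
}
best = max(eligible, key=eligible.get)
print(best, eligible[best])
```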

You built a custom ML model using scikit-learn. Training time is taking longer than expected. You decide to migrate your model to Vertex AI Training, and you want to improve the model’s training time. What should you try out first?. A. Train your model in a distributed mode using multiple Compute Engine VMs. B. Train your model using Vertex AI Training with CPUs. C. Migrate your model to TensorFlow, and train it using Vertex AI Training. D. Train your model using Vertex AI Training with GPUs.
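
For reference, migrating the existing scikit-learn script to Vertex AI Training is a thin wrapper around the script; machine type, accelerators, and replica count are then the knobs the answer options differ on. A minimal sketch; the script path, container image, and machine type are placeholders (check the current list of prebuilt training containers):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1", staging_bucket="gs://my-bucket")

job = aiplatform.CustomTrainingJob(
    display_name="sklearn-train",
    script_path="train.py",  # existing scikit-learn training script
    container_uri="us-docker.pkg.dev/vertex-ai/training/sklearn-cpu.1-0:latest",  # placeholder image
)

# replica_count, machine_type, accelerator_type, and accelerator_count control
# how much (and what kind of) hardware the job gets.
job.run(replica_count=1, machine_type="n1-highcpu-32")
```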

You are an ML engineer at a retail company. You have built a model that predicts a coupon to offer an ecommerce customer at checkout based on the items in their cart. When a customer goes to checkout, your serving pipeline, which is hosted on Google Cloud, joins the customer's existing cart with a row in a BigQuery table that contains the customers' historic purchase behavior and uses that as the model's input. The web team is reporting that your model is returning predictions too slowly to load the coupon offer with the rest of the web page. How should you speed up your model's predictions?. A. Attach an NVIDIA P100 GPU to your deployed model’s instance. B. Use a low latency database for the customers’ historic purchase behavior. C. Deploy your model to more instances behind a load balancer to distribute traffic. D. Create a materialized view in BigQuery with the necessary data for predictions.

You work for a small company that has deployed an ML model with autoscaling on Vertex AI to serve online predictions in a production environment. The current model receives about 20 prediction requests per hour with an average response time of one second. You have retrained the same model on a new batch of data, and now you are canary testing it, sending ~10% of production traffic to the new model. During this canary test, you notice that prediction requests for your new model are taking between 30 and 180 seconds to complete. What should you do?. A. Submit a request to raise your project quota to ensure that multiple prediction services can run concurrently. B. Turn off auto-scaling for the online prediction service of your new model. Use manual scaling with one node always available. C. Remove your new model from the production environment. Compare the new model and existing model codes to identify the cause of the performance bottleneck. D. Remove your new model from the production environment. For a short trial period, send all incoming prediction requests to BigQuery. Request batch predictions from your new model, and then use the Data Labeling Service to validate your model’s performance before promoting it to production.

You want to train an AutoML model to predict house prices by using a small public dataset stored in BigQuery. You need to prepare the data and want to use the simplest, most efficient approach. What should you do?. A. Write a query that preprocesses the data by using BigQuery and creates a new table. Create a Vertex AI managed dataset with the new table as the data source. B. Use Dataflow to preprocess the data. Write the output in TFRecord format to a Cloud Storage bucket. C. Write a query that preprocesses the data by using BigQuery. Export the query results as CSV files, and use those files to create a Vertex AI managed dataset. D. Use a Vertex AI Workbench notebook instance to preprocess the data by using the pandas library. Export the data as CSV files, and use those files to create a Vertex AI managed dataset.
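
For reference, option A amounts to one SQL statement plus one SDK call, since a Vertex AI managed dataset can point directly at a BigQuery table. A minimal sketch; project, table, and column names are placeholders:

```python
from google.cloud import aiplatform, bigquery

bq = bigquery.Client(project="my-project")  # placeholder project

# Preprocess with SQL and materialize the result as a new table.
bq.query("""
CREATE OR REPLACE TABLE `housing.prices_prepared` AS
SELECT * EXCEPT (listing_url, price), SAFE_CAST(price AS FLOAT64) AS price
FROM `housing.prices_raw`
WHERE price IS NOT NULL
""").result()

# Create the managed dataset directly from the prepared table (no CSV export needed).
aiplatform.init(project="my-project", location="us-central1")
dataset = aiplatform.TabularDataset.create(
    display_name="house-prices",
    bq_source="bq://my-project.housing.prices_prepared",
)
```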

You developed a Vertex AI ML pipeline that consists of preprocessing and training steps and each set of steps runs on a separate custom Docker image. Your organization uses GitHub and GitHub Actions as CI/CD to run unit and integration tests. You need to automate the model retraining workflow so that it can be initiated both manually and when a new version of the code is merged in the main branch. You want to minimize the steps required to build the workflow while also allowing for maximum flexibility. How should you configure the CI/CD workflow?. A. Trigger a Cloud Build workflow to run tests, build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines. B. Trigger GitHub Actions to run the tests, launch a job on Cloud Run to build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines. C. Trigger GitHub Actions to run the tests, build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines. D. Trigger GitHub Actions to run the tests, launch a Cloud Build workflow to build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines.

You are working with a dataset that contains customer transactions. You need to build an ML model to predict customer purchase behavior. You plan to develop the model in BigQuery ML, and export it to Cloud Storage for online prediction. You notice that the input data contains a few categorical features, including product category and payment method. You want to deploy the model as quickly as possible. What should you do?. A. Use the TRANSFORM clause with the ML.ONE_HOT_ENCODER function on the categorical features at model creation and select the categorical and non-categorical features. B. Use the ML.ONE_HOT_ENCODER function on the categorical features and select the encoded categorical features and non-categorical features as inputs to create your model. C. Use the CREATE MODEL statement and select the categorical and non-categorical features. D. Use the ML.MULTI_HOT_ENCODER function on the categorical features, and select the encoded categorical features and non-categorical features as inputs to create your model.

You need to develop an image classification model by using a large dataset that contains labeled images in a Cloud Storage bucket. What should you do?. A. Use Vertex AI Pipelines with the Kubeflow Pipelines SDK to create a pipeline that reads the images from Cloud Storage and trains the model. B. Use Vertex AI Pipelines with TensorFlow Extended (TFX) to create a pipeline that reads the images from Cloud Storage and trains the model. C. Import the labeled images as a managed dataset in Vertex AI and use AutoML to train the model. D. Convert the image dataset to a tabular format using Dataflow. Load the data into BigQuery, and use BigQuery ML to train the model.
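
For reference, option C's flow with the Vertex AI SDK is two calls: create a managed image dataset from an import file in Cloud Storage, then run an AutoML image training job. A minimal sketch; the project, bucket, import file, and training budget are placeholders:

```python
from google.cloud import aiplatform
from google.cloud.aiplatform import schema

aiplatform.init(project="my-project", location="us-central1")  # placeholders

# The import file lists each image URI with its label.
dataset = aiplatform.ImageDataset.create(
    display_name="labeled-images",
    gcs_source="gs://my-bucket/import/labels.csv",  # placeholder import file
    import_schema_uri=schema.dataset.ioformat.image.single_label_classification,
)

job = aiplatform.AutoMLImageTrainingJob(
    display_name="image-classifier",
    prediction_type="classification",
)
model = job.run(dataset=dataset, budget_milli_node_hours=8000)
```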

You need to deploy a scikit-learn classification model to production. The model must be able to serve requests 24/7, and you expect millions of requests per second to the production application from 8 am to 7 pm. You need to minimize the cost of deployment. What should you do?. A. Deploy an online Vertex AI prediction endpoint. Set the max replica count to 1. B. Deploy an online Vertex AI prediction endpoint. Set the max replica count to 100. C. Deploy an online Vertex AI prediction endpoint with one GPU per replica. Set the max replica count to 1. D. Deploy an online Vertex AI prediction endpoint with one GPU per replica. Set the max replica count to 100.
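
For reference, the replica counts in the options are set when the model is deployed to an endpoint; autoscaling then moves between the minimum and maximum as traffic rises and falls. A minimal sketch; project, model resource name, and machine type are placeholders:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholders

model = aiplatform.Model("projects/my-project/locations/us-central1/models/1234567890")

endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,     # floor kept warm during quiet hours
    max_replica_count=100,   # ceiling the endpoint can scale out to at peak
)
```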

You work with a team of researchers to develop state-of-the-art algorithms for financial analysis. Your team develops and debugs complex models in TensorFlow. You want to maintain the ease of debugging while also reducing the model training time. How should you set up your training environment?. A. Configure a v3-8 TPU VM. SSH into the VM to train and debug the model. B. Configure a v3-8 TPU node. Use Cloud Shell to SSH into the Host VM to train and debug the model. C. Configure an n1-standard-4 VM with 4 NVIDIA P100 GPUs. SSH into the VM and use ParameterServerStrategy to train the model. D. Configure an n1-standard-4 VM with 4 NVIDIA P100 GPUs. SSH into the VM and use MultiWorkerMirroredStrategy to train the model.

You created an ML pipeline with multiple input parameters. You want to investigate the tradeoffs between different parameter combinations. The parameter options are the input dataset, the max tree depth of the boosted tree regressor, and the optimizer learning rate. You need to compare the pipeline performance of the different parameter combinations, measured in F1 score, time to train, and model complexity. You want your approach to be reproducible, and track all pipeline runs on the same platform. What should you do?. A. 1. Use BigQuery ML to create a boosted tree regressor, and use the hyperparameter tuning capability. 2. Configure the hyperparameter syntax to select different input datasets, max tree depths, and optimizer learning rates. Choose the grid search option. B. 1. Create a Vertex AI pipeline with a custom model training job as part of the pipeline. Configure the pipeline’s parameters to include those you are investigating. 2. In the custom training step, use the Bayesian optimization method with F1 score as the target to maximize. C. 1. Create a Vertex AI Workbench notebook for each of the different input datasets. 2. In each notebook, run different local training jobs with different combinations of the max tree depth and optimizer learning rate parameters. 3. After each notebook finishes, append the results to a BigQuery table. D. 1. Create an experiment in Vertex AI Experiments. 2. Create a Vertex AI pipeline with a custom model training job as part of the pipeline. Configure the pipeline’s parameters to include those you are investigating. 3. Submit multiple runs to the same experiment, using different values for the parameters.
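
For reference, option D's bookkeeping is done with Vertex AI Experiments: every pipeline run logs its parameters and resulting metrics under one experiment, so combinations stay comparable and reproducible. A minimal sketch; the experiment name, parameter values, and metric values are placeholders:

```python
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",              # placeholders
    location="us-central1",
    experiment="boosted-tree-tuning",
)

# One run per parameter combination, all tracked under the same experiment.
aiplatform.start_run("run-depth6-lr01")
aiplatform.log_params({"dataset": "sales_2023", "max_depth": 6, "learning_rate": 0.1})
# ... launch the pipeline / training job with these parameter values ...
aiplatform.log_metrics({"f1_score": 0.82, "train_time_s": 640.0})  # illustrative numbers only
aiplatform.end_run()
```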

You received a training-serving skew alert from a Vertex AI Model Monitoring job running in production. You retrained the model with more recent training data, and deployed it back to the Vertex AI endpoint, but you are still receiving the same alert. What should you do?. A. Update the model monitoring job to use a lower sampling rate. B. Update the model monitoring job to use the more recent training data that was used to retrain the model. C. Temporarily disable the alert. Enable the alert again after a sufficient amount of new production traffic has passed through the Vertex AI endpoint. D. Temporarily disable the alert until the model can be retrained again on newer training data. Retrain the model again after a sufficient amount of new production traffic has passed through the Vertex AI endpoint.

You developed a custom model by using Vertex AI to forecast the sales of your company’s products based on historical transactional data. You anticipate changes in the feature distributions and the correlations between the features in the near future. You also expect to receive a large volume of prediction requests. You plan to use Vertex AI Model Monitoring for drift detection and you want to minimize the cost. What should you do?. A. Use the features for monitoring. Set a monitoring-frequency value that is higher than the default. B. Use the features for monitoring. Set a prediction-sampling-rate value that is closer to 1 than 0. C. Use the features and the feature attributions for monitoring. Set a monitoring-frequency value that is lower than the default. D. Use the features and the feature attributions for monitoring. Set a prediction-sampling-rate value that is closer to 0 than 1.

You have recently trained a scikit-learn model that you plan to deploy on Vertex AI. This model will support both online and batch prediction. You need to preprocess input data for model inference. You want to package the model for deployment while minimizing additional code. What should you do?. A. 1. Upload your model to the Vertex AI Model Registry by using a prebuilt scikit-learn prediction container. 2. Deploy your model to Vertex AI Endpoints, and create a Vertex AI batch prediction job that uses the instanceConfig.instanceType setting to transform your input data. B. 1. Wrap your model in a custom prediction routine (CPR), and build a container image from the CPR local model. 2. Upload your scikit-learn model container to Vertex AI Model Registry. 3. Deploy your model to Vertex AI Endpoints, and create a Vertex AI batch prediction job. C. 1. Create a custom container for your scikit-learn model. 2. Define a custom serving function for your model. 3. Upload your model and custom container to Vertex AI Model Registry. 4. Deploy your model to Vertex AI Endpoints, and create a Vertex AI batch prediction job. D. 1. Create a custom container for your scikit-learn model. 2. Upload your model and custom container to Vertex AI Model Registry. 3. Deploy your model to Vertex AI Endpoints, and create a Vertex AI batch prediction job that uses the instanceConfig.instanceType setting to transform your input data.
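
For context, the custom prediction routine (CPR) mentioned in option B lets you attach preprocessing to a scikit-learn model by implementing a small predictor class and letting the SDK build the serving container from it. A sketch of such a predictor; the artifact file names and the scaler step are illustrative assumptions, not part of the question:

```python
# predictor.py -- custom prediction routine (CPR) predictor sketch.
import joblib
import numpy as np

from google.cloud.aiplatform.prediction.predictor import Predictor
from google.cloud.aiplatform.utils import prediction_utils


class SklearnCprPredictor(Predictor):
    def load(self, artifacts_uri: str) -> None:
        # Pull model artifacts (e.g., model.joblib, scaler.joblib) from Cloud Storage.
        prediction_utils.download_model_artifacts(artifacts_uri)
        self._scaler = joblib.load("scaler.joblib")
        self._model = joblib.load("model.joblib")

    def preprocess(self, prediction_input: dict) -> np.ndarray:
        instances = np.asarray(prediction_input["instances"])
        return self._scaler.transform(instances)

    def predict(self, instances: np.ndarray) -> np.ndarray:
        return self._model.predict(instances)

    def postprocess(self, prediction_results: np.ndarray) -> dict:
        return {"predictions": prediction_results.tolist()}
```

The container image is then built from this predictor with LocalModel.build_cpr_model() and uploaded to the Model Registry.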

You work for a food product company. Your company’s historical sales data is stored in BigQuery. You need to use Vertex AI’s custom training service to train multiple TensorFlow models that read the data from BigQuery and predict future sales. You plan to implement a data preprocessing algorithm that performs min-max scaling and bucketing on a large number of features before you start experimenting with the models. You want to minimize preprocessing time, cost, and development effort. How should you configure this workflow?. A. Write the transformations into Spark that uses the spark-bigquery-connector, and use Dataproc to preprocess the data. B. Write SQL queries to transform the data in-place in BigQuery. C. Add the transformations as a preprocessing layer in the TensorFlow models. D. Create a Dataflow pipeline that uses the BigQueryIO connector to ingest the data, process it, and write it back to BigQuery.
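
For comparison, option C's "preprocessing layer" approach would bake the transformations into each TensorFlow model, recomputing them on every training run. A minimal sketch with Keras preprocessing layers; the feature names, value range, and bucket boundaries are placeholders:

```python
import tensorflow as tf

units = tf.keras.Input(shape=(1,), name="units_sold")
price = tf.keras.Input(shape=(1,), name="unit_price")

# Min-max scaling as a fixed rescale (assumes units_sold lies in [0, 500]),
# bucketing with explicit split points.
units_scaled = tf.keras.layers.Rescaling(scale=1.0 / 500.0)(units)
price_bucket = tf.keras.layers.Discretization(bin_boundaries=[5.0, 10.0, 20.0])(price)

preprocessing = tf.keras.Model(inputs=[units, price], outputs=[units_scaled, price_bucket])
print(preprocessing([tf.constant([[120.0]]), tf.constant([[12.5]])]))
```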
