Nhinnaest
Test title: Nhinnaest. Description: data for studying Nhinna.




When analyzing a three-way table with variables A, B and C using a log-linear model, what does the significance of the three-factor interaction term (ABC) imply? Variables A and B are conditionally independent given C. Each two-way interaction adequately describes the association between variables. The three variables have a joint effect that is not explained by the two-way interactions. The marginal distributions of A, B, and C are uniform.
When identifying a seasonal ARIMA model for monthly sales data that shows a clear seasonal pattern with a peak every December, which component is essential to include in the model? A seasonal differencing term at lag twelve to remove the annual seasonal effect. A high-order autoregressive term to capture the peak sales in December. A moving average term to smooth out the fluctuations within each year. A non-seasonal differencing term to model the peak sales in December.
Using the small dataset below, calculate the coefficient of determination (R²) for the linear regression model: x: [1, 2, 3], y: [1, 3, 5]. 1. 0.5. 2. -1.
Using the following data, what is the y-intercept of the regression line: x: [2, 4, 6], y: [3, 7, 11]? 0. 1. -1. 2.
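Both regression questions above can be checked numerically. A minimal sketch using scipy.stats.linregress (variable names are illustrative):

```python
from scipy.stats import linregress

# R² for x = [1, 2, 3], y = [1, 3, 5]: the points lie exactly on y = 2x - 1,
# so the fit is perfect and R² = rvalue**2 = 1.0.
fit1 = linregress([1, 2, 3], [1, 3, 5])
print(fit1.rvalue ** 2)  # 1.0

# Intercept for x = [2, 4, 6], y = [3, 7, 11]: slope 2, intercept -1.
fit2 = linregress([2, 4, 6], [3, 7, 11])
print(fit2.slope, fit2.intercept)  # 2.0 -1.0
```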
Which of the following is not an assumption of the F-test? The populations from which the samples are drawn are normally distributed. The samples are independent of each other. The data is at least ordinal level. The data is ratio or interval level.
What is the main difference between a parametric and a non-parametric test? Parametric tests assume normality; non-parametric tests do not. Parametric tests require ordinal data; non-parametric tests require interval data. Parametric tests use ranks; non-parametric tests use actual values. Parametric tests assume non-normality; non-parametric tests assume normality.
When is it preferable to use a non-parametric test over a parametric test? When data is normally distributed. When dealing with interval data. When data does not meet normality assumptions. When comparing three or more groups.
Define and give an example of ordinal data. Weight in kilograms. Customer satisfaction levels: Very dissatisfied, Dissatisfied, Neutral, Satisfied, Very satisfied. Temperature in Celsius. Income in dollars.
Define and give an example of interval data. Rank in a race. Temperature in Celsius. Weight in pounds. Age in years.
Define and give an example of ratio data. Temperature in Fahrenheit. Likert scale responses. Height in meters. Calendar years.
Name three parametric tests. t-test, ANOVA, regression. t-test, chi-square, regression. t-test, chi-square, regression, Kruskal-Wallis. t-test, Kruskal-Wallis, Friedman.
Name three non-parametric tests. Mann-Whitney U, Kruskal-Wallis, Wilcoxon test. t-test, chi-square, regression. t-test, regression, Kruskal-Wallis. ANOVA, Kruskal-Wallis, Friedman.
What is homogeneity of variances and why is it important in parametric tests? It means variances are different; it's not important. It means variances are equal; it's crucial for valid results. It refers to data normality; important for all tests. It means means are equal; crucial for all tests.
Explain how the Kruskal-Wallis test serves as a non-parametric alternative to one-way ANOVA. It compares means of three or more groups. It compares medians using ranks for three or more independent groups. It tests for normality in data. It compares variances of three or more groups.
In analyzing monthly sales data with a clear seasonal pattern, what component is essential in an ARIMA model? Seasonal differencing term at lag twelve. Autoregressive term at lag one. Moving average term at lag twelve. Non-seasonal differencing term at lag one.
Describe a scenario where you would use the Mann-Whitney U test instead of a Student's t-test. Comparing two independent groups with normally distributed data. Comparing two related groups with non-normal data. Comparing two independent groups with ordinal data. Comparing three or more groups with normal data.
What is the difference between interval and ratio scales? Interval scales have a true zero; ratio scales do not. Ratio scales have a true zero; interval scales do not. Both have a true zero; they are the same. Interval scales are for categorical data; ratio scales are not.
Why is normality important in parametric tests? It ensures data variability. It allows valid use of parametric tests. It confirms data independence. It reduces data errors.
Which non-parametric test would you use as an alternative to the paired t-test (Student's t)? Mann-Whitney U. Kruskal-Wallis. Wilcoxon signed-rank. Friedman.
In a SARIMA model, what do the components (P, D, Q)[s] represent? Non-seasonal AR, differencing, MA; seasonal period. Seasonal AR, differencing, MA; seasonal period. Seasonal AR, integration, MA; seasonal period. Non-seasonal AR, integration, MA; seasonal period.
What is homoscedasticity and how can it be verified in regression analysis? Equal error variance; correlation matrix. Equal mean values; t-test. Normal distribution; histogram. Constant error variance; visual inspection of a residuals plot.
When should you use the Friedman test? Comparing three or more independent groups. Comparing two independent groups with ordinal data. Comparing three or more related groups with ordinal data. Comparing two related groups with interval data.
What does it mean for a test to have higher power, and how does this affect the choice between parametric and non-parametric tests? Higher power means lower sensitivity; prefer non-parametric tests. Higher power means higher sensitivity; prefer parametric tests if assumptions are met. Higher power means lower false positives; prefer non-parametric tests. Higher power means higher false positives; prefer parametric tests.
In a study examining the effects of three different teaching methods on test scores, students were randomly assigned to one of the three methods. Using a one-way ANOVA, the researcher calculated an F-statistic. What is the next step to determine if the teaching methods significantly affect test scores? Compare the F-statistic to the sample means. Calculate the mean and standard deviation of each group. Compare the F-statistic to the critical value from the F-distribution. Plot the data to see the distribution.
You have a dataset with highly imbalanced classes. How would you handle this imbalance to improve the performance of a classification model? Use the Synthetic Minority Over-sampling Technique (SMOTE) to generate synthetic samples for the minority class. Apply a random undersampling technique to reduce the majority class size. Use a weighted loss function in the classification model. Perform a cost-sensitive learning approach.
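For the class-imbalance question, a minimal SMOTE sketch, assuming the imbalanced-learn package is available (the toy dataset is illustrative):

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Toy imbalanced dataset: roughly 95% majority class.
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)
print(Counter(y))  # e.g. {0: ~950, 1: ~50}

# SMOTE interpolates between minority-class neighbours to create synthetic samples.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_res))  # classes balanced
```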
In a logistic regression model, if the interaction term between two binary predictors (X1, X2) is significant, what does this imply about the relationship between the predictors and the outcome? The effect of one predictor on the outcome depends on the presence of the other predictor. The predictors have additive effects on the log-odds of the outcome. The significance of the interaction term is indicative of multicollinearity. The interaction term's coefficient can be interpreted as the odds ratio.
How should researchers proceed with a two-sample t-test when the data are not normally distributed? The t-test is robust to non-normality if both sample sizes are large. The t-test can only be used with normally distributed data, regardless of sample size. A transformation of the data is required to achieve normality. The central limit theorem is irrelevant in the context of the t-test.
What is the clustering method which takes care of variance in data? Decision tree. Gaussian mixture model. K-means. All.
Which of the following statements is true about K-means and its handling of the shape of clusters? K-means assumes clusters can have any shape. K-means assumes clusters are spherical and of similar size. K-means can handle elliptical clusters with different sizes. K-means uses Gaussian distributions to model clusters.
In a dataset with clusters of different shapes and sizes, which clustering method would be more suitable to capture the data structure? K-means. Gaussian Mixture Models (GMM). Decision Trees. Principal Component Analysis (PCA).
Why are decision trees not suitable for clustering tasks? Because they cannot handle large volumes of data. Because they are designed for supervised classification and regression tasks. Because they require a target variable to cluster data. Because they cannot model variance within groups.
Which of the following methods considers variance and covariance within groups when performing clustering? K-means. Gaussian Mixture Models (GMM). Decision Trees. Linear Regression.
What is the main difference between the group assignment methods of K-means and Gaussian Mixture Models (GMM)? K-means uses soft assignment, while GMM uses hard assignment. K-means uses hard assignment, while GMM uses soft assignment. Both use hard assignment. Both use soft assignment.
Individual respondents, including focus groups, and panels belong to: Primary data source. Secondary data source. Itemised data source. Pointed data source.
If a two-sample t-test results in a p-value of 0.04, what can be concluded under a significance level of 0.05? There is a 4% probability that the null hypothesis is true. The test has 96% power. The difference in means is practically significant. There is sufficient evidence to reject the null hypothesis at the 5% significance level.
In LASSO regression, what mechanism is used for variable selection? Minimizing the sum of the squared residuals. Maximizing the R-squared value of the model. Imposing a penalty equivalent to the absolute value of the coefficients. Maximizing the sum of the squared residuals.
When assessing the fit of a logistic regression model, which of the following is not a suitable method? Hosmer-Lemeshow test. Area under the ROC curve (AUC). Adjusted R-squared. Likelihood ratio test.
In a Chi-squared test for a 3x3 contingency table, what are the degrees of freedom for the test? 4. 9. 6. 0.
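For the contingency-table question, the degrees of freedom follow directly from the table dimensions:

```latex
\mathrm{df} = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4
```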
A factory produces three types of widgets: A, B and C. The probabilities of a defect in types A, B and C are 0.02, 0.03 and 0.05. If one widget of each type is produced, what is the probability that at least one widget is defective? The probability is the highest individual defect probability, which is 0.05. The probability is the sum of the defect probabilities for all widget types. The probability is one minus the probability that none of the widgets is defective. The probability is the average of the defect probabilities for the three widget types.
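Worked out from the numbers in the question, assuming defects are independent:

```latex
P(\text{at least one defective}) = 1 - (1-0.02)(1-0.03)(1-0.05)
= 1 - 0.98 \times 0.97 \times 0.95 \approx 0.097
```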
What are the principles based on which inference engines work? Backward chaining. Forward chaining. Both. None.
Which of the following principles do inference engines operate on? Knowledge Representation. Forward Chaining. Rule-based Systems. All.
What is the primary difference between forward chaining and backward chaining in inference engines? Forward chaining starts with known facts, while backward chaining starts with a goal. Forward chaining uses probabilistic reasoning, while backward chaining uses fuzzy logic. Forward chaining is goal-driven, while backward chaining is data-driven. Forward chaining handles uncertainty, while backward chaining handles incomplete information.
How do inference engines handle uncertainty and incomplete information? By using pattern matching. By using logical deduction. By using probabilistic reasoning and fuzzy logic. By using forward chaining.
What is fuzzy logic primarily used for in inference engines? To handle binary true/false decisions. To manage ambiguity and vagueness in data. To perform arithmetic calculations. To ensure data integrity.
Which method typically shows greater robustness to noisy data? Sequential methods like Gradient Boosting. Ensemble methods like Bagging. Sequential methods like Random Forest. Ensemble methods like AdaBoost.
What is the main objective of ensemble methods? To iteratively improve accuracy by correcting errors. To combine multiple models to improve accuracy and robustness. To reduce the size of the dataset. To increase the complexity of the model.
Which ensemble method is used to combine multiple decision trees? AdaBoost. Random Forest. Gradient Boosting. K-Means Clustering.
What is a key characteristic of sequential methods? Training multiple models in parallel. Iteratively correcting errors from previous models. Splitting the dataset into disjoint subsets. Using only decision trees.
Which of the following methods focuses on improving accuracy by correcting hard-to-predict errors? AdaBoost. Random Forest. Bagging. Voting classifier.
In what type of problems is Gradient Boosting typically used? Clustering. Classification and regression. Dimensionality reduction. Market segmentation.
Which of the following is a typical use of ensemble methods in medical diagnosis? K-Means Clustering. Voting Classifier. Linear Regression. Principal Component Analysis.
Which sequential method is used to improve fraud detection by adjusting simple models? Random Forest. Gradient Boosting. AdaBoost. Linear Discriminant Analysis.
How is robustness to noisy data improved in ensemble methods like Bagging? By using a single training iteration. By averaging the predictions of multiple models. By reducing the number of variables in the model. By increasing the size of the dataset.
Which ensemble method is used in applications like housing price prediction? K-Means Clustering. Voting Classifier. Linear Regression. Gradient Boosting.
A researcher wants to determine if there is a significant difference in the average time to complete a task using four different software tools. The times (in minutes) are recorded. The one-way ANOVA test resulted in an F-statistic of 4.00 and a critical F-value of 4.07, at the 0.05 significance level. What should the researcher conclude? Reject the null hypothesis and conclude there are significant differences. Fail to reject the null hypothesis. Calculate the p-value from the F-statistic. Plot the data to visually compare the tools.
For the following data set, calculate the regression equation y = mx + b: X: [1, 3, 5], Y: [2, 6, 10]. y = x + 1. y = x + 1 + b. y = x + 2. y = 2x.
Given a high-dimensional dataset, how would you reduce its dimensionality while retaining as much information as possible? Applying principal component analysis (PCA) to reduce the dimensionality of the dataset. Applying principal component analysis (PCA) to increase the dimensionality of the dataset. Applying K-means clustering to reduce the dimensionality of the dataset. Applying fuzzy logic to reduce the dimensionality of the dataset.
In the context of variable selection, why is feature scaling particularly important in models like K-Nearest Neighbours (KNN) and SVM? To prevent overfitting during the model training process. To ensure convergence during the training process. To allow the model to appropriately weigh each feature's importance. To enable the application of gradient descent algorithms.
After finding a significant result in ANOVA, which post hoc test is not typically used to determine which groups differ from each other? Tukey's HSD test. Dunnett's test. Kruskal-Wallis test. Bonferroni correction.
In a regression model examining the effect of education level on salary, with high school, bachelor's and master's as three categories, how should dummy variables be included? Include two dummy variables, one for bachelor's and one for master's, using high school as the reference category. Include dummy variables for all three categories without a reference category. Include one dummy variable for higher education, combining the bachelor's and master's categories. Include the three variables and remove the constant term from the model.
A teacher is comparing the variability of test scores between two classes. Class X has a test score variance of 16, and class Y has a variance of 25. The teacher then applies a curve by multiplying all scores by two. How does this curving affect the comparison of score variability between the two classes? The relative variability between the two classes remains the same even after curving. Class X will have more variability than class Y after applying the curve. Class Y will show less variability than class X after curving. The curve will equalize the variances, making them comparable.
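The curving question follows from the scaling rule for variance:

```latex
\operatorname{Var}(aX) = a^2 \operatorname{Var}(X)
\implies \operatorname{Var}(2X) = 4 \cdot 16 = 64, \quad \operatorname{Var}(2Y) = 4 \cdot 25 = 100
```

Both variances are multiplied by the same factor of four, so the ratio 16:25 = 64:100 is unchanged.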
What does a hybrid Bayesian network consist of? Discrete nodes. Continuous nodes. Both. None.
In the context of Markov chains, what is the defining characteristic of a state being recurrent? It has a non-zero probability of being visited infinitely often. It cannot be visited more than once. It is guaranteed to be revisited. It is a transient state.
You are analyzing a dataset that shows the annual revenue and marketing spend for a series of small businesses. You notice one data point far removed from the others, where a business has an exceptionally high revenue but low marketing spend. How might this outlier most typically affect the following aspects of your bivariate analysis? The calculated Pearson correlation coefficient between revenue and marketing spend. The slope of the least squares regression line predicting revenue from marketing spend changes from what it would be without this outlier. The p-value in testing the hypothesis that there is no correlation between the variables will change. The confidence intervals for the expected revenue given a certain marketing spend change.
What type of node in a Bayesian network represents a continuous variable? Discrete node. Stochastic node. Observable node. Continuous node.
Which of the following is NOT a typical use of nodes in a Bayesian network? Representing state variables in a control system. Modeling causal relationships between variables. Performing complex arithmetic operations. Facilitating inference and probabilistic reasoning.
Which method can be useful to examine inflation rate anticipation, unemployment rate, and capacity utilisation to produce products? Econometric modeling. Time series analysis. Forecast. Data improving.
What is the difference between simulating a geometric distribution and a binomial distribution? It focuses on the number of trials needed to get the first success. It simulates the number of successes in a fixed number of trials. The probability of success changes each trial. It requires the outcomes to be dependent on previous trials.
What can be inferred if two variables have a covariance of zero? The two variables are independent. There is no linear relationship, but there could be another form of relationship. The two variables have no relationship whatsoever. The data points of the two variables are symmetrically distributed.
Given a dataset of financial transactions, how would you detect outlier transactions? Perform K-means. Use an anomaly detection algorithm such as Isolation Forest to identify outliers in the dataset. Calculate the correlation matrix. Use a linear regression.
The F-test is primarily used for which statistical purpose? To test the independence of two categorical variables. To compare the means of three or more groups. To compare the variances of two normally distributed populations. To assess the goodness of fit of a model.
A market researcher is comparing the proportions of customers who prefer products A and B in two separate cities. In City X, 120 of 200 surveyed preferred A. In City Y, 195 of 300 preferred A. To determine whether there is a significant difference in proportions, a significance test is conducted. Assuming equal variances, what is the correct approach to find the p-value for this test? Calculate the pooled proportion and use it to find the standard error, then calculate the test statistic and corresponding p-value. Calculate the separate proportions and directly compare them without a standard error. Use the t-test for independent samples without pooling proportions. Calculate the mean preference score for each city and use ANOVA.
In which of the following types of sampling is the selection carried out according to the opinion of an expert? Snowball sampling. Multistage sampling. Systematic sampling. Judgment sampling.
How can knowledge in artificial intelligence be represented? Predicate logic. Propositional logic. None. Both.
What does the p-value indicate in hypothesis testing? The probability of making a Type I error. The probability of making a Type II error. The level of significance in the statistical test. The probability of rejecting a true null hypothesis.
In which of the following example scenarios could a Monte Carlo simulation be most appropriately used? To calculate the exact probability of obtaining exactly 10 heads in 20 coin flips. To estimate the probability distribution of returns for a complex investment portfolio. To determine the precise mean of a normally distributed random variable. To simulate a deterministic process with no randomness involved.
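A minimal Monte Carlo sketch in the spirit of the portfolio scenario: simulate many random return paths and summarize the resulting distribution (all parameters here are illustrative assumptions, not data from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy portfolio: two assets with illustrative annual mean returns,
# volatilities, and a fixed 50/50 allocation.
means = np.array([0.05, 0.08])
vols = np.array([0.10, 0.20])
weights = np.array([0.5, 0.5])

# 100,000 simulated years of independent normal returns per asset.
sims = rng.normal(means, vols, size=(100_000, 2))
portfolio = sims @ weights

# The simulation estimates the distribution, e.g. mean and 5% quantile (VaR-style).
print(portfolio.mean(), np.quantile(portfolio, 0.05))
```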
Which statement best distinguishes convergence in probability from convergence in distribution? Convergence in probability applies to a sequence of random variables, while convergence in distribution applies to a sequence of distributions. Convergence in probability is about approaching a specific value, whereas convergence in distribution relates to the shape of the distribution. Convergence in probability is about approaching the shape of the distribution, whereas convergence in distribution relates to approaching a specific value. Convergence in probability applies to a sequence of normal variables, while convergence in distribution applies to a sequence of distributions.
How would you determine the optimal number of clusters in a dataset using the k-means clustering algorithm? Apply a hierarchical clustering algorithm. Calculate the silhouette score for different numbers of clusters. Use the elbow method to plot the within-cluster sum of squares against the number of clusters. Use the gap statistic method to determine the optimal number of clusters.
In a study testing multiple hypotheses simultaneously, what must be adjusted to control the overall Type I error rate? The significance level must be adjusted downward to account for the increased likelihood of committing a Type I error across multiple tests. The power of the test must be increased to compensate for the multiple comparisons. The p-value must be multiplied by the number of hypotheses tested. The sample size must be increased proportionally to the number of hypotheses tested.
What is the name for the normal distribution? Lagrangian. Cauchy. Laplacian. Gaussian.
In OLS regressions, why might a researcher apply a logarithmic transformation to the independent variable X? To linearize an exponential relationship between X and Y. To reduce the impact of outliers in the dependent variable Y. To convert a multiplicative relationship into an additive one. To increase the homoscedasticity of the residuals.
You have data on the heights of plants from three different groups subjected to different fertilizers. To test if there are differences in the central tendencies of the groups, which nonparametric test would you use? Kruskal-Wallis. Mann-Whitney. Wilcoxon. Friedman.
Two players, A and B, play a series of games. The probability that A wins any game, independently of the other games, is 0.6. If they play four games, what is the probability that A wins exactly three games? The probability is 0.6 raised to the third power, since A needs to win three games. The probability is 0.6 to the fourth power, since A participates in all four games. The probability is 0.4, as an average. The probability is calculated based on a binomial distribution with four trials and a success probability of 0.6.
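The binomial answer worked out from the question's own numbers:

```latex
P(X = 3) = \binom{4}{3}(0.6)^3(0.4)^1 = 4 \times 0.216 \times 0.4 = 0.3456
```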
An assignment of random variables and their corresponding probabilities is called: Probability distribution. Fuzzy distribution. Logic distribution. Mass distribution.
Which of the following describes a property of a time-homogeneous Markov process? The future state depends on all previous states. The transition probabilities are independent of time. The transition probabilities change with each time step. The process requires a constant time interval between observations.
In a birth-death process, if the birth rate is constant and the death rate is also constant, what is the stationary distribution of the number of customers in the system? Poisson. Normal. Uniform. Binomial.
Which of the following distributions is appropriate for modeling the number of successes in a series of independent Bernoulli trials? Normal Distribution. Binomial Distribution. Exponential Distribution. Gamma Distribution.
Which distribution is most appropriate for modeling the number of events occurring in a fixed interval of time, assuming these events occur at a constant rate and independently? Normal Distribution. Binomial Distribution. Poisson Distribution. Gamma Distribution.
Which parameters completely define a normal distribution? Mean and Variance. Mean and Standard Deviation. Mean and Mode. Variance and Standard Deviation.
Which of the following distributions is continuous and used to model the time between events in a Poisson process? Binomial Distribution. Poisson Distribution. Exponential Distribution. Geometric Distribution.
Which of the following is a primary application of the chi-square distribution? Modeling time between events. Conducting goodness-of-fit tests. Modeling the number of successes in a series of trials. Modeling probabilities in Bayesian analysis.
Which of the following distributions is used to model the proportion of successes in a bounded interval [0, 1]? Normal Distribution. Beta Distribution. Poisson Distribution. Binomial Distribution.
Which of the following is a characteristic of a uniform distribution? It is skewed to the right. All outcomes are equally likely. It has a bell-shaped curve. It is skewed to the left.
In the exponential distribution, what does the parameter λ represent? The mean. The variance. The rate of occurrence. The standard deviation.
What does the geometric distribution model? The number of successes in a fixed number of trials. The number of trials until the first success. The time between events. The number of events in a fixed interval of time.
The gamma distribution generalizes which other distribution by allowing for the modeling of the time until multiple events occur? Poisson. Normal. Exponential. Binomial.
Under what condition does the binomial distribution approximate the Poisson distribution? When the number of trials is small and the probability of success is large. When the number of trials is large and the probability of success is small. When the number of trials is small and the probability of success is small. When the number of trials is large and the probability of success is large.
Which of the following is true for a normal distribution? It has a skewed shape. Its mean, median, and mode are all different. It is symmetric about the mean. It is used for discrete data.
In a Poisson distribution, what is the relationship between the mean λ and the variance? Mean is greater than variance. Variance is greater than mean. Mean and variance are equal. No specific relationship.
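Two facts from the distribution questions above can be checked numerically with scipy.stats: the Poisson mean equals its variance, and a Binomial(n, p) with large n and small p is close to a Poisson(np). A sketch with illustrative numbers:

```python
from scipy.stats import binom, poisson

# Poisson: mean and variance are both lambda.
lam = 3.0
print(poisson.mean(lam), poisson.var(lam))  # 3.0 3.0

# Binomial(n=1000, p=0.003) vs Poisson(lambda = n*p = 3): the pmfs nearly match.
n, p = 1000, 0.003
for k in range(6):
    print(k, binom.pmf(k, n, p), poisson.pmf(k, n * p))
```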
Which scenario is appropriate for applying the chi-square test of independence? Testing if a sample mean is significantly different from a known population mean. Testing if two categorical variables are independent. Testing if the variances of two populations are equal. Testing if a sample proportion is significantly different from a known population proportion.
When is the t-distribution used instead of the normal distribution? When the sample size is large. When the sample size is small and the population variance is known. When the sample size is small and the population variance is unknown. When the data is categorical.
A box contains three types of balls: 10 red, 15 white and 25 blue. If a ball is randomly selected, what is the probability that it is not blue, using the Law of Total Probability? The probability is the sum of the probabilities of selecting a red ball and selecting a white ball. The probability is the same as the probability of selecting a blue ball, since there are only two outcomes. The probability is calculated by adding the probability of selecting a red ball to the probability of selecting a white ball, then dividing by the total number of balls. The probability is the ratio of the number of blue balls to the total number of balls.
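The box question worked out from its own counts:

```latex
P(\text{not blue}) = P(\text{red}) + P(\text{white})
= \frac{10}{50} + \frac{15}{50} = \frac{25}{50} = 0.5
```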
A researcher wants to know if a new teaching method leads to improved test scores. The null hypothesis states there is no change in scores. After applying the new method, the researcher conducts a significance test on the sample data. Which type of test should the researcher use to determine if there is improvement? A two-tailed test, because the researcher is looking for any change in scores. A one-tailed test, because the researcher is looking only for improvement, not decline. A one-tailed test to the left, because the null hypothesis suggests no change. The choice of a one- or two-tailed test depends on the variance of the test scores.
In the context of convergence in probability and in distribution, which of the following statements is true? If a sequence of random variables converges in distribution to a constant, it also converges in probability to that constant. Convergence in distribution implies convergence in probability under all circumstances. A sequence that converges in probability to a constant will also converge in distribution to a variable with that constant's distribution. Convergence in probability to a constant also implies convergence in distribution and requires a more stringent proof.
After performing a simple linear regression analysis, a data analyst finds a statistically significant positive slope in the regression equation relating X (daily hours of sunlight) to Y (sales of sunscreen). What can be concluded from this analysis? More sunlight causes an increase in sunscreen sales. There is a directly proportional relationship between hours of sunlight and sunscreen sales. There is evidence of an association between hours of sunlight and sunscreen sales. Sunscreen sales could be predicted from the hours of sunlight.
Which type of ANOVA is most appropriate for a study with two independent variables and one dependent variable? One-way ANOVA. Two-way ANOVA. Repeated measures ANOVA. MANOVA.
When is it most appropriate to use MANOVA instead of ANOVA? When there is only one independent variable. When there is only one dependent variable. When there are multiple dependent variables. When the independent variables are continuous.
In a two-way ANOVA, what does the interaction effect between the two independent variables indicate? The combined impact of both independent variables on the dependent variable. The main effect of one independent variable on the dependent variable. The effect of time on the dependent variable. The effect of each independent variable separately on the dependent variable.
A statistician is tasked with selecting a sample of households to study consumer spending habits. They need a method that is both efficient and ensures a representative sample of the entire population. Which sampling method is most appropriate for this study? Simple random sampling from a list of all households in the population. Stratified sampling based on income brackets with proportional allocation. Judgmental sampling. Convenience sampling.
What factor most significantly influences the accuracy of a Monte Carlo simulation's results? The use of a random number generator with a very long period. The complexity of the mathematical model used in the simulation. The number of simulations or iterations run. The precision of the initial conditions of the model.
Given a high-dimensional dataset, how would you reduce its dimensionality while retaining as much information as possible? Use K-means clustering. Apply principal component analysis (PCA) to reduce the dimensionality of the dataset. Perform a linear regression. Apply principal component analysis (PCA) to compare the dimensionality of the dataset.
Which scale in statistics is used to differentiate between magnitudes and proportions? Exponential scale. Goodness scale. Ratio scale. Satisfactory scale.
A data scientist has a time series dataset of daily temperatures spanning several years. She suspects non-stationarity due to a changing mean but is unsure if it is due to a trend or seasonality. Which test is most appropriate to determine if the time series is stationary? Perform an augmented Dickey-Fuller test on the dataset. Conduct a Mann-Kendall trend test. Apply a seasonal decomposition to examine the trend components. Use a runs test to check for randomness in the data.
What is another name for the Bernoulli trials? Two-way experiment. Three-way experiment. Dichotomous experiment. Nucleo experiment.
Given a dataset with categorical features, how would you prepare these features for a machine learning model? Use label encoding to convert categorical features into numerical values. Perform a principal component analysis (PCA) on categorical features. Apply one-hot encoding to transform categorical features into numerical features. Apply a K-means clustering algorithm to categorize the features.
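A minimal one-hot encoding sketch with pandas (column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"city": ["X", "Y", "X", "Z"], "sales": [10, 20, 15, 30]})

# One-hot encoding: one binary indicator column per category.
encoded = pd.get_dummies(df, columns=["city"])
print(encoded)
```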
What is the key consideration when conducting a two-sample t-test with unequal sample sizes? The test should not be used if the sample sizes are different. The test results are always unreliable. It's important to check for equality of variances, as it impacts the test's robustness. The larger sample size must be at least twice the size of the smaller one.
What is the key characteristic of a binomial distribution that should be incorporated in its simulation? The number of trials is variable and depends on the outcome of each trial. Each trial has more than two possible outcomes. The probability of success remains constant across trials. The outcomes of the trials are dependent on each other.
Which of the following sampling methods ensures a representative sample of the entire population? Convenience Sampling. Stratified Random Sampling. Judgmental Sampling. Snowball Sampling.
Which test is most appropriate to determine if a time series is non-stationary due to a trend or seasonality? Jarque-Bera Test. Kolmogorov-Smirnov Test. Levene's Test. Augmented Dickey-Fuller Test.
What is the key characteristic that should be incorporated in the simulation of a binomial distribution? The number of trials and the probability of success. The mean and variance. The sample size and the confidence interval. The expected value and the standard deviation.
Which method is most appropriate for transforming categorical features into numerical features for a machine learning model? Label Encoding. One-Hot Encoding. Scaling. Binning.
What is the key consideration when conducting a two-sample t-test with unequal sample sizes? Checking for normality of the data. Checking for homogeneity of means. Checking for equality of variances. Checking for independence of observations.
Which type of ANOVA is most appropriate for a study with one categorical independent variable and one continuous dependent variable? Two-Way ANOVA. One-Way ANOVA. Repeated Measures ANOVA. MANOVA.
What is the primary purpose of the Kolmogorov-Smirnov test? To test for non-stationarity in time series. To compare a sample distribution with a reference probability distribution. To test for equal variances across groups. To determine the normality of a distribution.
What does stationarity in a time series mean? The series has a constant mean and variance over time. The series is increasing over time. The series has a changing variance over time. The series shows seasonal patterns.
What does the Jarque-Bera test check for in a dataset? The presence of a unit root. Homogeneity of variances. Normality of the data. Autocorrelation in time series.
For what purpose is Levene's test used? To determine if a series is stationary. To test for equality of variances across groups. To compare mean values of two samples. To assess the presence of a trend in a time series.
What does the presence of a unit root in a time series indicate? The series is stationary. The series has a constant variance. The series is non-stationary. The series has no trend.
How can a time series with trend and seasonality be transformed to achieve stationarity? By applying logarithmic transformation. By differencing the series. By using Levene's test. By comparing means.
What does the Augmented Dickey-Fuller test specifically test for in a time series? Equality of variances. Normality of data. Presence of a unit root. Seasonal patterns.
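A minimal augmented Dickey-Fuller check with statsmodels, run here on a synthetic random walk (which has a unit root by construction):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=500))  # random walk: non-stationary

# adfuller returns (test statistic, p-value, lags used, n obs, critical values, icbest).
stat, pvalue, *_ = adfuller(y)
print(stat, pvalue)  # large p-value -> fail to reject the unit-root null (non-stationary)
```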
How would you evaluate the performance of a multi-class classification model? Calculate the F1 score for each class and the macro-averaged F1 score. Use the mean squared error (MSE) to evaluate the model. Calculate the R-squared metric for the model. Perform a principal component analysis (PCA) on the model predictions.
When conducting an ANOVA test, if the p-value is less than the chosen level of significance, it indicates that: At least one group mean is different from the others. All group means are different from each other. The group variances are equal. The sample is insufficient for the test.
Which of the following statements is correct? Some cumulative distribution function F is non-decreasing and right-continuous. Every cumulative distribution function F is decreasing and right-continuous. Every cumulative distribution function F is increasing and left-continuous. Any of them.
The scatter in a series of values around the average is called: Central tendency. Dispersion. Skewness. Symmetry.
Which of the following statements best describes positive skewness in a dataset? The left tail of the distribution is longer or fatter than the right tail. The right tail of the distribution is longer or fatter than the left tail. The distribution is symmetrical around the mean. The mean, median, and mode are all equal.
Which of the following characteristics indicates that a distribution is symmetrical? The mean is greater than the median. The distribution has a long tail on one side. The mean, median, and mode are all located at the center of the distribution. The data values are evenly spread out, with no peak.
Which of the following best describes the concept of skewness in a dataset? Skewness measures the average value of a dataset. Skewness indicates the spread of values around the mean. Skewness describes the asymmetry in the distribution of values. Skewness determines the central tendency of a dataset.
In applying the Central Limit Theorem to a sample mean, under what condition might the theorem not hold true? When the population from which the sample is drawn is normally distributed. When the sample is drawn from a population with a highly skewed distribution and the sample size is small. When the sample mean is calculated for a sample size greater than 30. When the population standard deviation is unknown.
A researcher is testing the effectiveness of a new drug. The null hypothesis is that the drug has no effect. She uses a significance level of 0.05 and obtains a p-value of 0.03. Conclusion? Reject the null hypothesis and conclude the drug has a significant effect. Reject the null hypothesis and conclude the drug has a weak effect. Accept the null hypothesis and conclude the drug has no effect. The drug has a weak effect.
What does a correlation coefficient close to -1 indicate? A strong negative linear relationship with high predictability. The variables move in the same direction, but the relationship is weak. Almost no linear relationship between the variables. The slope of the linear relationship between the variables is -1.
Which factor does not influence the power of an ANOVA test? The level of significance. The size of the sample. The magnitude of the difference between group means. The order in which data is collected.
From a standard deck of 52 playing cards, what is the relative frequency of drawing either a heart or a queen? 1/52. 16/52. 13/52. 17/52.
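The card question worked out with inclusion-exclusion (the queen of hearts is counted once):

```latex
P(\text{heart} \cup \text{queen}) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52}
```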
When conducting a multivariate OLS regression, you notice that two predictors are highly correlated. What potential issue does this pose for your regression results? It guarantees that the coefficients of x1 and x2 will be statistically insignificant. It may inflate the standard errors of the coefficients, leading to less precise estimates. It violates the assumption of homoscedasticity, rendering the model invalid. It will cause the R-squared of the model to decrease significantly.
An analyst is forecasting a time series that shows evidence of both trend and seasonality. They have decided to use an additive exponential smoothing model. Which model should they use? Simple Exponential Smoothing. Holt's Linear Trend method. Holt-Winters Additive method. Holt-Winters Multiplicative method.
Which type of smoothing is most appropriate for a time series with both trend and seasonality? Simple Exponential Smoothing. Holt's Linear Trend Model. Holt-Winters Additive Model. Moving Average.
In the Holt-Winters Additive Model, which parameter is used to smooth the trend component? Alpha (α). Beta (β). Gamma (γ). Delta (δ).
What is the main goal of applying smoothing techniques to a time series? To increase the data points. To reduce noise and reveal underlying patterns. To introduce random variations. To create a seasonal component.
Why would an analyst choose the Holt-Winters Additive Model over Holt's Linear Trend Model? To handle data with a strong trend but no seasonality. To address seasonality in addition to trend. To simplify the model by removing trend components. To focus solely on the level component.
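A minimal Holt-Winters additive fit with statsmodels, assuming monthly data with yearly seasonality (the series here is synthetic and purely illustrative):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly series with a linear trend plus additive seasonality.
idx = pd.date_range("2015-01", periods=96, freq="MS")
t = np.arange(96)
y = pd.Series(10 + 0.5 * t + 5 * np.sin(2 * np.pi * t / 12), index=idx)

# Additive trend and seasonality; alpha/beta/gamma are estimated by .fit().
fit = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
print(fit.forecast(12))  # next twelve months
```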
The median of a frequency distribution can be determined with the: histogram. frequency curve. frequency polygon. ogive.
After plotting the cumulative frequency curve, how do you find the median? By finding the highest point on the curve. By finding the intersection point of the cumulative frequency and class interval. By drawing a horizontal line from the median position to the ogive and then a vertical line down to the horizontal axis. By finding the lowest point on the curve.
What is the first step in constructing a cumulative frequency curve (ogive)? Calculating the mean of the data. Calculating the cumulative frequency for each class interval. Calculating the mode of the data. Determining the range of the data.
Which of the following is the correct method to determine the median using a cumulative frequency curve (ogive)? Plotting frequency against class intervals and finding the peak point. Plotting cumulative frequency against upper class boundaries and finding the midpoint. Plotting cumulative frequency against upper class boundaries and using the median position to find the intersection. Plotting cumulative frequency against midpoints of class intervals and finding the mode.
What does the cumulative frequency curve (ogive) represent? The frequency distribution of the data. The total number of observations in each class interval. The running total of frequencies up to the upper boundary of each class interval. The average frequency of the data.
In a G/G/1 queue, what do the letters G/G/1 represent? Gaussian arrivals, Gaussian service times and one server. Gaussian service times, Gaussian arrivals and one server. General arrivals, general service times and one server. Gaussian arrivals, general service times and one server.
In a G/G/1 queue, what does the first "G" represent? The Gaussian distribution of service times. The Gaussian distribution of inter-arrival times. The number of servers in the system. The average arrival rate.
In a G/G/1 queue, what does the second "G" represent? The Gaussian distribution of service times. The Gaussian distribution of inter-arrival times. The number of servers in the system. The average arrival rate.
In the context of a G/G/1 queue, what does the "1" denote? The Gaussian distribution of service times. The Gaussian distribution of inter-arrival times. The number of servers in the system. The average number of customers.
Which of the following queue types has a single server with exponential inter-arrival times and general service times? G/G/1. M/M/1. M/G/1. G/M/1.
A researcher divided subjects into two groups and then selected members from each group for the sample. What sampling method was applied? Cluster. Stratified. Random. Systematic.
A researcher selects a sample of students by choosing every 10th student from an ordered list of the entire student population. What sampling method is this? Simple Random Sampling. Systematic Sampling. Stratified Sampling. Cluster Sampling.
A researcher wants to study the dietary habits of high school students in a city. She randomly selects 5 high schools out of 20 and includes all students from those 5 schools in her sample. What sampling method is this? Simple Random Sampling. Systematic Sampling. Stratified Sampling. Cluster Sampling.
A researcher assigns a number to each student in a school and uses a random number generator to select a sample of 50 students. What sampling method is this? Simple Random Sampling. Systematic Sampling. Stratified Sampling. Cluster Sampling.
What is the main advantage of stratified sampling? It is quick and easy to implement. It ensures representation of all subgroups in the population. It requires less prior knowledge of the population. It avoids the introduction of bias completely.
In cluster sampling, what is the primary characteristic of the clusters? They are homogeneous within themselves and heterogeneous between each other. They are homogeneous between each other and heterogeneous within themselves. They are selected based on the researcher's convenience. They contain the same number of members.
Which of the following is NOT a characteristic of convenience sampling? It is quick and inexpensive. It can introduce significant bias. It ensures every member of the population has an equal chance of being selected. It is easy to implement.
Which sampling method involves selecting members based on referrals from initial subjects? Simple Random Sampling. Systematic Sampling. Quota Sampling. Snowball Sampling.
In a company, employees are divided into departments. A researcher wants to ensure that employees from each department are included in the sample. Which sampling method should be used? Cluster Sampling. Stratified Sampling. Convenience Sampling. Systematic Sampling.
Which of the following describes a potential disadvantage of cluster sampling? It is more efficient when the population is geographically dispersed. It increases the precision of the sample estimates. It may increase the variance if clusters are heterogeneous. It ensures all subgroups are represented.
Which sampling method is best suited for a study where the population is difficult to reach? Simple Random Sampling. Systematic Sampling. Judgment Sampling. Snowball Sampling.
What is the key difference between stratified sampling and cluster sampling? Stratified sampling divides the population into homogeneous groups, while cluster sampling divides it into heterogeneous groups. Stratified sampling selects groups, while cluster sampling selects individuals. Stratified sampling is non-probabilistic, while cluster sampling is probabilistic. Stratified sampling ensures representation of subgroups, while cluster sampling does not.
Given a dataset with time series data, how would you ensure the model accounts for time dependency? Perform a simple linear regression on the time series data. Apply a k-means clustering algorithm to the time series data. Use a Long Short-Term Memory (LSTM) network. Use a principal component analysis (PCA).
Which model is specifically designed to handle seasonality in time series data? AR(p). MA(q). ARIMA(p, d, q). SARIMA(p, d, q)(P, D, Q)[s].
What is the primary purpose of using differencing in an ARIMA model? To incorporate seasonality. To remove autocorrelation. To make the time series stationary. To smooth the data.
Which model is most suitable for capturing long-term dependencies in sequence data using neural networks? ARIMA. MA. LSTM. SES.
In the context of time series forecasting, what does the Kalman Filter primarily deal with? Seasonal adjustment. State space modeling. Exponential smoothing. Moving averages.
What is the main advantage of using the Holt-Winters seasonal model? It accounts for trend only. It accounts for level, trend, and seasonality. It only smooths the data. It uses moving averages.
How would you detect and handle multicollinearity in a multiple regression model? Calculate the Variance Inflation Factor (VIF) for each predictor. Perform a principal component analysis (PCA) to reduce multicollinearity. Use a correlation matrix to identify collinear variables. Apply a ridge regression model to mitigate multicollinearity.
Which statistic is most commonly used to detect multicollinearity in a regression model? p-value. F-statistic. Variance Inflation Factor (VIF). R-squared.
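A minimal VIF computation with statsmodels (the data and column names are illustrative; x2 is built to be nearly collinear with x1):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools import add_constant

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 * 0.9 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
X = add_constant(pd.DataFrame({"x1": x1, "x2": x2}))

# VIF above ~10 is a common rule of thumb for severe multicollinearity.
for i, name in enumerate(X.columns):
    print(name, variance_inflation_factor(X.values, i))
```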
What does a high Variance Inflation Factor (VIF) indicate? High correlation between predictors. Low correlation between predictors. Low standard errors. High p-values.
Which method can be used to handle multicollinearity by transforming predictors into uncorrelated components? Ridge Regression. Principal Component Analysis (PCA). Logistic Regression. Random Forest.
What threshold VIF value typically indicates severe multicollinearity? 10. 0.05. 2. -1.
Which of the following is a direct approach to reduce multicollinearity? Adding more predictor variables. Removing highly correlated predictors. Using a larger sample size. Increasing the number of observations.
How would you evaluate the performance of a clustering algorithm? Use the silhouette score to measure the cohesion and separation of clusters. Use the mean squared error (MSE) to evaluate the clusters. Perform a principal component analysis (PCA) on the clustered data. Calculate the correlation matrix of the clustered variables.
Which metric measures the average similarity ratio of each cluster with the one that is most similar to it? Silhouette Score. Adjusted Rand Index. Davies-Bouldin Index. Inertia.
Which internal evaluation metric ranges from -1 to 1 and measures how similar an object is to its own cluster compared to other clusters? Normalized Mutual Information. Silhouette Score. Fowlkes-Mallows Index. Elbow Method.
What is the purpose of using the Elbow Method in clustering? To determine the consistency of clusters across different runs. To measure the amount of information shared between predicted clusters and true labels. To plot the inertia for different numbers of clusters and find the optimal number of clusters. To measure the geometric mean of the number of elements in the clusters.
Which metric is suitable for evaluating clustering performance when true labels are available and considers all pairs of samples? Silhouette Score. Adjusted Rand Index. Davies-Bouldin Index. Inertia.
Which technique involves plotting silhouette scores for each sample to evaluate how well they fit within their cluster? Elbow Method. Silhouette Analysis. Fowlkes-Mallows Index. Principal Component Analysis.
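A minimal silhouette evaluation with scikit-learn (toy data; the range of k values is illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, silhouette_score(X, labels))
```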
When conducting a two-sample t-test, what happens if the assumption of equal variances is violated? The test becomes more conservative, reducing Type I error. The test may still be valid if sample sizes are large. The test results become invalid, and a non-parametric test is required. The degrees of freedom used in the test need to be adjusted.
Which alternative test should be used if the assumption of equal variances is violated in a two-sample t-test? Paired t-test. Welch's t-test. Mann-Whitney U test. ANOVA.
What does Welch's t-test adjust to account for unequal variances? Mean. Sample size. Degrees of freedom. Standard deviation.
If the assumption of equal variances is violated and you use Welch's t-test, what is the impact on the Type I error rate? Increases the Type I error rate. Decreases the Type I error rate. Makes the Type I error rate unpredictable. No effect on the Type I error rate.
When should you consider using a non-parametric test like the Mann-Whitney U test instead of a t-test? When the data is normally distributed. When the data is not normally distributed and variances are unequal. When the sample size is large. When the sample sizes are equal.
What does it mean for a test to be "more conservative"? It has a higher probability of rejecting the null hypothesis. It has a lower probability of rejecting the null hypothesis. It ignores the assumptions of the test. It always produces the same results regardless of the data.
What should be checked to decide whether to use a standard two-sample t-test or Welch's t-test? Sample means. Sample medians. Equality of variances. Equality of sample sizes.
Which test is specifically designed to handle unequal variances in a two-sample comparison? Paired t-test. Wilcoxon signed-rank test. Welch's t-test. Kruskal-Wallis test.
In the context of a two-sample t-test, what is a Type I error? Rejecting the null hypothesis when it is true. Failing to reject the null hypothesis when it is false. Rejecting the alternative hypothesis when it is true. Failing to reject the alternative hypothesis when it is false.
Why is Welch's t-test more appropriate than a standard t-test when variances are unequal? It assumes equal means. It ignores the differences in variances. It adjusts the degrees of freedom to account for variance inequality. It increases the sample size.
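A minimal Welch's t-test with scipy; passing equal_var=False triggers the Welch degrees-of-freedom adjustment (the samples here are synthetic and illustrative):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=30)  # smaller spread
b = rng.normal(0.5, 3.0, size=50)  # larger spread: unequal variances

stat, pvalue = ttest_ind(a, b, equal_var=False)  # Welch's t-test
print(stat, pvalue)
```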
In the context of proportions, when a sample proportion is used to estimate a population proportion, what is the shape of the sampling distribution of the sample proportion? The sampling distribution of the sample proportion will be approximately normally distributed if the sample size is sufficiently large. The sampling distribution of the sample proportion is always skewed to the right, regardless of the sample size. The sampling distribution of the sample proportion is uniform across all possible sample proportions. The sampling distribution of the sample proportion will always match the distribution of the population proportion.
In the context of proportions, when a sample proportion is used to estimate a population proportion, what is the shape of the sampling distribution of the sample proportion? Binomial distribution. Uniform distribution. Normal distribution. Exponential distribution.
What condition must be met for the sampling distribution of the sample proportion to be approximately normal? The sample size must be greater than 30. Both np and n(1−p) must be greater than 5. The population size must be infinite. The population proportion must be 0.5.
If the sample size increases, what happens to the standard error of the sample proportion? It increases. It decreases. It remains the same. It becomes zero.
When using a Vector Error Correction Model (VECM) in the analysis of multiple time series, what is the primary feature that distinguishes it from a standard vector autoregression (VAR) model? VECM incorporates both short-term dynamics and the long-term equilibrium relationship between cointegrated series. VECM can be used even when series are not cointegrated, unlike VAR models. VECM is a simplified version of the VAR model that requires fewer parameters. VECM focuses only on the stationary differences of the series, ignoring any long-term equilibrium.
Which condition must be met for a VECM to be an appropriate model for multiple time series? The time series must be stationary. The time series must be non-stationary and not cointegrated. The time series must be non-stationary but cointegrated. The time series must have a deterministic trend.
What does the error correction term in a VECM represent? The long-term trend of the time series. The short-term fluctuations of the time series. The deviation from the long-term equilibrium relationship. The difference between observed and predicted values.
In a VECM, what does the matrix Π capture? The short-term dynamics of the variables. The long-term equilibrium relationships among the variables. The residuals of the model. The seasonal effects in the data.
How does a standard VAR model treat non-stationary time series? It includes an error correction term. It assumes all time series are stationary or have been differenced enough to be stationary. It adjusts for cointegration. It uses a separate model for each time series.
A medical researcher conducts a hypothesis test to determine if a new drug reduces systolic blood pressure more than the existing standard medication. If the true difference in mean reductions between the new drug and the standard medication is larger than the hypothesized difference, how does this affect the power of the test? The power of the test decreases as the true difference increases. The power of the test remains unchanged regardless of the true difference. The power of the test increases as the true difference increases. The power of the test is maximum when the true difference is equal to the hypothesized one.
Which of the following is true for a continuous-time Markov chain (CTMC) in the context of its generator matrix (Q)? The off-diagonal elements are non-positive. The diagonal elements are non-positive. The rows of the matrix sum to one. The generator matrix is always symmetric.
What does the diagonal element q_ii of the generator matrix represent in a CTMC? The rate at which the process enters state i. The negative of the total rate at which the process leaves state i. The total rate at which the process leaves state i. The rate at which the process transitions between any two states.
In a CTMC, what does an off-diagonal element q_ij represent? The probability of transitioning from state i to state j. The expected time spent in state i. The rate at which transitions occur from state i to state j. The rate at which the process returns to state i.
Which of the following properties must a generator matrix Q of a CTMC satisfy? All elements must be positive. The sum of each column must be zero. The sum of each row must be zero. The matrix must be invertible.
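A concrete two-state generator matrix showing these properties (the rates a and b are illustrative):

```latex
Q = \begin{pmatrix} -a & a \\ b & -b \end{pmatrix}, \qquad a, b \ge 0
```

Each row sums to zero, the off-diagonal entries q_ij ≥ 0 are transition rates, and each diagonal entry q_ii equals minus the total rate of leaving state i, so it is non-positive.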
Which of the following is an example of an ensemble learning method?. AdaBoost. Gradient Boosting. Random Forest. Support Vector Machine. In the context of boosting, what is the primary objective of each subsequent model in the sequence?. To train independently of previous models. To correct the errors made by the previous models. To reduce the model complexity. To increase the training speed. Which technique is commonly used in ensemble methods to combine the predictions of multiple models?. Backpropagation. Gradient Descent. Bagging. Cross-Validation. What is a key advantage of using ensemble learning methods?. They require less computational power than individual models. They can combine the strengths of multiple models to improve performance. They are simpler to interpret than single models. They always guarantee a perfect model. How does the identification of cointegration among variables improve forecasting models in time series analysis?. It allows for the use of simpler univariate models instead of complex multivariate models. It improves the accuracy of long-term forecasts by ensuring the forecasted values adhere to the established equilibrium relationship. It ensures that forecasts will be perfect with no error. It allows for the complete elimination of model uncertainty in the presence of structural breaks. Which of the following models is commonly used to incorporate cointegration in time series forecasting?. ARIMA. VAR. VECM. Smoothing. What is the primary benefit of including an error correction mechanism in a forecasting model?. It simplifies the model. It adjusts short-term forecasts based on deviations from the long-term equilibrium. It captures long-term trends only. It ignores long-term relationships. Why is it important to identify cointegration when dealing with non-stationary time series?. To ensure the time series are independent. To validate the long-term relationship among the variables. To transform the series into stationary data. To simplify the analysis process. A researcher is using a Chi-squared test for goodness of fit to determine if a die is fair. The die is rolled 60 times, with the following observed frequencies: 10 times for each side. What is the Chi-squared statistic?. 5. 10. 0. 60. Select the order of sampling methods from best to worst: Simple random, stratified, convenience. Convenience, simple random, stratified. Stratified, simple random, convenience. Simple random, convenience, stratified. Select the order of sampling methods from best to worst: Simple Random Sampling, Stratified Sampling, Systematic Sampling, Cluster Sampling, Convenience Sampling. Simple Random Sampling, Stratified Sampling, Systematic Sampling, Cluster Sampling, Convenience Sampling. Systematic Sampling, Simple Random Sampling, Stratified Sampling, Cluster Sampling, Convenience Sampling. Stratified Sampling, Systematic Sampling, Cluster Sampling, Simple Random Sampling, Convenience Sampling. Which sampling method divides the population into strata and takes random samples from each stratum?. Simple Random Sampling. Stratified Sampling. Systematic Sampling. Cluster Sampling. In which sampling method are samples chosen based on ease of access and availability?. Simple Random Sampling. Stratified Sampling. Systematic Sampling. Convenience Sampling. Which sampling method involves selecting every nth member from a randomly ordered list?. Simple Random Sampling. Stratified Sampling. Systematic Sampling. Cluster Sampling.
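For the fair-die item above, the Chi-squared statistic is the sum of (observed - expected)^2 / expected; with 10 observed per face against an expected 60/6 = 10, every term is zero. A quick verification:

```python
# Chi-squared goodness-of-fit for the die question: 60 rolls,
# 10 observed per face, against an expected 10 per face under fairness.
from scipy.stats import chisquare

observed = [10, 10, 10, 10, 10, 10]
expected = [60 / 6] * 6  # fair die: 10 per face

stat, p_value = chisquare(observed, f_exp=expected)
print(stat, p_value)  # statistic = 0.0: observed matches expected exactly
```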
You are analyzing a dataset comparing the exam scores of students from two different teaching methods. To determine if there is a significant difference in the median scores between the two groups, which nonparametric test should you use?. Wilcoxon. Kruskal-Wallis. Chi-square. Mann-Whitney. What is the primary advantage of using the Mann-Whitney U test over the t-test for independent samples?. It requires larger sample sizes. It does not assume normal distribution of the data. It can only be used for categorical data. It compares mean scores rather than median scores. If you have paired samples and want to compare their medians, which nonparametric test should you use?. Wilcoxon signed-rank test. Mann-Whitney U test. Kruskal-Wallis test. Chi-squared test. Which nonparametric test is suitable for comparing the medians of more than two independent groups?. Wilcoxon signed-rank test. Mann-Whitney U test. Kruskal-Wallis test. Chi-squared test. In which scenario is a two-sample t-test appropriate to compare means?. When the samples are dependent and come from populations with unequal variances. When the samples are independent and come from normally distributed populations with equal variances. When one sample size is much larger than the other. When comparing more than two independent samples. How would you optimize the hyperparameters of a machine learning model?. Perform a grid search with cross-validation to evaluate different combinations of hyperparameters. Use a simple random search to test hyperparameters. Calculate the mean squared error (MSE) for different hyperparameters. Apply a principal component analysis (PCA) to select hyperparameters. What is the primary goal of hyperparameter optimization in machine learning?. To reduce the number of features in the dataset. To find the best set of hyperparameters that maximize the model's performance. To improve the interpretability of the model. To ensure the data is normalized. Which optimization method uses probabilistic models to select the most promising hyperparameters?. Grid Search. Random Search. Bayesian Optimization. Gradient-based Optimization. What is a key advantage of Random Search over Grid Search?. It guarantees finding the optimal hyperparameters. It samples the hyperparameter space more efficiently, especially when the space is large. It always performs better in terms of model accuracy. It requires less computational power. How would you evaluate the performance of a regression model?. Calculate the R-squared (coefficient of determination) to measure the proportion of variance explained by the model. Use the mean absolute error (MAE) to evaluate the model. Perform a principal component analysis (PCA) on the residuals. Calculate the correlation coefficient between predicted and actual values. Which metric gives higher weight to larger errors in a regression model?. Mean Absolute Error (MAE). Mean Squared Error (MSE). R-squared. Mean Absolute Percentage Error (MAPE). Which metric adjusts the R-squared value based on the number of predictors in the model?. Mean Absolute Error (MAE). Mean Squared Error (MSE). Root Mean Squared Error (RMSE). Adjusted R-squared. For a certain continuous random variable, its probability density function is f(x) = 6x(1-x) for 0 < x < 1. What does the value of f(0.5) indicate?. The likelihood of the random variable being exactly 0.5. The cumulative probability up to 0.5. The probability that the random variable falls within a small interval around 0.5. The expected value of the random variable at 0.5.
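The hyperparameter items above point to grid search with cross-validation. A minimal sketch (the estimator choice and parameter grid are arbitrary, picked only for illustration):

```python
# Grid search with 5-fold cross-validation over a small parameter grid.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=5, scoring="r2")
search.fit(X, y)

# Best combination found and its cross-validated R-squared.
print(search.best_params_, search.best_score_)
```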
Which of the following measures is affected by outliers? i. The interquartile range ii. The range iii. The median. ii only. i and ii. ii and iii. i only. Which of the following measures is affected by outliers?. The interquartile range. The range. The median. Both the range and the median. Which measure is most appropriate to use when comparing the central tendency of two datasets that contain outliers?. Mean. Median. Range. Variance. Which measure of spread is robust to outliers?. Range. Variance. Interquartile Range (IQR). Standard Deviation. Why is the mean not considered a robust measure in the presence of outliers?. Because it does not consider all data points. Because it is the middle value of a dataset. Because it can be significantly influenced by extreme values. Because it only considers the first and third quartiles. What is the z-score for a value of 85 in a dataset with a mean of 70 and a standard deviation of 10?. 1.5. 1. 2. 1.75. In a Poisson process, which of the following statements is true about the inter-arrival times of events?. They follow a normal distribution. They are deterministic. They follow an exponential distribution. They are uniformly distributed. If the rate λ of a Poisson process is 5 events per hour, what is the expected inter-arrival time between events?. 0.2 hours. 1 hour. 5 hours. 0.5 hours. Which property of the exponential distribution is particularly important for the Poisson process?. It has a finite mean. It is always symmetric. It has the memoryless property. It is defined only for positive values. In an ANOVA analysis, if the between-group sum of squares (SSB) is 200 and the within-group sum of squares (SSW) is 300, what is the total sum of squares (SST)?. 500. 100. 200. 600. For a Poisson regression model, which goodness-of-fit statistic would be inappropriate to assess the model's adequacy?. Deviance. Pearson's chi-squared. Akaike's Information Criterion (AIC). F-statistic. Why is the Pearson Chi-Square statistic appropriate for assessing the goodness-of-fit in a Poisson regression model?. It measures the proportion of variance explained by the model. It penalizes the model for having too many parameters. It compares the observed counts to expected counts under the model, adjusted for variance. It is used to select the best subset of predictors. Which criterion penalizes model complexity while rewarding goodness-of-fit, making it suitable for model comparison in Poisson regression?. R-squared (R²). Mean Absolute Error (MAE). Akaike Information Criterion (AIC). Sum of Squared Errors (SSE). Which statistic is commonly used to compare the goodness-of-fit between nested Poisson regression models?. Deviance. R-squared (R²). Mean Squared Error (MSE). Bayesian Information Criterion (BIC). Which method typically offers greater model interpretability?. Ensemble methods like Random Forest. Sequential methods like AdaBoost. Sequential methods like Gradient Boosting. Ensemble methods like Bagging. Which ensemble method builds models sequentially, with each model focusing more on the errors of the previous models?. Random Forest. AdaBoost. Bagging. Gradient Boosting. Which method provides feature importance scores to help interpret the contribution of each feature to the model?. Random Forest. AdaBoost. Bagging. Neural Networks. Which ensemble method uses bootstrapped subsets of the data to build multiple versions of a predictor and combines their predictions?. Random Forest. AdaBoost. Bagging. Neural Networks.
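Three of the computations above are one-liners: the z-score, the ANOVA sum-of-squares identity SST = SSB + SSW, and the Poisson inter-arrival mean 1/λ. A quick check:

```python
# z-score of 85 given mean 70 and SD 10.
z = (85 - 70) / 10
print(z)            # 1.5

# ANOVA identity: total sum of squares = between + within.
ssb, ssw = 200, 300
print(ssb + ssw)    # 500

# Poisson process: expected inter-arrival time is 1/lambda.
lam = 5             # events per hour
print(1 / lam)      # 0.2 hours
```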
Given a time series dataset, how would you determine if it is stationary?. Perform an augmented Dickey-Fuller test to check for stationarity. Calculate the autocorrelation function (ACF) of the time series. Use a moving average to smooth out the time series data. Apply a seasonal decomposition of time series (STL). A researcher conducted an experiment to test the effect of three different diets on weight loss in a sample of 30 participants, divided equally among the three diets. After 8 weeks the weight loss for each participant was recorded. Using a one-way ANOVA, test if there is a significant difference in weight loss between the three diets. Which p-value would lead to rejecting the null hypothesis at a 0.05 significance level?. 0.03. 0.05. 0. 0.07. Given the following small sample data for the heights (in cm) of a group of 5 plants [12, 15, 14, 13, 16], calculate the approximate 95% confidence interval for the mean height. Assume the sample standard deviation is 1.58. [12, 16]. [13, 15]. [12.5, 15.5]. [11.5, 16.5]. If the third moment about the mean has a value of zero, what can we say about the distribution?. Symmetrical. Not symmetrical. Positively skewed. None of the above. Predict the value of y when x = 4, using the linear regression equation y = 3x + 1. 12. 13. 14. 11. When comparing the proportions of success between two independent samples, under what condition would the normal approximation method for the hypothesis test be inappropriate?. When both sample sizes are large. When the sample proportions are close to each other. When the product of the sample sizes and the pooled sample proportion is less than 5. When the significance level of the test is set to 5%. In a study, you need to test if there is a monotonic relationship between two continuous variables. Which non-parametric test should you use?. Wilcoxon. Spearman. Chi-square. Kruskal-Wallis. A variable that influences both the dependent and the independent variable, causing a spurious association, is called a _______ variable. Explanatory. Outcome. Confounding. Interfering. Which is a limitation when simulating probability distributions, such as binomial or geometric distributions?. Simulations can only be performed for distributions with known analytical solutions. They cannot replicate the exact theoretical distribution but only approximate it. Simulations are less effective for distributions with a small number of possible outcomes. They always require extensive computational resources and time. In a random forest model, how is feature importance typically determined for variable selection?. By the depth at which each feature is used to split the data in the trees. By the decrease in model accuracy when each feature is randomly permuted. By the increase in the tree depth when each feature is added. By the frequency of each feature's appearance at the root of the trees. To control for confounding variables in a study, researchers should: Ignore the confounding variables. Ensure the confounding variables are evenly distributed between groups. Measure and adjust for the confounding variables in the analysis. Only use small sample sizes. In a study examining the relationship between coffee consumption and heart disease, smoking is a confounding variable because it: Is unrelated to coffee consumption. Only influences heart disease. Influences both coffee consumption and heart disease. Only influences coffee consumption.
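The plant-height item above works out as follows, using the t-distribution with n - 1 = 4 degrees of freedom (the sample SD of 1.58 is taken from the question):

```python
# Approximate 95% confidence interval for the mean plant height.
import math
from scipy import stats

heights = [12, 15, 14, 13, 16]
n = len(heights)
mean = sum(heights) / n          # 14.0
s = 1.58                         # sample SD given in the question
se = s / math.sqrt(n)            # standard error of the mean

t_crit = stats.t.ppf(0.975, df=n - 1)   # ~2.776 for df = 4
margin = t_crit * se                     # ~1.96
print(mean - margin, mean + margin)      # ~[12.0, 16.0]
```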
A confounding variable can give the false impression of a relationship between the independent and dependent variables because it: Influences both the independent and dependent variables. Is controlled in experimental studies. Influences the dependent variable only. Is the same as the dependent variable. Which of the following terms is NOT commonly used to describe a variable that influences both the dependent and independent variables?. Confounding variable. Interfering variable. Outcome variable. Explanatory variable. How is feature importance typically determined in a Random Forest model for variable selection?. By the number of missing values in the feature. By the decrease in model accuracy when each feature is randomly permuted. By the correlation coefficient between the feature and the target variable. By the size of the feature's data type. Why is the permutation method used to determine feature importance in Random Forest models?. It measures the direct impact of the feature on model performance. It relies on the linear relationship between features and the target variable. It only works for categorical features. It is unaffected by the number of features in the dataset. Which statistical test is appropriate to determine if there is a significant difference between the means of two groups?. Paired t-test. One-sample t-test. Independent samples t-test. Chi-square test. Which branch of statistics considers the ratio scale and interval scale?. Parametric. Non-parametric. Sampling. Distribution. When comparing the average time spent on a website before and after a redesign, a p-value of 0.03 is obtained. If the significance level was set at 0.05, which statement is correct?. The p-value indicates strong evidence for the null hypothesis since it is less than the significance level. The p-value alone is not sufficient to reject the null hypothesis; other factors must be considered. The null hypothesis can be rejected in favor of the alternative, as the p-value is less than the significance level. The p-value suggests the website redesign had no effect since it is close to the significance level. What is a confidence interval referring to?. The range of values that the population parameter is expected to fall into. The range of values within which the sample mean is guaranteed to be. The probability of a Type I error. The margin of error in a hypothesis test. Which method generally requires less training time?. Ensemble methods like Random Forest. Sequential methods like Gradient Boosting. Sequential methods like Bagging. Ensemble methods like AdaBoost. In machine learning, which algorithm is best suited for predicting a categorical target variable based on a set of independent variables?. Linear regression. K-nearest neighbors. Decision tree. Principal component analysis (PCA). Which algorithm is best suited for predicting a continuous target variable based on a set of independent variables?. Decision Tree Classification. Regression Tree. Logistic Regression. Naive Bayes. What is a key advantage of using regression trees for predicting continuous variables?. They require the data to be normally distributed. They can handle both categorical and numerical independent variables without transformation. They are only suitable for small datasets. They cannot model non-linear relationships. Which of the following algorithms is not typically used for predicting continuous target variables?. Linear Regression. Support Vector Regression (SVR). K-Nearest Neighbors (KNN) Regression. Logistic Regression.
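The permutation-importance items above correspond to scikit-learn's `permutation_importance`; a sketch on synthetic data (dataset and settings are arbitrary):

```python
# Permutation importance for a Random Forest on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Each feature is randomly permuted in turn; the resulting drop in
# accuracy measures how much the model relied on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
print(result.importances_mean)
```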
Why are Random Forest Regression and Gradient Boosting Machines (GBM) effective for predicting continuous target variables?. They always produce linear models. They combine multiple models to improve prediction accuracy and reduce overfitting. They require less data than other models. They do not handle non-linear relationships well. In which scenario would you prefer to use Support Vector Regression (SVR) over Linear Regression?. When the relationship between the independent variables and the target variable is linear. When the dataset is very large. When there is a complex, non-linear relationship between the independent variables and the target variable. When interpretability of the model is more important than accuracy. Which algorithm is known for predicting continuous target variables by averaging the outcomes of multiple decision trees?. K-Nearest Neighbors (KNN). Random Forest Regression. Naive Bayes. Decision Tree Classification. What is one potential disadvantage of using K-Nearest Neighbors (KNN) for regression tasks?. It is not capable of handling non-linear relationships. It requires transformation of categorical variables into numerical form. It can be computationally expensive for large datasets. It produces models that are difficult to interpret. Which scenario is most appropriate for using a Chi-squared test?. Comparing the variance of two samples drawn from normal distributions. Testing the difference in means between two independent groups. Assessing the association between two categorical variables. Evaluating the equality of more than two population variances. A dataset has a mean of 50, a median of 40, and a mode of 40. What does this suggest about the skewness of the dataset?. The dataset is positively skewed. The dataset is negatively skewed. The dataset is symmetric. The skewness cannot be determined with this information. What is the effect of increasing the sample size on the margin of error and significance testing for a given population mean?. The margin of error decreases, and the test becomes more sensitive to detecting differences from the null hypothesis. The margin of error increases, and the test becomes more sensitive to detecting differences from the null hypothesis. The margin of error stays the same, and the test becomes more sensitive to detecting differences from the null hypothesis. The margin of error and test sensitivity are both unaffected by changes in sample size. What is the name of the phenomenon in which research participants commonly lie about certain subjects?. Sampling bias. Response bias. Confounding. Non-response bias. Consider a continuous probability function f(x) = 3x^2 for 0 ≤ x ≤ 1. What does the derivative of this function at x = 0.5 represent?. The probability of the random variable being exactly 0.5. The rate of change of probability as the random variable changes around 0.5. The cumulative probability up to the point x = 0.5. The mean value of the random variable at x = 0.5. A sociologist is designing a study to understand how the number of years of education (X) affects income (Y) while also considering age (Z) as a third variable. The sociologist wants to explore the possibility that the effect of education on income varies with age. What type of analysis should they use to best capture this relationship?. A multivariate regression of income on education and age to account for the effect of age. A regression analysis with an interaction term between education and age. A simple linear regression of income on education, controlling for age as a covariate. A hierarchical regression with education entered in the first step and age in the second. Two datasets, X and Y, have a covariance of -5. What does this imply about their relationship?.
X and Y move in the same direction, and the relationship is strong. X and Y move in the same direction, and the relationship is weak. X and Y have no linear relationship. X increases as Y decreases, but the magnitude of their relationship is not known. A sociologist wants to study how the number of years of education (X) affects income (Y) while also considering age (Z) as a third variable. They suspect that the effect of education on income varies with age. What type of analysis should they use?. Simple Linear Regression. Multiple Regression without Interaction. Interaction Regression Analysis. Principal Component Analysis (PCA). Which of the following analyses is specifically designed to study how the relationship between two variables changes depending on the level of a third variable?. Multivariate Regression Analysis. Interaction Regression Analysis. Analysis of Variance (ANOVA). Cluster Analysis. When would a multivariate regression analysis be appropriate?. When you want to predict a single dependent variable based on multiple independent variables. When you want to predict multiple dependent variables simultaneously based on multiple independent variables. When you want to reduce the dimensionality of your dataset. When you want to classify observations into predefined groups. Which analysis technique is used to transform a large set of correlated variables into a smaller set of uncorrelated variables?. Interaction Regression Analysis. Principal Component Analysis (PCA). Logistic Regression. Analysis of Variance (ANOVA). Which technique would you use if you want to classify a series of observations into predefined groups based on multiple characteristics?. Principal Component Analysis (PCA). Multivariate Regression Analysis. Discriminant Analysis. Interaction Regression Analysis. In the context of regression analysis, what is an interaction term?. A term that represents the sum of two independent variables. A term that represents the product of two independent variables. A term that represents the difference between two independent variables. A term that represents the square of an independent variable. Which analysis method would you choose to segment a population into different demographic groups based on several characteristics?. Cluster Analysis. Simple Linear Regression. Multivariate Regression Analysis. Principal Component Analysis (PCA). When performing a multiple regression analysis without interaction terms, what are you assuming about the relationship between the independent variables and the dependent variable?. That the relationship between each independent variable and the dependent variable is independent of the other independent variables. That the relationship between each independent variable and the dependent variable depends on the other independent variables. That the independent variables are uncorrelated. That the dependent variable is categorical. Which of the following is a type of statistics?. Descriptive. Inferential. Industry. Descriptive and inferential. Which of the following are the main types of statistics?. Descriptive and Inferential. Descriptive and Industrial. Inferential and Operational. Industrial and Operational. Which type of statistics is used to summarize and describe the features of a dataset?. Inferential. Descriptive. Predictive. Industrial. Which type of statistics is used to make predictions or inferences about a population based on a sample of data?. Inferential. Descriptive. Predictive. Industrial.
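The interaction items above (education × age) can be fit directly with an interaction term. A sketch with simulated data using the statsmodels formula API, where `education:age` is the product term:

```python
# OLS regression with an education-by-age interaction on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"education": rng.integers(8, 21, n),
                   "age": rng.integers(22, 65, n)})
# Simulated income in which education's effect grows with age.
df["income"] = (1000 * df["education"]
                + 20 * df["education"] * df["age"]
                + rng.normal(0, 5000, n))

# 'education * age' expands to education + age + education:age,
# where education:age is the interaction (product) term.
model = smf.ols("income ~ education * age", data=df).fit()
print(model.params)
```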
Which of the following is not a main type of statistics?. Inferential. Descriptive. Both industrial and descriptive. Industrial. Descriptive statistics include all of the following except: Mean. Standard deviation. Hypothesis testing. Frequency distribution. Inferential statistics typically involves which of the following?. Calculating the mean of a dataset. Describing the shape of a data distribution. Estimating population parameters. Summarizing data with graphs. What is the purpose of inferential statistics?. To collect data from every member of a population. To organize and summarize data. To make generalizations about a population based on a sample. To count the number of occurrences in a dataset. When drawing samples from a population with extreme outliers, which of the following is a true statement about the sampling distribution of the mean?. Outliers do not affect the sampling distribution if the sample size is large. Outliers will cause the sampling distribution to have a larger mean than the population mean. The presence of outliers will lead to a bimodal sampling distribution. Outliers can increase the variability of the sampling distribution, making it wider than it would be without outliers. How do outliers in a population affect the sampling distribution of the mean?. They make the sampling distribution narrower. They increase the variability of the sampling distribution, making it wider. They have no effect on the sampling distribution of the mean. They make the sampling distribution perfectly normal. What is the main reason that outliers increase the variability of the sampling distribution of the mean?. Because outliers are ignored in the calculation of the mean. Because outliers cause the mean of each sample to be closer to the median. Because outliers can significantly shift the mean of a sample, increasing the spread of sample means. Because outliers reduce the overall sample size. Which of the following machine learning algorithms is based on bagging?. Decision tree. Random forest. Classification. Regression. What is the main purpose of bagging in machine learning?. To reduce bias by combining weak learners. To increase variance by using fewer data samples. To reduce variance by averaging multiple models trained on different data samples. To increase the complexity of the model. How does Random Forest use the bagging technique?. By training multiple SVMs on different subsets of the data. By combining multiple decision trees trained on bootstrap samples of the data. By building sequential models where each model corrects errors from the previous one. By finding the nearest neighbors for classification or regression tasks. Which of the following is NOT a characteristic of the Random Forest algorithm?. It reduces overfitting by averaging the results of multiple trees. It uses the boosting technique to sequentially train models. It selects random subsets of features for each tree. It is robust to overfitting and performs well with large datasets. Which of the following statements is true about bagging and boosting?. Both bagging and boosting reduce the variance of the model. Bagging reduces variance, while boosting reduces bias and variance. Boosting reduces variance, while bagging reduces bias. Both bagging and boosting increase the bias of the model. Which machine learning technique involves training multiple models sequentially, where each new model attempts to correct errors made by previous models?. Bagging. Boosting. Random Forest. K-Nearest Neighbors (KNN).
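The bagging/boosting items above contrast parallel training on bootstrap samples with sequential error-correcting training. A minimal scikit-learn comparison on synthetic data:

```python
# Bagging (parallel, variance reduction) vs. boosting (sequential,
# error-focused) on the same synthetic classification problem.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0)
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

print(cross_val_score(bagging, X, y, cv=5).mean())
print(cross_val_score(boosting, X, y, cv=5).mean())
```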
Why is Random Forest considered to be robust against overfitting?. Because it uses simple models with low variance. Because it averages the predictions of multiple decision trees, reducing the overall variance. Because it uses boosting to improve model accuracy. Because it trains on the entire dataset without resampling. How would you handle categorical variables with many levels in a machine learning model?. Label Encoding. One-Hot Encoding. Target Encoding. Frequency Encoding. Which technique replaces each category with the mean of the target variable for that category?. Label Encoding. One-Hot Encoding. Target Encoding. Frequency Encoding. Which technique is particularly useful for reducing the dimensionality of categorical variables by learning dense representations of the categories in a low-dimensional space?. Label Encoding. One-Hot Encoding. Target Encoding. Embedding layers. Which encoding technique replaces each category with its frequency of occurrence in the dataset?. Label Encoding. One-Hot Encoding. Target Encoding. Frequency Encoding. What is a disadvantage of using One-Hot Encoding for categorical variables with many levels?. It introduces an arbitrary order between categories. It can generate a sparse and high-dimensional matrix. It may lead to overfitting. It is difficult to implement. What is a potential risk of using Target Encoding for categorical variables?. It may introduce an arbitrary order between categories. It can lead to overfitting if not handled properly. It generates a high-dimensional matrix. It does not capture the relationship between the category and the target variable. A logistic regression model was fitted to predict the likelihood of a patient having heart disease based on various predictors including age, cholesterol level, and blood pressure. The coefficient for age is 0.05 with a p-value of 0.01. What does this mean?. For every one-year increase in age, the odds of having heart disease increase by 5%. For every one-year increase in age, the log-odds of having heart disease increase by 0.05%. For every one-year increase in age, the probability of having heart disease increases by 5%. For every one-year increase in age, the patient is 0.05 times more likely to have heart disease. Which of these are limitations of the backpropagation rule?. Overfitting. Local minima problem. Computational complexity. All of them. What can happen if a neural network trained with backpropagation is not properly regularized?. It will always perform better on new data. It can suffer from underfitting. It can suffer from overfitting and poor generalization. It will require less computational power. Which problem in backpropagation is specifically related to recurrent neural networks (RNNs)?. Vanishing and exploding gradients. Lack of computational power. Insensitivity to input data. Overfitting. What is a common issue in training deep neural networks using backpropagation related to computational cost?. It is inexpensive and fast to train deep networks with backpropagation. It requires significant computational resources and time. It does not require a GPU or advanced hardware. It eliminates the need for large datasets. How does the need for large amounts of labeled data affect the backpropagation rule?. It allows the model to train faster. It ensures that the model will always generalize well. It can be a challenge when data is limited or expensive to obtain. It is not relevant to the effectiveness of backpropagation.
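The encoding items above can be tried in a few lines of pandas; the column names and data here are invented:

```python
# One-hot, frequency, and target encoding of a toy categorical column.
import pandas as pd

df = pd.DataFrame({"city": ["A", "B", "A", "C", "B", "A"],
                   "target": [1, 0, 1, 0, 1, 1]})

# One-hot encoding: one binary column per level (sparse and
# high-dimensional when the variable has many levels).
one_hot = pd.get_dummies(df["city"], prefix="city")

# Frequency encoding: replace each category with how often it occurs.
freq = df["city"].map(df["city"].value_counts(normalize=True))

# Target encoding: replace each category with the mean target for it
# (risks overfitting; in practice use smoothing or out-of-fold means).
target_enc = df["city"].map(df.groupby("city")["target"].mean())
print(one_hot, freq, target_enc, sep="\n")
```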
Why is the backpropagation rule sensitive to weight initialization?. Because poor initialization can lead to slow convergence or getting stuck in suboptimal local minima. Because it requires very specific initial weights to function. Because it can only be used with pre-trained weights. Because it does not adjust weights during training. In the context of time series analysis, when two non-stationary series are suspected to be cointegrated, which test can be appropriately applied to test for cointegration?. Engle-Granger Test. Phillips-Perron Test. Johansen Test. KPSS Test. Which test is a multivariate approach to test for cointegration among multiple time series?. Engle-Granger Test. Phillips-Perron Test. Johansen Test. KPSS Test. Which test is used to test for a unit root in a time series and is robust to heteroskedasticity and autocorrelation?. Engle-Granger Test. Phillips-Perron Test. Johansen Test. KPSS Test. Which test is used to test the null hypothesis that an observable time series is stationary around a deterministic trend (trend-stationary)?. Engle-Granger Test. Phillips-Perron Test. Johansen Test. KPSS Test. The KPSS Test stands for: Kwiatkowski-Phillips-Schmidt-Shin. Kwiatkowski-Phillips-Silva-Smith. Kwiatkowski-Peron-Schmidt-Shin. Kwiatkowski-Phillips-Scott-Smith. Which test would you use to determine the number of cointegrating vectors among several non-stationary time series?. Engle-Granger Test. Phillips-Perron Test. Johansen Test. KPSS Test. Which test is commonly used to confirm the results of the Augmented Dickey-Fuller (ADF) test when checking for unit roots in a time series?. Engle-Granger Test. Phillips-Perron Test. Johansen Test. KPSS Test. Which test is performed by estimating a regression and then testing the residuals for stationarity?. Engle-Granger Test. Phillips-Perron Test. Johansen Test. KPSS Test. In the context of time series analysis, which test is used to test for stationarity in a series by examining the absence of unit roots?. Engle-Granger Test. Phillips-Perron Test. Johansen Test. KPSS Test. Which test involves two separate hypothesis tests: the trace test and the maximum eigenvalue test?. Engle-Granger Test. Phillips-Perron Test. Johansen Test. KPSS Test. In the Engle-Granger Test, what is the next step after estimating the regression between two non-stationary series?. Perform a Dickey-Fuller test on the original series. Perform a Dickey-Fuller test on the residuals of the regression. Calculate the autocorrelation function of the residuals. Estimate the regression between the residuals and a lagged version of themselves. What does it mean if the residuals from the Engle-Granger regression are found to be stationary?. The original series are non-stationary and not cointegrated. The original series are non-stationary but cointegrated. The original series are stationary. The original series are random walks. What is the primary purpose of the Engle-Granger Test in time series analysis?. To test for stationarity in a single time series. To test for autocorrelation in a time series. To test for cointegration between two non-stationary time series. To test for seasonal effects in a time series. In a multivariate OLS regression, what does an R-squared value close to 1 indicate?. The model perfectly predicts every point in the data set. A large proportion of the total variation in the dependent variable is explained by the model. The model's predictions are 100% accurate. All the independent variables are perfectly correlated with the dependent variable.
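The Engle-Granger procedure above (regress, then test the residuals for stationarity) is available directly in statsmodels. A sketch with two simulated series built around a shared random walk, so they are cointegrated by construction:

```python
# Engle-Granger cointegration check on simulated cointegrated series.
import numpy as np
from statsmodels.tsa.stattools import adfuller, coint

rng = np.random.default_rng(0)
trend = np.cumsum(rng.normal(size=500))        # shared random walk
x = trend + rng.normal(size=500)
y = 2 * trend + rng.normal(size=500)

# Each series alone is non-stationary: ADF fails to reject a unit root.
print(adfuller(x)[1], adfuller(y)[1])          # large p-values expected

# The Engle-Granger test regresses y on x and tests the residuals for
# stationarity; a small p-value supports cointegration.
stat, p_value, crit = coint(y, x)
print(p_value)
```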
Which of the following is NOT a measure of dispersion?. Range. Variance. Median. Standard deviation. Which of the following is a measure of central tendency?. Range. Variance. Median. Standard Deviation. Which of the following is NOT a measure of dispersion?. Range. Variance. Mean. Standard Deviation. Which measure of central tendency is calculated by adding all the data points together and then dividing by the number of data points?. Mode. Median. Mean. Range. Which measure of dispersion is defined as the difference between the maximum and minimum values in a dataset?. Variance. Standard deviation. Mean. Range. What does the standard deviation measure in a dataset?. The central value. The spread or dispersion of the data. The frequency of the most common value. The difference between the third and first quartile. The interquartile range (IQR) measures the spread of the middle 50% of a dataset. How is it calculated?. Difference between the maximum and minimum values. Difference between the third quartile (Q3) and the first quartile (Q1). Sum of all deviations from the mean. Average of the squared deviations from the mean. For a given sample, how would increasing the confidence level from 90% to 95% affect the width of the confidence interval for the population mean?. The width remains the same as the sample mean is unchanged. The width decreases as the confidence level increases. The width increases as the confidence level increases. The width changes in an unpredictable manner. Which of the following is a key assumption of the M/M/1 queuing model?. The arrival rate follows a Poisson distribution. There are multiple servers. The service rate is dependent on the number of customers. Customers arrive at fixed intervals. In machine learning, which method is used to prevent overfitting?. Feature engineering. Cross-validation. Gradient descent. Bagging. If the correlation coefficient between two variables is 0.8, which of the following is true?. There is a perfect linear relationship between the variables. The variables have a strong positive linear relationship. 80% of the variation in one variable is explained by the other. The variables will increase together 80% of the time. In a study on the association between exercise frequency and the incidence of common colds, a 3x2 table is constructed with the rows representing three categories of exercise frequency (none, occasional, regular), and the columns representing the incidence of common colds (yes, no). If the chi-squared test for independence indicates a significant association, what can be concluded?. Exercising regularly causes a reduction in the incidence of common colds. There is a significant association between exercise frequency and the incidence of common colds. People who exercise occasionally are more prone to common colds than those who don't exercise at all. There is no relationship between how often people exercise and whether they get colds. If a dataset has a large range, what does it imply about the dataset?. It has a high standard deviation. It is highly skewed. It has a large number of outliers. It contains values that are widely spread out. Which of the following measures can help identify outliers in addition to the range?. Mean. Median. Box plot. Mode. What does a large range in a dataset primarily indicate?. The dataset has no variability. The dataset is homogeneous. The dataset has high variability. The dataset has no outliers.
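The dispersion items above compare outlier-sensitive measures (range, variance, standard deviation) with the robust IQR, Q3 - Q1. A quick illustration on a small invented sample with one extreme value:

```python
# Outlier-sensitive vs. robust measures of spread.
import numpy as np

data = np.array([4, 5, 5, 6, 7, 8, 9, 98])     # 98 is an outlier

print(data.max() - data.min())                 # range (outlier-sensitive)
print(data.var(ddof=1), data.std(ddof=1))      # variance and SD (sensitive)

q1, q3 = np.percentile(data, [25, 75])
print(q3 - q1)                                 # IQR: spread of middle 50%
```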
Which of the following statements is true about a dataset with a large range?. It always contains outliers. It may have values that are widely spread out. It has a low standard deviation. It has a small number of observations. In addition to the range, which other statistical measure can provide information about the spread of a dataset?. Mean. Variance. Median. Mode. For forecasting future volatility in financial time series, which model is typically preferred due to its ability to capture long memory in volatility?. ARIMA. GARCH. Exponential smoothing model adjusted for seasonality. Linear regression model with heteroskedasticity. The mode of a frequency diagram can be determined from the: Histogram. Frequency polygon. Ogive. Bar chart. To find the cumulative frequency distribution, which graph would you use?. Histogram. Frequency polygon. Ogive. Bar chart. Which measure of central tendency is identified as the value that appears most frequently in a dataset?. Mode. Median. Range. Mean. What is an ogive primarily used for?. Determining the mode. Determining the median and percentiles. Showing the relationship between two variables. Displaying the range of the data. In the context of endogeneity in a regression model, when is an instrumental variable considered valid?. When it is strongly correlated with the endogenous explanatory variable, but does not correlate with the error term. When it is uncorrelated with the error term. When it is a determinant of the dependent variable. When it is correlated with other explanatory variables. What is the main purpose of using an instrumental variable in a regression model?. To reduce the sample size. To eliminate multicollinearity. To address endogeneity by providing consistent estimates. To increase the number of explanatory variables. When is an instrumental variable considered valid in the context of endogeneity in a regression model?. When it is weakly correlated with the endogenous explanatory variable but correlated with the error term. When it is strongly correlated with the endogenous explanatory variable and correlated with the error term. When it is weakly correlated with the endogenous explanatory variable and not correlated with the error term. When it is strongly correlated with the endogenous explanatory variable but not correlated with the error term. Which condition ensures that an instrumental variable is exogenous?. The instrumental variable is strongly correlated with the endogenous variable. The instrumental variable is not correlated with the error term. The instrumental variable is included in the regression model. The instrumental variable is correlated with the dependent variable. What is typically used to test the relevance of an instrumental variable in a two-stage least squares (2SLS) regression?. T-statistic. F-statistic. Z-statistic. P-value. What is a critical consideration when choosing the bandwidth in a kernel non-parametric regression?. A larger bandwidth always results in a more accurate model. The bandwidth must be the same as the standard deviation of the dependent variable. Bandwidth choice affects the trade-off between bias and variance in the model. A universally optimal bandwidth can be used for all types of data. A company wants to analyze if the customer satisfaction ratings (ordinal data) differ before and after implementing a new service policy using a sample of the same customers. Which nonparametric test should they use?. Mann-Whitney. Kruskal-Wallis. Wilcoxon. Friedman. A test statistic falls in the critical region of a hypothesis test for a mean. Which of the following statements correctly interprets this result?.
The test statistic's falling in the critical region confirms the null hypothesis with certainty. The result is statistically significant, suggesting the null hypothesis should be rejected. The p-value associated with the test statistic will be greater than the significance level. The sample mean is likely to be equal to the population mean specified in the null hypothesis. If a test statistic falls within the critical region in a hypothesis test with α = 0.01, what does this imply about the p-value?. The p-value is greater than 0.01. The p-value is equal to 0.01. The p-value is less than 0.01. The p-value is equal to 1.00. The critical region of a hypothesis test is determined by: The sample size. The level of significance α. The test statistic. The null hypothesis. When performing a hypothesis test, what is the implication if the test statistic is outside the critical region?. Reject the null hypothesis. Fail to reject the null hypothesis. The result is statistically significant. The sample size should be increased. In hypothesis testing, what does a p-value less than α indicate?. Accept the null hypothesis. Reject the null hypothesis. Fail to reject the null hypothesis. The test is inconclusive. Considering the sample statistic, if the mean of the sample statistic is not equal to the population parameter, then the sample statistic is considered a(n). Unbiased estimator. Biased estimator. Interval estimator. Hypothesis estimator. Which of the following statements is correct?. The standard deviation of a constant is always equal to one. The sum of absolute deviations is minimized when these deviations are calculated from the mean. The second moment about the origin is the variance. The variance is a non-negative quantity and is expressed in the square of the units of the observations. If an analyst suspects ARCH effects in the residuals of an ARIMA model for stock returns, what should their next step be?. Increase the order of the ARIMA model to account for the ARCH effects. Apply a Box-Cox transformation to stabilize the variance. Fit a GARCH model to account for the conditional heteroskedasticity. Perform seasonal differencing to remove the ARCH effects. Which of the following distributions is used to compare two variances?. T distribution. F distribution. Normal distribution. Poisson distribution. How is the F statistic calculated in an F-test for comparing two variances?. By taking the ratio of the two sample means. By taking the difference between the two sample variances. By taking the ratio of the two sample variances. By taking the sum of the two sample variances. In an F-test, if the calculated F statistic is greater than the critical value from the F distribution table, what is the appropriate conclusion?. Fail to reject the null hypothesis. Reject the null hypothesis. Accept the null hypothesis. The test is inconclusive. What is the null hypothesis when using an F-test to compare the variances of two populations?. The means of the two populations are equal. The medians of the two populations are equal. The variances of the two populations are equal. The distributions of the two populations are equal. In SVMs (Support Vector Machines), which method is not typically used for variable selection?. Recursive Feature Elimination (RFE). Kernel trick to transform features. Regularization parameters in the loss function. Using the weight magnitudes of the features.
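For the F-test items above, the statistic is simply the ratio of the two sample variances, referred to an F distribution with (n1 - 1, n2 - 1) degrees of freedom. A sketch with invented data:

```python
# F-test for equality of two variances: F = s1^2 / s2^2.
import numpy as np
from scipy import stats

a = np.array([10.2, 11.1, 9.8, 10.5, 12.0, 10.9])
b = np.array([9.0, 14.2, 8.1, 13.5, 10.0, 12.8])

f_stat = np.var(a, ddof=1) / np.var(b, ddof=1)
df1, df2 = len(a) - 1, len(b) - 1

# Two-sided p-value from the F distribution.
p_value = 2 * min(stats.f.sf(f_stat, df1, df2),
                  stats.f.cdf(f_stat, df1, df2))
print(f_stat, p_value)
```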
What is the name of the prior probabilities in Bayes' Theorem that change based on new information?. Independent probabilities. Posterior probabilities. Interior probabilities. Dependent probabilities. The mean of a sample is. Always identical to the population mean. Always less than the population mean. Calculated by adding up the data values and dividing the total by (n-1). Determined by adding together all data values and dividing the total by the number of items. A researcher wants to test whether the distribution of a particular set of data deviates from a specified theoretical distribution. Which nonparametric test is most suitable for this purpose?. Friedman. Wilcoxon. Spearman. Kolmogorov-Smirnov. Which method is generally more prone to overfitting?. Bagging. AdaBoost. Random forest. Gradient Boosting. In linear regression, a good estimator of the standard deviation is the. Standard deviation. Variance. Standard error of the estimate. Sample standard deviation. In queuing theory, Little's law relates which three variables in a steady-state system?. Arrival rate, service rate, and number of servers. Number of customers in the system, arrival rate, and service rate. Number of customers in the system, arrival rate, and time spent in the system. Number of servers, arrival rate, and service time. Which of the following best describes the purpose of the residual standard error in regression analysis?. To measure the goodness of fit of the model. To compare different regression models. To estimate the standard deviation of the error term. To predict future values. In the context of regression, what does a smaller residual standard error indicate?. The model has a poor fit to the data. The residuals have high variability. The model has a good fit to the data. The residuals are biased. The degrees of freedom used to calculate the residual standard error in a linear regression model with n observations and k predictors is: n. n - 1. n - k. n - k - 1. How is the residual standard error calculated in linear regression?. By dividing the sum of squared residuals by the total number of observations. By taking the square root of the mean squared error. By dividing the sum of squared residuals by the degrees of freedom and then taking the square root. By calculating the standard deviation of the predicted values. In linear regression, a good estimator of the standard deviation of the errors is the: Mean absolute error. Residual standard error. Median absolute deviation. Coefficient of determination. In Little's law, what does the variable W represent?. Number of customers in the system. Arrival rate of customers. Service rate of the system. Average time a customer spends in the system. Which of the following statements is true about Little's law?. It only applies to systems with a single server. It requires the system to be in a steady state. It only applies to systems with no waiting time. It requires the arrival rate to be greater than the service rate. In a quadratic OLS regression model (including X and X^2), what does the coefficient of X^2 represent?. The linear relationship between X and Y. The average change in Y for a one-unit increase in X, holding X constant. The curvature or acceleration in the relationship between X and Y. The inverse relationship between X and Y.
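The residual standard error items above reduce to one formula: the square root of the sum of squared residuals divided by n - k - 1. A worked sketch with invented data and k = 1 predictor:

```python
# Residual standard error of a simple linear regression fit.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

n, k = len(x), 1
rse = np.sqrt(np.sum(residuals**2) / (n - k - 1))  # df = n - k - 1
print(rse)
```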
A variable whose values can take on an infinite number of values within a given range of values is. Continuous. Discrete. Random. Both continuous and discrete. Which of the following is an example of a continuous variable?. Number of children. Number of cars. Height of a person. Type of car. What is the key characteristic of a continuous variable?. It can only take whole number values. It can take an infinite number of values within a given range. It can only take a finite number of values. It can only take values in a fixed sequence. Which type of variable is described by measurements such as time, weight, and temperature?. Discrete variable. Continuous variable. Ordinal variable. Nominal variable. Which scale of measurement has a true zero point and allows for the comparison of absolute magnitudes?. Nominal. Ordinal. Interval. Ratio. In which scale of measurement would a temperature of 0 degrees not indicate the absence of heat?. Nominal. Ordinal. Interval. Ratio. What is the main purpose of a chi-square test?. To examine the relationship between two categorical variables. To compare means between two groups. To determine the correlation coefficient between two continuous variables. To predict the future values of time series data. Which of the following is an appropriate use of a chi-square test?. Comparing the average height of men and women. Examining if there is a relationship between gender and voting preference. Testing if the mean income differs between two cities. Predicting weight based on height and age. When performing a chi-square test, what does a large chi-square statistic indicate?. The observed frequencies differ significantly from the expected frequencies. The observed frequencies are close to the expected frequencies. There is no relationship between the variables. The sample size is too small. Which is true?. The standard error is calculated only from sample attributes. The standard error is a measure of central tendency. What is the standard error of the mean used for?. Measuring the central value of a dataset. Measuring the spread of a dataset. Estimating the precision of the sample mean as an estimate of the population mean. Determining the mode of a dataset. What does a smaller standard error indicate about a sample statistic?. It is less precise. It is more precise. It has a larger variance. It has a larger mean. If the sample standard deviation is 10 and the sample size is 25, what is the standard error of the mean?. 2. 5. 10. 25. How are conditional density functions converted into marginal density functions?. By integrating the conditional density function over the conditional variable. By summing up the conditional density functions. By differentiating the joint density function. By normalizing the conditional density function. Which of the following architectures are systolic arrays an example of?. MISD. SISD. SIMD. None of the above. In a systolic array, what is the primary characteristic of data flow?. Data flows in multiple streams simultaneously. Data flows through a single stream to multiple processors. Data remains static while processors move. Data flows randomly between processors. What type of applications commonly use systolic arrays?. Web browsing. Digital signal processing. Word processing. Database management.
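Two of the items above are directly computable: the standard error of the mean, s/√n, and a chi-square test of independence on a contingency table (the 2x2 counts below are invented):

```python
# Standard error of the mean: s / sqrt(n).
import math
print(10 / math.sqrt(25))   # 2.0

# Chi-square test of independence on a toy 2x2 contingency table
# of two categorical variables.
from scipy.stats import chi2_contingency

table = [[30, 10],
         [20, 40]]
stat, p_value, dof, expected = chi2_contingency(table)
print(stat, p_value, dof)
```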
In MISD architecture, how do the processors operate on the data?. Multiple processors execute the same instruction on multiple data streams. Multiple processors execute different instructions on multiple data streams. A single processor executes multiple instructions on a single data stream. Multiple processors execute different instructions on a single data stream. Which of the following is NOT a characteristic of systolic arrays?. Efficient for applications requiring parallel data processing. Single data stream processed in stages. Regular, repetitive network of processors. Processors performing identical operations. What does SISD stand for in computer architecture?. Single Instruction, Single Data. Single Instruction, Synchronous Data. Simple Instruction, Single Data. Single Instruction, Simple Data. In SISD architecture, how many instructions and data streams are processed at a time?. Multiple instructions and multiple data streams. Single instruction and multiple data streams. Multiple instructions and single data stream. Single instruction and single data stream. What is a key characteristic of MISD architecture?. A single processor executing multiple instructions on multiple data streams. Multiple processors executing a single instruction on multiple data streams. Multiple processors executing different instructions on a single data stream. A single processor executing a single instruction on a single data stream. Which architecture is exemplified by traditional personal computers?. SISD. MISD. SIMD. MIMD. Which of the following architectures is the least common in practical use?. SISD. MISD. SIMD. MIMD. How does the Central Limit Theorem (CLT) relate to convergence in distributions?. The CLT demonstrates convergence in probability, not in distribution. The CLT exemplifies convergence in distribution as sample means approximate a normal distribution. The CLT shows convergence in distribution: as the sample size increases, the distribution of the sample means approaches a normal distribution. The CLT asserts that individual observations converge to a normal distribution as the sample size increases. Which of the following conditions is NOT necessary for the Central Limit Theorem to hold?. The samples must be independent. The sample size must be sufficiently large. The population must be normally distributed. The samples must be random. According to the Central Limit Theorem, what happens to the standard error of the mean as the sample size increases?. It increases. It decreases. It remains the same. It becomes equal to the population standard deviation. The Central Limit Theorem is important because it allows us to: Assume any sample distribution is normal. Use the normal distribution to approximate the sampling distribution of the sample mean. Calculate the exact mean of the population. Assume the population distribution is normal. If a population has a mean μ = 100 and a standard deviation σ = 20, what is the approximate standard error of the mean for a sample size of 25?. 4. 5. 2. 10. What does the Central Limit Theorem state?. As the sample size increases, the sampling distribution will approach a normal distribution. As the sample size increases, the sampling distribution will tend towards resembling an exponential distribution. As the sample size decreases, the sampling distribution will tend towards resembling a normal distribution. As the sample size decreases, the sampling distribution will tend towards resembling an exponential distribution.
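The CLT items above can be seen in a short simulation: sample means drawn from a clearly non-normal population look approximately normal, and their spread matches σ/√n (for the question's numbers, 20/√25 = 4):

```python
# CLT simulation: sample means from a skewed population.
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # clearly non-normal

n = 25
sample_means = [rng.choice(population, n).mean() for _ in range(5_000)]

print(np.std(sample_means))              # empirical SE of the mean
print(population.std() / np.sqrt(n))     # CLT prediction: sigma / sqrt(n)
print(20 / np.sqrt(25))                  # the question's case: 4.0
```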
How would you join the following two datasets? The first is a list of all inhabitants of a small town with their legal addresses and personal identification numbers (each unique and individual). The second is a list of traffic violations, including personal identification numbers and one observation per infraction (meaning that a person may have several observations in this dataset). A one-to-one merge. A one-to-many merge. A many-to-one merge. There is no way. What is a one-to-one merge?. A merge where each record in one dataset corresponds to a single, unique record in another dataset. A merge where each record in one dataset corresponds to multiple records in another dataset. A merge where all records are combined regardless of matching keys. A merge where no records are combined. In a one-to-many merge, how are the datasets combined?. Each record in one dataset corresponds to a single, unique record in another dataset. Each record in one dataset can correspond to multiple records in another dataset. Records are merged without considering the matching keys. Only the first record is considered for merging. Which of the following is a key characteristic of a many-to-one merge?. Multiple records in one dataset correspond to a single, unique record in another dataset. Each record in both datasets corresponds uniquely to each other. Records are combined randomly. Only unmatched records are combined. When would you typically use a one-to-one merge?. When merging datasets with a unique key for each record in both datasets. When merging datasets where one dataset has multiple records for each key in the other dataset. When the datasets have no common key. When you want to combine records based on their order rather than a key. In a dataset of students and another dataset of courses they are taking, which type of merge is most appropriate?. One-to-one merge. One-to-many merge. Many-to-one merge. None of the above. What is a key difference between a one-to-one merge and a one-to-many merge?. One-to-one merge results in fewer records than a one-to-many merge. One-to-one merge does not require a common key. One-to-many merge can result in duplicated records from the dataset with unique keys. One-to-many merge only works with numeric data. In the context of merging datasets, what is a "primary key"?. A field used to uniquely identify each record in a dataset. A field that can be repeated across multiple records. A field that is not necessary for merging datasets. A field that is always numeric. If you have a dataset of employees and another dataset of their multiple attendance records, which merge type should you use?. One-to-one merge. One-to-many merge. Many-to-one merge. Many-to-many merge. What is the main benefit of using a one-to-one merge?. It simplifies the dataset by avoiding duplicate records. It can handle datasets with no common keys. It ensures all possible combinations of records are included. It is faster than other types of merges. Which of the following scenarios best illustrates the use of a many-to-one merge?. Combining a list of customers with their orders, where each customer can have multiple orders. Combining a list of products with a single supplier. Combining two lists of transactions occurring at the same time. Combining two datasets with no common keys.
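The inhabitants/violations scenario above is a textbook one-to-many merge. A pandas sketch with invented columns; the `validate` argument makes pandas check the assumed key structure:

```python
# One-to-many merge: one row per person, many violation rows per person.
import pandas as pd

people = pd.DataFrame({"pid": [1, 2, 3],
                       "address": ["Main St 1", "Oak Ave 2", "Elm Rd 3"]})
violations = pd.DataFrame({"pid": [1, 1, 3],
                           "infraction": ["speeding", "parking",
                                          "red light"]})

# validate='one_to_many' raises an error if 'pid' is not unique on the
# left side, catching mistaken assumptions about the key structure.
merged = people.merge(violations, on="pid", how="left",
                      validate="one_to_many")
print(merged)
```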
A large sample is drawn from a population with a known mean and standard deviation. According to the CLT, which of the following statements is true regarding the sampling distribution of the sample mean?. The sampling distribution of the sample mean will have the same standard deviation as the population. The sampling distribution of the sample mean will be approximately normally distributed, regardless of the population distribution. The sampling distribution of the sample mean will have a mean that is higher than the population mean. The sampling distribution of the sample mean is not affected by the size of the sample.
Which of the following is an absolute measure of dispersion?. Coefficient of skewness. Coefficient of dispersion. Standard deviation. Coefficient of variation.
Which of the following is an absolute measure of dispersion?. Coefficient of Variation. Range. Correlation Coefficient. Gini Coefficient.
Which measure of dispersion is defined as the average of the squared deviations from the mean?. Range. Variance. Median Absolute Deviation. Interquartile Range.
Which of the following measures the spread of the middle 50% of a dataset?. Range. Variance. Interquartile Range. Standard Deviation.
When the level of significance of a hypothesis test is increased, the probability of committing a Type I error. Decreases. Increases. Remains the same. Becomes zero.
Which statistical method is best suited for identifying the relationship between two continuous variables?. Chi-square. Pearson correlation. One-way ANOVA. Mann-Whitney.
Which of the following is an assumption of linear regression?. The dependent variable is categorical. The residuals are normally distributed. There is multicollinearity among the independent variables. The relationship between the dependent and independent variables is non-linear.
What does it imply for two time series variables if they are found to be cointegrated?. Both series can be described by a single integrated moving average model. The series share a common trend but do not necessarily move together in the short run. There is no long-term equilibrium relationship between the series. The series have a long-term equilibrium relationship, and deviations from this equilibrium are temporary.
What does multicollinearity in a linear regression model refer to?. High correlation between the predictors and the response variable. High correlation between two or more predictor variables. Low correlation between all predictor variables. Low variance of the residuals.
Given the following small dataset, calculate the slope of the regression line for the relationship between x and y: x [1,2,3] y [2,4,6]. 1. 0. 2. -1.
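
For the slope question (x = [1, 2, 3], y = [2, 4, 6]), a least-squares fit confirms the arithmetic; a minimal numpy sketch:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([2, 4, 6])

# np.polyfit returns [slope, intercept] for a degree-1 fit
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # 2.0, 0.0 -> y = 2x, so the slope is 2
```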
A farmer is investigating the effect of 3 different fertilizers on plant growth. The height of plants, in cm, after 4 weeks is measured. The one-way ANOVA resulted in an F-statistic of 12.45. What conclusion can be drawn if the critical F-value at the 0.05 significance level is 3.89?. c) There is no significant difference between the groups. a and b. a) We can reject the null hypothesis and say that there is a significant relationship. b) There is a significant difference between at least two of the groups.
In the context of ANOVA, what does a significant F-statistic indicate?. The variances within groups are significantly different. The variances between groups are significantly different. At least one group mean is significantly different from the others. All group means are significantly different from each other.
What is the null hypothesis in a one-way ANOVA?. All group means are equal. All group variances are equal. The sample means are different. The total mean is equal to the grand mean.
A researcher is analysing a dataset containing the number of times individuals from different households have contacted customer service in the past year. Which model is most appropriate for analyzing these count data?. Linear regression, because the outcome variable is continuous. Linear regression, because the outcome variable is discrete. Poisson regression, because the outcome variable is a count. Negative binomial regression.
What does overdispersion in count data indicate?. The mean is greater than the variance. The variance is greater than the mean. The mean is equal to the variance. There is no relationship between the mean and variance.
Which model should be used if the count data do not exhibit overdispersion?. Negative binomial regression. Poisson regression. Logistic regression. Probit regression.
What additional parameter does the negative binomial regression model include to account for overdispersion?. An interaction term. A dispersion parameter. A quadratic term. A log-transformation.
In the context of count data modeling, when would you prefer a negative binomial regression over a Poisson regression?. When the count data have equal mean and variance. When the count data have variance less than the mean. When the count data follow a normal distribution. When the count data have variance greater than the mean.
For count data, which distribution is typically assumed by the Poisson regression model?. Normal distribution. Binomial distribution. Poisson distribution. Exponential distribution.
If a researcher finds that the variance of the count data is much larger than the mean, which model should they consider using?. Poisson regression. Linear regression. Negative binomial regression. Logistic regression.
Which model would be most appropriate if the count data exhibit overdispersion?. Linear regression. Poisson regression. Negative binomial regression. Logistic regression.
What assumption does the Poisson regression model make about the relationship between the mean and variance of the count data?. The mean is greater than the variance. The mean is less than the variance. The mean is equal to the variance. There is no relationship between the mean and variance.
In what scenario is a GARCH (generalized autoregressive conditional heteroskedasticity) model most appropriately applied?. When the time series data shows a consistent variance over time. When the focus of the analysis is on long-term trends in the mean of the series. When the time series is stationary and exhibits no volatility clustering. When the time series exhibits time-varying volatility or volatility clustering.
Which of the following is not an assumption of ANOVA (analysis of variance)?. The groups have similar variances (homogeneity). The data in each group are normally distributed. The samples are independent of each other. The dependent variable is measured at an interval scale.
Which chi-square distribution looks most like a normal distribution?. 2 degrees of freedom. 4 degrees of freedom. 8 degrees of freedom. 16 degrees of freedom.
As the degrees of freedom increase, the chi-square distribution: Becomes more positively skewed. Becomes more negatively skewed. Becomes more symmetric and approaches a normal distribution. Remains unchanged.
For a chi-square distribution with 2 degrees of freedom, the shape of the distribution is: Symmetric. Positively skewed. Negatively skewed. Uniform.
Which of the following is true for the chi-square distribution as the degrees of freedom increase?. The mean and variance decrease. The mean increases but the variance decreases. The mean and variance increase. The mean decreases but the variance increases.
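
The overdispersion questions above all reduce to comparing the sample mean and variance of the counts. A minimal simulation sketch; the negative binomial parameters are arbitrary, chosen only to generate overdispersed counts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson counts: mean and variance are equal by construction
poisson_counts = rng.poisson(lam=4, size=10_000)
# Negative binomial counts: variance exceeds the mean (overdispersion)
nb_counts = rng.negative_binomial(n=2, p=0.3, size=10_000)

for name, c in [("Poisson", poisson_counts), ("NegBin", nb_counts)]:
    print(name, c.mean().round(2), c.var().round(2))
# When the variance is clearly larger than the mean, negative binomial
# regression is preferred over Poisson regression.
```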
At approximately how many degrees of freedom does the chi-square distribution become nearly indistinguishable from the normal distribution?. 2. 30. 15. 20.
What happens to the sampling distribution of the sample mean when the sample size increases?. The distribution becomes narrower and more normal. The distribution becomes wider. The distribution becomes more skewed. The mean of the distribution increases.
According to the Central Limit Theorem, what is the shape of the sampling distribution of the sample mean for large sample sizes?. Uniform. Normal. Skewed. Bimodal.
What is the impact of increasing the sample size on the standard error of the mean?. It increases. It decreases. It remains the same. It becomes equal to the population standard deviation.
Why is the Central Limit Theorem important in statistics?. It allows us to use the normal distribution to make inferences about the population mean from the sample mean. It states that the population distribution is always normal. It requires that the sample mean equals the population mean. It guarantees that all samples are identical.
What is the relationship between the level of significance (𝛼) and the probability of a Type I error?. α is the probability of a Type I error. α is the probability of a Type II error. α is the power of the test. α is the standard error of the mean.
What happens to the standard error of the mean if the sample size increases?. It increases. It decreases. It remains the same. It becomes zero.
Which of the following is a Type II error?. Accepting a true null hypothesis. Rejecting a true null hypothesis. Accepting a false null hypothesis. Rejecting a false null hypothesis.
Which of the following is a Type I error?. Accepting a true null hypothesis. Rejecting a true null hypothesis. Accepting a false null hypothesis. Rejecting a false null hypothesis.
When testing the hypothesis that the proportion of online transactions using mobile devices exceeds 50%, the null hypothesis is set as the proportion being 50% or less. What constitutes a Type I error in this context?. Failing to reject the null hypothesis when the true proportion is actually greater than 50%. Rejecting the null hypothesis when the true proportion is exactly 50%. Accepting the null hypothesis when the true proportion of online transactions is less than 50%. Rejecting the null hypothesis when the true proportion is 50% or less.
Given a dataset of customer reviews, how would you perform sentiment analysis to classify the reviews as positive or negative?. NLP. Clustering. K-means. Calculate the coefficient.
Which of the following reduces the risk of a Type I error?. Decreasing the significance level. Increasing the significance level. Decreasing the sample size. Increasing the sample size.
A data set shows the following characteristics: a mean greater than the median, with a positively skewed distribution. What does this indicate about the data?. Most values are clustered around the lower end of the range, with few high values. There are more high values than low values, which pulls the mean up. The median is less reliable than the mean as a measure of central tendency in this case. The data likely contain outliers or extreme values on the higher end of the scale.
What can be inferred if the mean of a dataset is significantly higher than the median?. The dataset is likely normally distributed. The dataset is likely positively skewed. The dataset is likely negatively skewed. The dataset has no skewness.
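
Several of the CLT questions above can be verified empirically: draw repeated samples from a skewed population and watch the distribution of sample means narrow at the rate σ/√n. A minimal sketch; the exponential population and the sample sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
population_sd = 1.0  # an Exponential(1) population: skewed, sd = 1

for n in (5, 50, 500):
    # 5,000 sample means, each computed from a sample of size n
    means = rng.exponential(scale=1.0, size=(5_000, n)).mean(axis=1)
    # Empirical sd of the sample means vs. the CLT prediction sigma/sqrt(n)
    print(n, means.std().round(3), (population_sd / np.sqrt(n)).round(3))
```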
Which measure of central tendency is most affected by extreme values in a dataset?. Median. Mode. Range. Mean.
What can be inferred if the mean of a dataset is significantly higher than the median?. The dataset is likely normally distributed. The dataset is likely positively skewed. The dataset is likely negatively skewed. The dataset has no skewness.
In a positively skewed distribution, which of the following statements is true?. Most of the data values are concentrated on the higher end. The mode is greater than the mean. The median is greater than the mean. The tail of the distribution extends to the right.
In non-parametric regression, what role does kernel smoothing play?. It assigns equal weight to all data points in the dataset. It determines the exact functional form of the relationship between variables. It helps in reducing the noise of the data by averaging nearby observations. It is used to transform the dependent variable.
What does PEAS stand for in artificial intelligence?. Performance, Environment, Actuators, Sensors. Performance, Efficiency, Algorithms, Systems. Predictive, Environment, Actuators, Systems. Precision, Environment, Algorithms, Sensors.
In the context of a self-driving car, which of the following would be considered an actuator?. GPS system. LIDAR sensor. Steering wheel control system. Road signs.
Which of the following is a common choice for a kernel function?. Linear kernel. Gaussian kernel. Polynomial kernel. Sigmoid kernel.
In kernel smoothing, which of the following is true about the weights assigned to observations?. All observations are given equal weight. Observations further from the point of interest are given more weight. Observations closer to the point of interest are given more weight. Weights are assigned randomly to observations.
What happens if the bandwidth in kernel smoothing is set too large?. The resulting estimate will be very detailed and noisy. The resulting estimate will be oversmoothed and may miss important features. The resulting estimate will fit a polynomial of high degree. The resulting estimate will not change.
Which parameter in kernel smoothing controls the degree of smoothing applied to the data?. Degree of the polynomial. Number of observations. Bandwidth. Kernel function.
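
The kernel-smoothing questions above (Gaussian kernel, distance-based weights, bandwidth) can be made concrete with a Nadaraya-Watson estimator. A minimal sketch with made-up data; the sine signal and bandwidth values are illustrative only:

```python
import numpy as np

def kernel_smooth(x, y, x0, bandwidth):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel."""
    # Closer observations get larger weights; the bandwidth controls
    # how quickly the weights decay (larger bandwidth -> smoother fit).
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 200)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)  # noisy signal

# A small bandwidth tracks the noise; a large one oversmooths.
print(kernel_smooth(x, y, x0=5.0, bandwidth=0.5))
print(kernel_smooth(x, y, x0=5.0, bandwidth=5.0))
```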
An analyst performs two separate significance tests: one for a mean with a known population standard deviation and one for a mean with an unknown population standard deviation. Both samples are of the same size and from normally distributed populations. How do the test statistics compare?. The test statistic will be larger for the known standard deviation due to less variability in the estimate. The test statistic for the unknown standard deviation uses the sample standard deviation and follows a t-distribution, typically yielding a larger critical value for the same significance level. The z-test and t-test statistics will be identical. The t-test statistic will always be larger than the z-test statistic.
In a study on the impact of environmental factors on plant growth, a biologist presents a correlation matrix including sunlight hours, soil acidity (pH) and plant height. The correlation between sunlight hours and plant height is 0.8; between soil acidity and plant height is -0.6; and between sunlight and soil acidity is -0.7. What do these correlations imply about multicollinearity in a multiple regression model predicting plant height from both sunlight hours and soil acidity?. There is no multicollinearity in the model. There is significant multicollinearity that may affect the model's coefficients, since the predictor variables are correlated with each other. The correlation between sunlight and plant height suggests multicollinearity, but it's not significant. The correlation between soil acidity and plant height suggests that multicollinearity is not an issue.
When using Ordinary Least Squares (OLS) to model a nonlinear relationship between variables X and Y, what is a common approach?. Transforming the dependent variable Y using a logarithmic function. Using a higher-degree polynomial for the independent variable X. Increasing the sample size to better capture the nonlinearity. Applying a non-parametric method instead of OLS.
Which of the following best describes a polynomial regression model?. A model that uses a linear combination of predictor variables to estimate the response variable. A model that includes exponential terms of the predictor variable. A model that includes terms of the predictor variable raised to different powers, such as x^2 and x^3, to model nonlinear relationships. A model that uses logarithms of the predictor variable to estimate the response variable.
When would you consider using polynomial regression instead of simple linear regression?. When the relationship between the predictor and response variable is linear. When the relationship between the predictor and response variable is not linear and shows curvature. When the predictor variable is categorical. When the predictor variable has a large number of missing values.
Which term would you add to a simple linear regression model to convert it into a quadratic polynomial regression model?. An interaction term between two predictor variables. The reciprocal of the predictor variable, 1/x. The square of the predictor variable, x^2. A logarithmic term of the predictor variable.
What is a potential drawback of using higher-degree polynomials in a regression model?. The model becomes more linear. The model might overfit the data, leading to poor generalization to new data. The model becomes more computationally efficient. The model automatically handles multicollinearity.
In polynomial regression, how does the inclusion of higher-order terms (x^3, x^4, ...) affect the model?. It makes the model more flexible, allowing it to capture more complex relationships between the predictor and response variables. It always improves the accuracy of the model. It reduces the degrees of freedom in the model. It simplifies the interpretation of the model coefficients.
If a quadratic regression model (i.e., including x and x^2) has a high degree of multicollinearity between x and x^2, what might be a consequence?. The model will have a perfect fit to the data. The coefficients will be easier to interpret. The model will have less bias. The coefficients of x and x^2 could be unstable and difficult to interpret.
When interpreting a polynomial regression model, what does a significant coefficient for the x^2 term indicate?. There is a linear relationship between X and Y. There is not a linear relationship between X and Y. There is a nonlinear (curved) relationship between X and Y. X and Y are independent variables.
In polynomial regression, why is it important to center the predictor variable (subtract the mean from each value) before calculating higher-order terms?. To make the model more nonlinear. To reduce multicollinearity between the terms. To simplify the calculation of the coefficients. To eliminate the need for a constant term.
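
The centering question above can be demonstrated numerically: before centering, x and x^2 are almost perfectly correlated; after centering, the correlation collapses. A minimal sketch with an arbitrary predictor:

```python
import numpy as np

x = np.arange(1.0, 21.0)          # hypothetical predictor values
xc = x - x.mean()                 # centered predictor

# Correlation between the linear and quadratic terms
print(np.corrcoef(x, x ** 2)[0, 1])    # close to 1: severe multicollinearity
print(np.corrcoef(xc, xc ** 2)[0, 1])  # ~0 for a symmetric design: reduced
```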
A researcher has calculated the Pearson correlation coefficient between the number of hours students study per week (x) and their grades (y) and obtained a value of 0.45. Which of the following statements is correct based on this correlation coefficient?. Increasing study time by one hour will increase grades by 0.45 points. There is a moderate positive linear relationship between study hours and grades. 45% of the variation in grades is explained by the number of study hours. If a student does not study at all, their grade would be 0.
If the sample size used to estimate a population mean is quadrupled, assuming the population standard deviation is known and remains constant, how does this affect the width of a 95% confidence interval for the population mean?. The width is halved. The width is doubled. The width is cut by a factor of 3-4. The width remains constant.
An experiment was conducted to compare the effectiveness of four different exercise programs on improving flexibility. Flexibility scores were recorded for 20 participants, with five participants in each program. The one-way ANOVA test resulted in a p-value of 0.04. What does this p-value indicate at the 0.05 significance level?. Fail to reject the null hypothesis. Find the mean score of each program. Compare the p-value to the critical F-value. Reject the null hypothesis and conclude significant differences.
Given the following data set of 5 numbers [8, 12, 14, 16, 18], what is the median of the data set?. 10. 15. 13. 14.
Given the following data set of 5 numbers [8, 12, 14, 16, 18], what is the mean of the data set?. 10.7. 14. 13.6. 14.1.
Given the following data: 3, 7, 3, 9, 5, 3, 7, 5, 5, what is the mode?. 5. 3. 7. Bimodal.
The mode is a measure of central tendency that identifies the most frequently occurring value in a dataset. Given the following data: 3, 7, 3, 9, 5, 3, 7, 5, what is the mode?. 3. 5. 7. 9.
In a dataset, if a single extreme outlier is added, which measure of central tendency is likely to be affected the most?. Mean. Median. Mode. All.
If a dataset has two modes, which of the following terms best describes it?. Unimodal. Bimodal. Multimodal. Non-modal.
Which of the following statements about the mode is incorrect?. The mode can be used with both numerical and categorical data. The mode is the only measure of central tendency that can be used with nominal data. The mode is sensitive to extreme values. All of them.
In which of the following situations is the mode the most appropriate measure of central tendency?. When the data is skewed. When the data has outliers. When dealing with categorical data. When all the data values are different.
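
The median/mean/mode questions above can be checked with the standard library; a minimal sketch using the exact datasets from the questions:

```python
import statistics

data = [8, 12, 14, 16, 18]
print(statistics.median(data))  # 14
print(statistics.mean(data))    # 13.6

# multimode returns every most-frequent value, exposing bimodality
print(statistics.multimode([3, 7, 3, 9, 5, 3, 7, 5, 5]))  # [3, 5] -> bimodal
print(statistics.multimode([3, 7, 3, 9, 5, 3, 7, 5]))     # [3]
```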
What does a high coefficient of variation (CV) indicate about a dataset?. The mean is close to zero. The data points are clustered around the mean. The standard deviation is small relative to the mean. There is considerable variability in the data relative to the mean.
What does a high coefficient of variation (CV) indicate about a dataset?. The data values are tightly clustered around the mean. The data values are widely spread out relative to the mean. The dataset has a low standard deviation. The dataset has a low mean.
The coefficient of variation (CV) is a useful statistic because it: Measures the absolute variability of a dataset. Compares the degree of variation from one dataset to another, regardless of the units of measurement. Provides a measure of central tendency. Is unaffected by changes in the scale of the data.
Which of the following statements is true about the coefficient of variation (CV)?. A lower CV indicates a higher level of relative variability in the data. CV is expressed as a ratio without any units. A higher CV means the data values are more consistent. CV can only be used with datasets that have a positive mean.
If the coefficient of variation (CV) for Dataset A is 20% and for Dataset B is 40%, what can be inferred?. Dataset A has higher relative variability than Dataset B. Dataset B has higher relative variability than Dataset A. Both datasets have equal relative variability. The absolute variability in Dataset A is higher than in Dataset B.
In which of the following situations would using the coefficient of variation (CV) be most appropriate?. Comparing the consistency of two different investment portfolios. Measuring the spread of a single dataset. Determining the central tendency of a dataset. Comparing two datasets with different units of measurement.
A high coefficient of variation (CV) in a dataset typically suggests: High consistency of the data points around the mean. High variability relative to the mean. Low variability relative to the mean. A low standard deviation relative to the mean.
How does non-parametric regression typically handle outliers compared to parametric regression?. Non-parametric regression is more sensitive to outliers than parametric regression. Non-parametric regression ignores outliers completely, making it less accurate. Non-parametric regression handles outliers better because it relies on local data points and does not assume a specific distribution. Non-parametric regression and parametric regression handle outliers in the same way.
Given a continuous pdf f(x) = 2x for 0 ≤ x ≤ 1, what does the integral of f(x) from 0.25 to 0.75 represent?. The median of a random variable between 0.25 and 0.75. The variance of a random variable between 0.25 and 0.75. The cumulative probability from 0.25 to 0.75. The expected value of the random variable between 0.25 and 0.75.
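
For the pdf question above, the integral can be evaluated directly (a worked check, not part of the original quiz):

$$\int_{0.25}^{0.75} 2x \, dx = \big[x^2\big]_{0.25}^{0.75} = 0.75^2 - 0.25^2 = 0.5625 - 0.0625 = 0.5$$

That is, the area under the pdf between 0.25 and 0.75, i.e. the cumulative probability P(0.25 ≤ X ≤ 0.75) = 0.5.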
You are given a dataset with missing values. How would you handle the imputation of these missing values to ensure the most accurate results?. Perform a simple mean imputation for missing values. Use a linear regression model to predict the missing values. Use the k-nearest neighbours (KNN) algorithm to impute missing values. Apply a principal component analysis (PCA) to impute missing values.
Given a time series dataset with a clear seasonal pattern, how would you forecast future values?. Use a simple moving average to forecast future values. Apply seasonal decomposition of time series (STL) to decompose the data into trend, seasonal, and residual components. Perform exponential smoothing to predict future values. Apply Holt-Winters exponential smoothing.
What is the first step you should take when forecasting future values in a time series with a clear seasonal pattern?. Apply a machine learning model. Decompose the time series to understand its components. Immediately apply the ARIMA model. Perform a stationarity test.
Which of the following tests is commonly used to check the stationarity of a time series?. Durbin-Watson test. Augmented Dickey-Fuller (ADF) test. Granger causality test.
If a time series shows both trend and seasonal components, which forecasting method is most appropriate?. Simple Linear Regression. Holt-Winters Exponential Smoothing. Moving Average. Linear Smoothing.
In a Seasonal ARIMA (SARIMA) model, what does the parameter "P" represent?. The number of seasonal differencing steps. The number of non-seasonal autoregressive terms. The number of seasonal autoregressive terms. The number of non-seasonal moving average terms.
Which model would you use if you want to capture both trend and seasonality without assuming a fixed functional form for these components?. Seasonal Decomposition of Time Series (STL). Simple Exponential Smoothing. Autoregressive Integrated Moving Average (ARIMA). Random Walk.
What is the purpose of using cross-validation in time series forecasting?. To minimize overfitting by evaluating model performance on unseen data. To ensure that all seasonal cycles are included in the training data. To test for stationarity in the time series. To decompose the time series into its components.
When should you apply seasonal differencing to a time series?. When the time series is already stationary. When the time series has a linear trend but no seasonality. When the time series exhibits seasonal patterns that need to be removed. When the time series is multimodal.
Which of the following is an advantage of using machine learning models, like Random Forest or LSTM, for time series forecasting?. They are less computationally intensive than traditional statistical models. They require no preprocessing of the data. They can capture complex non-linear patterns and interactions between variables. They automatically remove seasonality from the data.
When fitting a time series model, why is it important to retain a portion of the data as a test set?. To ensure that all possible models can be tested. To allow for parameter tuning in the model. To evaluate the model's performance on data not used in training, ensuring better generalization. To decompose the series into trend, seasonal, and residual components.
What is a common method for visualizing the accuracy of a time series forecast?. Scatter plot of actual vs. predicted values. Histogram of residuals. Time series plot of forecasted vs. actual values. Box plot of the dataset.
How would you identify and remove outliers in a dataset?. Using a simple moving average to identify outliers. Performing a principal component analysis (PCA) to detect the outliers. Applying the IQR (interquartile range) method to detect and remove outliers. Calculating the correlation coefficient to identify outliers.
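
A minimal sketch of the IQR rule referenced above; the data are made up and the 1.5 multiplier is the conventional choice:

```python
import numpy as np

data = np.array([10, 12, 12, 13, 12, 11, 14, 13, 15, 102])  # 102 is suspect

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Points outside [lower, upper] are flagged as outliers
outliers = data[(data < lower) | (data > upper)]
cleaned = data[(data >= lower) & (data <= upper)]
print(outliers)  # [102]
```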
What can you do in a dataset with outliers?. Remove, transform, use robust models and impute. Transform, clustering, use robust models and output. Use robust models, erase, impute, and remove. Clustering, erase, impute and use robust models.
Which of the following methods can be used to visually identify outliers in a dataset?. Z-score. Box plot. IQR method. Isolation Forest.
Which of the following methods can be used to transform data to reduce the impact of outliers?. Log transformation. Principal Component Analysis (PCA). Standardization. Min-Max Scaling.
When might you choose to impute outliers rather than remove them?. When the outliers are due to measurement errors. When the outliers represent rare but valid phenomena. When you have a very large dataset. When you are using a linear regression model.
Which of the following is a characteristic of the Isolation Forest method for detecting outliers?. It clusters data points into groups and identifies outliers as points that do not belong to any cluster. It uses random forest regression to predict the values of outliers. It isolates observations by randomly selecting a feature and then a split value, requiring fewer splits for outliers. It uses a Z-score threshold to identify outliers.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is particularly useful for identifying outliers because: It automatically removes outliers based on their distance from the mean. It assigns outliers to the closest cluster centroid. It identifies outliers as points that do not belong to any cluster, based on the density of data points. It normalizes data before clustering to minimize the impact of outliers.
When might you choose to remove outliers from a dataset?. When outliers are suspected to be data entry errors. When outliers contribute valuable information to the analysis. When outliers have no significant impact on the analysis results. When outliers are spread across the dataset evenly.
Which transformation method is commonly used to reduce the impact of outliers in a dataset?. Z-score transformation. One-hot encoding. Log transformation. Principal component analysis.
Winsorization is a technique used to handle outliers by: Removing all data points that exceed a certain threshold. Replacing outliers with the nearest non-outlier values. Transforming the data to a normal distribution. Ignoring outliers during model training.
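
A minimal winsorization sketch using scipy; the data and the 10% limits are arbitrary choices for illustration:

```python
import numpy as np
from scipy.stats.mstats import winsorize

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 40])  # 40 is an extreme value

# Replace the lowest and highest 10% of values with the nearest
# remaining values instead of dropping them
w = winsorize(data, limits=[0.1, 0.1])
print(np.asarray(w))  # 40 becomes 5; 1 becomes 2
```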
In which situation might you choose to impute outliers rather than removing them?. When the outliers are likely to be errors in data entry. When the dataset is very large and removing outliers has little impact on the analysis. When outliers are normally distributed. When outliers represent meaningful data points that reflect rare but important events.
Which of the following methods can be considered a robust model that is less sensitive to outliers?. Ordinary Least Squares (OLS) Regression. Median Regression. K-Means Clustering. Simple Linear Regression.
After removing outliers, what is a critical next step to ensure the effectiveness of the outlier handling process?. Ignore the distribution of the remaining data and proceed with the analysis. Re-check the distribution to ensure that the data has the desired properties without the outliers. Immediately fit the final model without any additional checks. Reintroduce the outliers to verify their impact.
Why is it important to evaluate model performance after handling outliers?. To confirm that the removal or transformation of outliers did not distort the model. To determine if the outliers can be used to create a separate model. To check if the outliers have been perfectly removed. To reintroduce the outliers into the final model for better accuracy.
Which of the following statements is true about a discrete distribution?. It can take any value within a given range. It is described by a probability density function (pdf). It can only take a finite or countable number of specific values. The probability of any specific value is 0.
Assuming that the sample size is greater than or equal to 30, the sample standard deviation can be used to approximate the population standard deviation according to the... Law of Large Numbers. Uniform distribution assumption. Small sample size condition. Central Limit Theorem.
Which of the following distributions is continuous?. Binomial. F. Poisson. Hyper-geometric.
Which of the following distributions is an example of a continuous distribution?. Binomial distribution. Poisson distribution. Hypergeometric distribution. Normal distribution.
In a continuous probability distribution, what is the probability of a random variable taking an exact value?. Equal to the value of the probability density function at that point. Depends on the distribution. 0. 1.
Which type of distribution is described using a probability density function (pdf)?. Discrete distribution. Continuous distribution. Both discrete and continuous distributions. Neither discrete nor continuous distributions.
What is the main difference between a discrete and a continuous random variable?. A discrete random variable can take on any value, while a continuous random variable can only take on whole numbers. A discrete random variable can only take on specific values, while a continuous random variable can take any value within a given range. A continuous random variable is always normally distributed, while a discrete random variable is not. There is no difference between a discrete and a continuous random variable.
Which of the following is an example of a situation where a continuous distribution would be more appropriate than a discrete distribution?. The number of students in a classroom. The number of cars passing through a toll booth in an hour. The height of students in a classroom. The number of defective items in a batch.
For a discrete probability distribution, which of the following must be true?. The sum of the probabilities of all possible outcomes is less than 1. The sum of the probabilities of all possible outcomes is greater than 1. The sum of the probabilities of all possible outcomes is equal to 1. The probabilities can be negative.
If a random variable follows a Poisson distribution, which of the following statements is true?. The variable can take any value between 0 and 1. The variable can take any value between -∞ and +∞. The variable can take only non-negative integer values. The variable must take a value that is a multiple of 2.
Which of the following is true for a continuous random variable?. The area under the probability density function (pdf) curve over any interval gives the probability of the variable falling within that interval. It can only take whole number values. The cumulative probability distribution is a step function. The sum of probabilities for all possible outcomes is equal to 1.
What is the normal distribution symmetric around?. variance. mean. standard deviation. covariance.
What is the total area under the graph of a continuous one-variable probability density function?. 1. 0. Infinity. Depends on the type of cumulative density function.
What is the normal distribution symmetric around?. median. mean. standard deviation. mode.
Assuming that the sample size is greater than or equal to 30, the sample standard deviation can be used to approximate the population standard deviation when the population standard deviation is... Known. Unknown. Standard interval deviation. Population interval theorem.
When conducting a principal component analysis (PCA), how is the proportion of variance explained by each principal component determined?. By the eigenvalues corresponding to each principal component. By the eigenvectors corresponding to each principal component. By the sum of squared errors. By the cumulative variance.
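
For the PCA question above: the proportion of variance explained comes from the eigenvalues, which scikit-learn exposes directly. A minimal sketch with random data; the correlated 4-variable matrix is made up for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))     # hypothetical 4-variable dataset
X[:, 1] += 2 * X[:, 0]            # inject correlation so PC1 dominates

pca = PCA().fit(X)
print(pca.explained_variance_)        # eigenvalues of the covariance matrix
print(pca.explained_variance_ratio_)  # proportion of variance per component
```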
What is the purpose of cross-validation in machine learning?. To split the dataset into a training and testing set. To ensure the model does not overfit the data. To identify outliers in the dataset. To apply the model to new data.
A researcher conducts a two-sample t-test and finds a p-value of 0.07. If the significance level is set at 0.05, what should the researcher conclude?. Reject the null hypothesis. Fail to reject the null hypothesis. Accept the alternative hypothesis. Increase the sample size.
In the context of hypothesis testing, what does a p-value represent?. The probability of the null hypothesis being true. The probability of obtaining the observed data given that the null hypothesis is true. The probability of making a Type II error. The probability of the alternative hypothesis being true.
Which of the following tests is used to compare the means of more than two independent groups?. Paired t-test. Independent samples t-test. ANOVA. Chi-square test.
In an ANOVA test, if the between-group variance is much larger than the within-group variance, what does this indicate?. The group means are likely different from each other. The group means are likely similar. There is a high likelihood of Type I error. The sample size is too small.
Which of the following is a property of a normal distribution?. It is skewed to the right. It has a mean of zero and a standard deviation of one. It is bimodal. It has thicker tails than a standard normal distribution.
A dataset shows evidence of both trend and seasonality. Which time series model should be used to account for these features?. ARIMA. Holt-Winters Exponential Smoothing. Logistic Regression.
When applying the Central Limit Theorem, which of the following conditions is most important for the theorem to hold true?. The population from which the sample is drawn must be normally distributed. The sample mean must equal the population mean. The sample size must be large. The sample mean must not equal the population mean.
Which of the following is true about a negatively skewed distribution?. The mean is less than the median. The mean is greater than the median. The distribution is symmetric. The tail is on the right side.
When is the use of the Bonferroni correction recommended?. When performing multiple hypothesis tests to control for Type I error. When the sample size is small. When the data is not normally distributed. When comparing more than two groups.
When should you use a chi-square test of independence?. When comparing the means of two independent groups. When assessing the relationship between two categorical variables. When testing for a difference in proportions between two groups. When analyzing the correlation between two continuous variables.
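
A minimal sketch of the chi-square test of independence mentioned in the last question; the contingency table is made up for illustration:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: rows and columns are two
# categorical variables (e.g., group membership vs. preference)
table = [[30, 10],
         [20, 40]]

chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p, dof)
# A small p-value suggests the two categorical variables are associated.
```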