If we observe that data is always collected during the daytime and never at night:
- It is an example of Missing Completely At Random (MCAR)
- It is an example of Missing At Random (MAR)
- It is an example of Missing Not At Random (MNAR)

If, after applying min-max normalization (0-1), we get a value close to 0, it means that:
- It is not possible to know
- The original value was close to the minimum value of the data
- The original value was close to the average value of the data

The "Lag" and "Lead" operations:
- Calculate the moving average forwards or backwards, respectively
- Allow accessing a value stored in a row above/below the current row
- Allow calculating arithmetic operations (addition and subtraction) with dates and times

An API will (usually) return:
- Data encoded in HTML format
- Data encoded in JSON format
- A CSV file

Imputation of missing data with the most frequent value:
- Can increase data variance, and this is an advantage of the method
- Has no effect on the data variance
- Can reduce data variance, and this is a drawback of the method

Outliers are usually classified into:
- Individual and Collective
- Global, Contextual and Collective
- Global, Local and Contextual

The Accuracy dimension of Data Quality:
- Is focused on the degree to which the data represents reality
- Is focused on the degree to which necessary data is available for use
- Is focused on the degree to which the data is available and up to date at the time it is needed

An association rule:
- Being unidirectional or bidirectional depends on the data used
- Is bidirectional: X<->Y
- Is unidirectional: X->Y or Y->X

Dummy variables:
- Can generate columns with any integer number
- Will generate columns with numbers in the range [0, 1]
- Will generate columns with 0s or 1s

The most relevant "V" of Big Data for us is:
- Variety
- Volume
- Value

Which of the following is not a phase of the CRISP-DM methodology?
- Business Understanding
- Data Preparation
- Data Formatting

The data science skill set is mainly composed of:
- Statistics and programming skills
- Machine learning and data visualization
- Programming skills, domain knowledge, math and statistics

The ANOVA test:
- Measures the relation between 2 quantitative attributes
- Measures the relation between 2 qualitative attributes
- Measures the relation between 1 qualitative and 1 quantitative attribute

Parametric methods:
- Will always return the same result when fed with similar data
- Can control the complexity of the resulting structure independently of the data used
- Will not always return the same result when fed with similar data

The regular expression "\d{10}0":
- Triggers on strings of at least 11 digits ending with a 0
- Triggers on strings of 11 digits ending with a 0
- Triggers on 4-digit strings ending with 100

Statistically, a CITY attribute in a table should be considered as:
- An ordinal attribute
- A nominal attribute
- A continuous attribute
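
One way to check an answer to the regular-expression question above is to run the pattern against a few strings. The sketch below uses Python's standard `re` module; the sample strings and the helper name `check` are made up for illustration. It reports two results per string, because whether a pattern "triggers" depends on whether the tool requires the whole string to match or accepts a match anywhere inside it, and the question does not say which is meant.

```python
import re

# The pattern from the question: ten digits followed by a literal 0.
PATTERN = re.compile(r"\d{10}0")


def check(candidate: str) -> tuple[bool, bool]:
    """Return (whole-string match, match anywhere in the string)."""
    whole = PATTERN.fullmatch(candidate) is not None
    anywhere = PATTERN.search(candidate) is not None
    return whole, anywhere


# Hypothetical sample strings, chosen to cover the three quiz options.
samples = ["12345678900", "123456789010", "12345678901", "1234567890", "9100"]
results = {s: check(s) for s in samples}

for s, (whole, anywhere) in results.items():
    print(f"{s!r}: fullmatch={whole}, search={anywhere}")
```

Comparing the `fullmatch` and `search` columns shows how the answer shifts with the matching mode, so it is worth confirming which mode the course material assumes.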

If we discretize a column with 10 numerical values (no repeats) into 2 bins using the equal-depth procedure:
- The distribution of the values in the result cannot be known beforehand
- The result will be each of the bins having exactly 5 values
- Equal-depth is not a discretization procedure

The correlation coefficient:
- Is always in the range [-1, +1]
- Can be any real value
- Is always greater than 0

MAE, MSE and MAPE are:
- Measures used within clustering algorithms
- Measures used within regression models
- Measures used within classification models
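
The equal-depth (equal-frequency) question above can be explored with a small experiment. The following is a minimal sketch in plain Python, not a reference implementation: the function name `equal_depth_bins` and the ten sample values are assumptions made for illustration. It sorts the values and cuts the sorted list into groups of (as near as possible) equal size.

```python
def equal_depth_bins(values, n_bins):
    """Split values into n_bins groups with (nearly) the same number of items."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional item when the
        # number of values is not divisible by n_bins.
        size = base + (1 if i < extra else 0)
        bins.append(ordered[start:start + size])
        start += size
    return bins


# Made-up sample: 10 distinct numerical values.
values = [7, 42, 3, 19, 88, 1, 56, 23, 64, 10]
bins = equal_depth_bins(values, 2)

for index, group in enumerate(bins):
    print(f"bin {index}: {len(group)} values -> {group}")
```

Note that the bin sizes are determined by the count of values, while the bin boundaries depend on the data; with repeated values, some implementations may place ties in the same bin and produce uneven sizes.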