Are you ready to dive into the world of data analytics and unlock exciting career opportunities? Cognizant, a global leader in IT services and consulting, offers a plethora of opportunities for data enthusiasts. As you prepare for your interview, let’s explore some common data analytics interview questions you might encounter, along with expert answers to help you shine bright.
Interview Questions
Question: What are the constraints in SQL?
Answer: In SQL, constraints are rules that enforce certain conditions on the data stored in tables. Here are common constraints:
- NOT NULL: Ensures a column cannot have NULL values.
- UNIQUE: Ensures all values in a column are unique.
- PRIMARY KEY: Combines NOT NULL and UNIQUE, uniquely identifies each row.
- FOREIGN KEY: Establishes relationships between tables.
- CHECK: Ensures values in a column satisfy a specific condition.
- DEFAULT: Provides a default value for a column if not specified.
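- INDEX: Often listed alongside constraints, but strictly speaking an index is a separate database object rather than a constraint. It speeds up data retrieval on the indexed columns, and most databases create one automatically to enforce PRIMARY KEY and UNIQUE constraints.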
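As a rough illustration of the constraints listed above, the sketch below creates two small tables in an in-memory SQLite database using Python's built-in sqlite3 module and checks that invalid rows are rejected. The table names, column names, and the `violates` helper are hypothetical and chosen only for this example; constraint syntax and enforcement details can differ in other database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE departments (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    )
""")
conn.execute("""
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE,
        salary  REAL CHECK (salary > 0),
        status  TEXT DEFAULT 'active',
        dept_id INTEGER REFERENCES departments(dept_id)
    )
""")
conn.execute("INSERT INTO departments (dept_id, name) VALUES (1, 'Analytics')")
conn.execute(
    "INSERT INTO employees (emp_id, email, salary, dept_id) "
    "VALUES (1, 'a@example.com', 50000, 1)"
)

def violates(values):
    """Return True if inserting the given employee row breaks a constraint."""
    try:
        conn.execute(
            "INSERT INTO employees (emp_id, email, salary, dept_id) "
            "VALUES (?, ?, ?, ?)",
            values,
        )
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    "NOT NULL": violates((2, None, 100, 1)),               # missing email
    "UNIQUE": violates((3, "a@example.com", 100, 1)),      # duplicate email
    "CHECK": violates((4, "b@example.com", -5, 1)),        # negative salary
    "FOREIGN KEY": violates((5, "c@example.com", 100, 99)),  # unknown department
}

# The first employee was inserted without a status, so DEFAULT applies.
default_status = conn.execute(
    "SELECT status FROM employees WHERE emp_id = 1"
).fetchone()[0]

print(results)
print(default_status)
```

In an interview, it is usually enough to name each constraint and give one sentence on what kind of bad data it blocks, as the four rejected inserts do here.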
Question: What is the difference between Precision and Recall?
Answer:
- Precision is about the relevancy of the predicted positive instances, emphasizing the proportion of correct positive predictions among all predicted positives.
- Recall is about the completeness of the predicted positive instances, emphasizing the proportion of correct positive predictions among all actual positives.
- A model with high precision may have many false negatives (missed positive cases), while a model with high recall may have many false positives (incorrectly labeled positive cases).
- The trade-off between precision and recall needs to be balanced based on the specific goals and requirements of the classification task.
Question: What is F1 Score?
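Answer: The F1 Score is the harmonic mean of Precision and Recall: F1 = 2 × (Precision × Recall) / (Precision + Recall). It combines both measures into a single value between 0 and 1 and is high only when both Precision and Recall are high. It is especially useful in classification tasks with imbalanced classes, where accuracy alone can be misleading.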
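To make precision, recall, and the F1 score concrete, here is a minimal sketch that computes all three from scratch for a binary classifier. The function name `precision_recall_f1` and the sample labels are made up for illustration; in practice a library such as scikit-learn would typically be used instead.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical labels: 4 actual positives, 3 predicted positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With these labels there are 2 true positives, 1 false positive, and 2 false negatives, so precision is 2/3, recall is 1/2, and F1 is 4/7.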
Question: What is data structure?
Answer: A data structure is a way of organizing and storing data so that it can be accessed and modified efficiently. It defines the relationships among data elements and the operations supported on them, such as insertion, deletion, and traversal. Types include linear structures like arrays and linked lists, as well as non-linear structures like trees and graphs. The choice of data structure directly affects algorithm efficiency, which is why data structures matter everywhere from databases to artificial intelligence.
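A short sketch may help contrast linear and non-linear structures. It uses a Python list as a stack, a `collections.deque` as a queue, and a dictionary of adjacency lists as a graph; the `bfs` helper and the sample graph are illustrative assumptions, not part of any particular interview answer.

```python
from collections import deque

# Linear structure used as a stack (last in, first out).
stack = []
stack.append("a")
stack.append("b")
top = stack.pop()          # removes the most recently added item

# Linear structure used as a queue (first in, first out).
queue = deque(["a", "b"])
first = queue.popleft()    # removes the oldest item

# Non-linear structure: a directed graph stored as adjacency lists.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

def bfs(graph, start):
    """Traverse the graph breadth-first and return nodes in visit order."""
    visited = [start]
    pending = deque([start])
    while pending:
        node = pending.popleft()
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.append(neighbor)
                pending.append(neighbor)
    return visited

order = bfs(graph, "A")
print(top, first, order)
```

The traversal itself relies on a queue, which shows how the choice of one structure (the deque) shapes the behavior of an algorithm on another (the graph).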
Question: What is a constructor, and how is it different from a method?
Answer: A constructor (__init__()) in Python is a special method that initializes an object’s state when it is created.
It is automatically called upon object creation, setting initial values for attributes.
The constructor takes parameters to initialize object attributes.
A method in a class is a function associated with that class, performing actions or computations based on object attributes.
Unlike the constructor, methods are called explicitly using the object instance.
The constructor initializes, while methods perform actions or computations on object attributes.
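The difference can be seen in a small class. The `Account` class below is a hypothetical example: `__init__` runs automatically when the object is created, while `deposit` only runs when it is called explicitly on an instance.

```python
class Account:
    def __init__(self, owner, balance=0.0):
        # Constructor: called automatically by Account(...), sets initial state.
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        # Regular method: must be called explicitly on an instance.
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount
        return self.balance

account = Account("Asha", 100.0)     # __init__ runs here
new_balance = account.deposit(50.0)  # deposit runs only because we call it
print(account.owner, new_balance)
```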
Question: What are Analytical Functions in SQL?
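Answer: Analytical functions in SQL, also known as window functions, perform calculations across a set of rows related to the current row. Unlike GROUP BY aggregation, they return a value for every row instead of collapsing rows into one result per group.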
Partitioning divides rows into smaller groups using PARTITION BY.
Ordering within partitions is done using ORDER BY.
Common functions include ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), LAG(), and LEAD().
They simplify calculations such as rankings, running totals, and comparisons with previous or next rows, which would otherwise require self-joins or correlated subqueries.
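The following sketch runs a few of these functions against an in-memory SQLite database through Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added; the `sales` table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "Ann", 300),
        ("East", "Bob", 300),
        ("East", "Cy", 100),
        ("West", "Dee", 500),
        ("West", "Eli", 200),
    ],
)

rows = conn.execute("""
    SELECT region, rep, amount,
           RANK()       OVER (PARTITION BY region ORDER BY amount DESC) AS rnk,
           DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS dense_rnk,
           LAG(amount)  OVER (PARTITION BY region ORDER BY amount DESC, rep) AS prev_amount
    FROM sales
    ORDER BY region, amount DESC, rep
""").fetchall()

for row in rows:
    print(row)
```

Ann and Bob tie in the East region, so both receive rank 1. RANK() then skips to 3 for Cy, while DENSE_RANK() continues with 2. LAG() returns NULL (None in Python) for the first row of each partition because there is no previous row.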
Question: What is the difference between the WHERE and HAVING clauses?
Answer:
- Usage: WHERE is used for filtering rows before grouping, while HAVING is used for filtering results after grouping.
- Aggregation: WHERE cannot be used with aggregate functions, while HAVING can filter based on the results of aggregate functions.
- Timing: WHERE is applied before data is grouped or aggregated, whereas HAVING is applied after grouping and aggregation.
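A single query can use both clauses, which makes the ordering easy to see. This sketch uses Python's sqlite3 module with a made-up `orders` table: WHERE drops cancelled orders before grouping, and HAVING keeps only customers whose remaining total exceeds 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, status TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Ann", "paid", 120),
        ("Ann", "paid", 80),
        ("Ann", "cancelled", 500),
        ("Bob", "paid", 40),
        ("Bob", "paid", 30),
        ("Cy", "paid", 300),
    ],
)

rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE status = 'paid'        -- row filter, applied before grouping
    GROUP BY customer
    HAVING SUM(amount) > 100     -- group filter, applied after aggregation
    ORDER BY customer
""").fetchall()

print(rows)
```

Ann's cancelled order of 500 never reaches the SUM because WHERE removes it first, and Bob is dropped by HAVING because his paid total is only 70.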
Question: What is a JOIN and describe different types of JOINs?
Answer: A JOIN in SQL is used to combine rows from two or more tables based on a related column between them. It allows you to retrieve data from multiple tables in a single query, creating a virtual table by matching rows in the tables.
Different types of joins are:
- INNER JOIN is commonly used to retrieve rows with matching values in both tables.
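- LEFT JOIN retrieves all rows from the left table and the matched rows from the right table. Where there is no match, the right table's columns are NULL.
- RIGHT JOIN is the mirror image of LEFT JOIN: it returns all rows from the right table, with NULLs for left-table columns where there is no match.
- FULL JOIN (FULL OUTER JOIN) returns all rows from both tables, matching them where possible and filling in NULLs on whichever side has no match.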
- CROSS JOIN is used to create a Cartesian product of the two tables.
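The join types above can be tried out on two tiny tables. The sketch below uses Python's sqlite3 module with hypothetical `employees` and `departments` tables. It covers INNER, LEFT, and CROSS joins only, because RIGHT and FULL joins are supported only in newer SQLite releases (3.39 and later), which may not be what ships with a given Python installation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER)")
conn.execute("CREATE TABLE departments (dept_id INTEGER, dept_name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 1), ("Bob", 2), ("Cy", None)],   # Cy has no department
)
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(1, "Analytics"), (2, "Finance"), (3, "Legal")],  # Legal has no employees
)

# INNER JOIN: only employees with a matching department.
inner = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e
    INNER JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.name
""").fetchall()

# LEFT JOIN: every employee, with NULL where no department matches.
left = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.name
""").fetchall()

# CROSS JOIN: every employee paired with every department.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM employees CROSS JOIN departments"
).fetchone()[0]

print(inner)
print(left)
print(cross_count)
```

Cy appears only in the LEFT JOIN result, paired with None, and the CROSS JOIN yields 3 × 3 = 9 rows.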
Question: What is the difference between stored procedures and functions?
Answer:
Return Value:
- Stored Procedures can return zero or more values, typically through output parameters or result sets.
- Functions must return a value: either a single scalar value or, in databases that support table-valued functions, a table.
Usage:
- Stored Procedures are used for tasks like data manipulation, transactions, etc.
- Functions are used for calculations, lookups, or simplifying complex queries.
Control-of-Flow:
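- Stored Procedures can contain control-of-flow statements like IF, WHILE, etc., along with error handling.
- Functions can also use control-of-flow logic in most databases, but they are more restricted. In SQL Server, for example, functions cannot use TRY...CATCH or execute dynamic SQL.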
Calling Syntax:
- Stored Procedures are called with a dedicated statement: EXECUTE or EXEC in SQL Server, CALL in MySQL and PostgreSQL.
- Functions are called as part of a SQL expression.
Transactions:
- Stored Procedures can manage transactions and can commit or roll back changes.
- Functions generally cannot manage transactions, and in many databases (SQL Server, for example) they are not allowed to modify table data. The exact rules vary by database system.
Question: What are Python libraries?
Answer: Python libraries are pre-written sets of code that offer functionalities for specific tasks without needing to code from scratch. Here are some commonly used libraries:
- Pandas: Data manipulation and analysis.
- NumPy: Mathematical operations on arrays and matrices.
- Matplotlib: Creating static, interactive, and animated visualizations.
- Scikit-learn: Simple and efficient tools for machine learning.
- TensorFlow and Keras: Deep learning frameworks.
- Django and Flask: Web development frameworks.
- NLTK and spaCy: Natural language processing tools.
- SciPy and SymPy: Scientific and technical computing.
Question: What are the data types in C++?
Answer:
- Integer Types: int, short, long, long long (C++11), for whole numbers.
- Floating-Point Types: float, double, long double, for decimal numbers.
- Character Types: char, char16_t (C++11), char32_t (C++11), wchar_t, for characters.
- Boolean Type: bool, for true/false values.
- Derived Types: Arrays, Pointers, References for more complex data structures.
- User-Defined Types: Structures, Enumerations, Classes, and Unions for custom data organization.
Question: What are Bias and Variance?
Answer:
Bias:
- Bias refers to the error introduced by approximating a real-world problem, which may be complex, with a simplified model.
- A high-bias model is overly simplified and does not capture the underlying patterns in the data.
- It leads to underfitting, where the model performs poorly on both the training and unseen data.
- Examples of high bias models include linear models for non-linear data and simple decision trees for complex data.
Variance:
- Variance refers to the amount by which the model’s prediction would change if trained on different data.
- A high variance model is overly sensitive to the training data and captures noise along with the underlying patterns.
- It leads to overfitting, where the model performs very well on the training data but poorly on unseen data.
- Examples of high variance models include deep neural networks with many layers for small datasets and decision trees with no constraints.
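The contrast can be sketched with two deliberately extreme models, using only the Python standard library. Everything here is an assumption made for illustration: the synthetic data (noisy samples of y = x²), the constant-mean model standing in for high bias, and the 1-nearest-neighbor model standing in for high variance. It is a toy demonstration, not a formal bias-variance decomposition.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Draw n noisy samples of y = x^2 with x in [-1, 1]."""
    xs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [x * x + random.gauss(0, 0.1) for x in xs]
    return xs, ys

def mse(model, xs, ys):
    """Mean squared error of a model over a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(30)
test_x, test_y = make_data(200)

# High-bias model: ignores x and always predicts the training mean.
mean_y = sum(train_y) / len(train_y)

def constant_model(x):
    return mean_y

# High-variance model: memorizes the training set (1-nearest neighbor).
def nearest_neighbor_model(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

errors = {
    "constant": (mse(constant_model, train_x, train_y),
                 mse(constant_model, test_x, test_y)),
    "1-NN": (mse(nearest_neighbor_model, train_x, train_y),
             mse(nearest_neighbor_model, test_x, test_y)),
}

for name, (train_err, test_err) in errors.items():
    print(f"{name}: train MSE={train_err:.4f}, test MSE={test_err:.4f}")
```

The constant model cannot fit even the training data, which is the signature of underfitting. The 1-nearest-neighbor model reproduces every training point exactly, noise included, so its training error is zero while its error on unseen data is not; that gap between training and test error is the usual symptom of high variance.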
Question: What tools do you use for data analysis and modeling?
Answer:
- Pandas: Python library for data manipulation, cleaning, and analysis.
- Matplotlib, Seaborn, Plotly: Python libraries for creating visualizations.
- SciPy, StatsModels: Python libraries for statistical analysis.
- Scikit-learn, TensorFlow, PyTorch: Tools for machine learning tasks.
- Apache Spark, Hadoop, Dask: Frameworks for big data processing.
- Tableau, Power BI: Business Intelligence tools for creating interactive dashboards.
- SQL, SQLite, MySQL, PostgreSQL: Tools for database querying and management.
Question: What is a normal distribution?
Answer: A normal distribution, also called Gaussian distribution, is a symmetric, bell-shaped probability distribution defined by mean (μ) and standard deviation (σ). Its properties include:
- Symmetry around the mean, with mean, median, and mode being equal.
- About 68%, 95%, and 99.7% of data fall within one, two, and three standard deviations from the mean, respectively.
Widely observed in natural phenomena, it’s crucial for statistical testing, predictive modeling, and data analysis, providing insights into data distribution and probabilities.
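The 68%, 95%, and 99.7% figures can be checked numerically with `statistics.NormalDist` from the Python standard library (available since Python 3.8). The helper name `share_within` is just a label chosen for this sketch.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Probability that a value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: round(share_within(k), 4) for k in (1, 2, 3)}
print(shares)  # {1: 0.6827, 2: 0.9545, 3: 0.9973}
```

Because the rule depends only on the number of standard deviations from the mean, the same proportions hold for any normal distribution, whatever its μ and σ.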
Question: Explain what JavaScript is.
Answer: JavaScript is a high-level, dynamic, and interpreted programming language primarily used for creating interactive and dynamic content on websites. Originally developed by Brendan Eich at Netscape in 1995, JavaScript has evolved into one of the most popular languages for front-end web development.
General Questions
Question: What are your strengths?
Question: How long do you plan to stay with our company?
Question: What are your Salary expectations?
Question: How do you manage your time?
Question: Describe why you want to join the company and the specific job.
Other Technical Questions
Question: Count the number of characters that are repeated in a word.
Question: Scenario based CDC implementation question.
Question: What do you mean by Adverse drug reaction?
Question: What to do with outliers in data?
Question: Why do you intend to join CTS?
Question: Write SQL code to remove duplicate records from a table.
Question: How do you use the CONCATENATE function in Excel?
Question: How do you find duplicate rows in SQL?
Question: What are modules in Python?
Other Technical Topics
Excel, Power BI, SQL, Google Sheets, Java.
Encapsulation and other OOP concepts.
Questions on SQL joins, window functions, and subqueries.
Simple statistics questions.
Basic questions on VLOOKUP and pivot tables.
Conclusion
Data analytics holds the key to unlocking valuable insights and driving informed decision-making in businesses today. The interview questions at Cognizant are designed to assess your proficiency in data analytics concepts, tools, and problem-solving abilities. By understanding these questions and crafting thoughtful responses, you can confidently navigate your way to success in the interview room. So, gear up, prepare diligently, and embark on your journey to a rewarding career in data analytics with Cognizant. Good luck!