Data analysis now underpins decision-making across industries, and demand for skilled data analysts keeps growing. If you are interviewing for a data analyst role at an organization like Tech Mahindra, you need a solid command of the fundamentals. This blog walks through essential data analysis interview questions, with answers tailored to Tech Mahindra candidates, covering SQL, basic data structures, Power BI, Excel, and PL/SQL. Whether you’re a seasoned professional or a budding analyst, use it to build the knowledge and confidence you need for your interview.
Questions on SQL (Structured Query Language)
Question: What is SQL?
Answer: SQL stands for Structured Query Language. It is a standardized programming language used to manage and manipulate relational databases. SQL is used for tasks such as querying data, updating data, inserting data, and deleting data from databases.
Question: Differentiate between SQL and MySQL.
Answer: SQL is a standardized query language used to manage relational databases, whereas MySQL is an open-source relational database management system (RDBMS) that uses SQL as its language.
Question: What are the different types of SQL commands?
Answer: SQL commands can be categorized into four types:
- Data Definition Language (DDL): Used to define, alter, or drop database objects like tables, indexes, etc.
- Data Manipulation Language (DML): Used to retrieve, insert, update, and delete data from a database.
- Data Control Language (DCL): Used to control access to data within the database, e.g., granting or revoking permissions.
- Transaction Control Language (TCL): Used to manage transactions within a database, e.g., committing or rolling back transactions.
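To make the categories concrete, here is a minimal sketch using Python's built-in sqlite3 module. The employees table and its rows are hypothetical, chosen only for illustration. SQLite has no user accounts, so DCL statements such as GRANT and REVOKE are not available there and are left out.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define a table (the employees table is hypothetical).
cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# DML: insert rows.
cur.execute("INSERT INTO employees (name, salary) VALUES ('Asha', 50000)")
cur.execute("INSERT INTO employees (name, salary) VALUES ('Ravi', 42000)")
conn.commit()  # TCL: make the inserts permanent

# DML: an update that we then decide to discard.
cur.execute("UPDATE employees SET salary = 0")
conn.rollback()  # TCL: undo the uncommitted update

# DML (query): salaries are unchanged because the update was rolled back.
salaries = cur.execute("SELECT name, salary FROM employees ORDER BY id").fetchall()
print(salaries)
```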
Question: Explain the difference between CHAR and VARCHAR data types.
Answer:
CHAR is a fixed-length character data type. It always allocates the declared amount of storage regardless of the actual length of the string, and shorter values are padded with trailing spaces.
VARCHAR is a variable-length character data type. It allocates storage only for the actual length of the string plus a small length prefix (one or two bytes in MySQL, for example; the exact overhead depends on the database).
Question: What is a primary key?
Answer: A primary key is a column or a set of columns that uniquely identifies each row in a table. It must contain unique values and cannot contain NULL values. By defining a primary key, you enforce entity integrity within the table.
Question: What is a foreign key?
Answer: A foreign key is a column or a set of columns in a table that establishes a link between data in two tables. It creates a relationship between two tables by referencing the primary key of another table. Foreign keys help enforce referential integrity within a database.
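The two kinds of integrity described above can be seen in a small sketch with Python's sqlite3 module. The departments and employees tables are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, which is a SQLite-specific detail; most other databases enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES departments(dept_id)
    )
""")
conn.execute("INSERT INTO departments VALUES (1, 'Analytics')")
conn.execute("INSERT INTO employees VALUES (1, 'Asha', 1)")


def try_insert(sql):
    """Run an INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# Duplicate primary key value: violates entity integrity.
duplicate_pk = try_insert("INSERT INTO employees VALUES (1, 'Ravi', 1)")
# dept_id 99 does not exist in departments: violates referential integrity.
missing_parent = try_insert("INSERT INTO employees VALUES (2, 'Ravi', 99)")
print(duplicate_pk)
print(missing_parent)
```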
Question: What is a join in SQL?
Answer: A join is an SQL operation that combines rows from two or more tables based on a related column between them. There are different types of joins such as INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN.
Question: Explain the difference between INNER JOIN and OUTER JOIN.
Answer:
- INNER JOIN: Returns only the rows that have a match in both tables.
- OUTER JOIN: Also returns rows that have no match, in one of three forms:
  - LEFT JOIN: Returns all rows from the left table and the matching rows from the right table, with NULLs where there is no match.
  - RIGHT JOIN: Returns all rows from the right table and the matching rows from the left table, with NULLs where there is no match.
  - FULL JOIN: Returns all rows from both tables, with NULLs on whichever side has no match.
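A short sketch of the difference, again using Python's sqlite3 module with hypothetical customers and orders tables. Only INNER JOIN and LEFT JOIN are shown, because RIGHT and FULL joins are only available in newer SQLite releases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# INNER JOIN: Meera has no orders, so she does not appear.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# LEFT JOIN: every customer appears; Meera's amount is NULL (None in Python).
left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

print(inner_rows)
print(left_rows)
```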
Question: What is a subquery?
Answer: A subquery is a query nested inside another query. It can be used to return data that will be used by the main query as a condition or criteria. Subqueries can be used in SELECT, INSERT, UPDATE, or DELETE statements.
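As an illustration, the sketch below uses a subquery in a WHERE condition to find employees who earn more than the average salary. The table and its values are made up for the example, and it runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary REAL);
    INSERT INTO employees VALUES ('Asha', 50000), ('Ravi', 42000), ('Meera', 61000);
""")

# The inner query computes the average salary (51000 here);
# the outer query uses that single value as its filter condition.
above_average = conn.execute("""
    SELECT name
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()
print(above_average)
```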
Question: Explain the difference between the HAVING clause and the WHERE clause.
Answer:
WHERE clause: Used to filter records before the aggregation in a SELECT statement.
HAVING clause: Used to filter records after the aggregation in a SELECT statement. It is applied to groups defined by the GROUP BY clause.
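Both clauses can appear in the same query, which makes the order of filtering easier to see. This is a minimal sketch on a hypothetical sales table, run on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('North', 100), ('North', 300), ('South', 50), ('South', 40), ('East', 500);
""")

# WHERE removes individual rows before grouping: the South row of 40 is dropped.
# HAVING removes whole groups after aggregation: South (total 50) is dropped.
rows = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount >= 50
    GROUP BY region
    HAVING SUM(amount) > 200
    ORDER BY region
""").fetchall()
print(rows)
```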
Basic DSA Questions
Question: What is a data structure?
Answer: A data structure is a way of organizing and storing data in a computer so that it can be accessed and modified efficiently. It defines the relationship between the data elements and the operations that can be performed on them.
Question: What is an array?
Answer: An array is a data structure that stores a collection of elements of the same data type in contiguous memory locations. Elements in an array can be accessed using their index, and arrays offer constant-time access to elements.
Question: What is a linked list?
Answer: A linked list is a linear data structure consisting of a sequence of elements called nodes. Each node contains data and a reference (or pointer) to the next node in the sequence. Unlike arrays, linked lists do not have a fixed size and allow dynamic memory allocation.
Question: What are the differences between an array and a linked list?
Answer:
- Array: contiguous memory allocation, fixed size, random access to elements.
- Linked List: non-contiguous memory allocation, dynamic size, sequential access to elements.
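The access difference can be sketched in a few lines of Python. The Node and LinkedList classes below are illustrative, not a standard library API. One caveat: a Python list is a dynamic array, so it shows contiguous storage and index access but not the fixed size of a classic array.

```python
class Node:
    """One element of a singly linked list."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):
        """O(1): the new node simply points at the old head."""
        self.head = Node(value, self.head)

    def get(self, index):
        """O(n): walk node by node from the head (sequential access)."""
        if index < 0:
            raise IndexError("index out of range")
        node = self.head
        for _ in range(index):
            if node is None:
                break
            node = node.next
        if node is None:
            raise IndexError("index out of range")
        return node.value


array = [10, 20, 30]
third_from_array = array[2]  # direct index lookup (random access)

linked = LinkedList()
for value in (30, 20, 10):
    linked.push_front(value)  # list is now 10 -> 20 -> 30
third_from_list = linked.get(2)  # reached by following two links
print(third_from_array, third_from_list)
```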
Question: What is a stack?
Answer: A stack is a linear data structure that follows the Last In, First Out (LIFO) principle. It has two primary operations: push (adds an element to the top of the stack) and pop (removes the top element from the stack).
Question: What is a queue?
Answer: A queue is a linear data structure that follows the First In, First Out (FIFO) principle. It has two primary operations: enqueue (adds an element to the rear of the queue) and dequeue (removes the front element from the queue).
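The LIFO and FIFO orderings of a stack and a queue are easy to compare side by side. This sketch uses a plain Python list as the stack and collections.deque as the queue, which is one common choice rather than the only one.

```python
from collections import deque

# Stack: push and pop both happen at the same end (the top).
stack = []
for item in ("a", "b", "c"):
    stack.append(item)  # push
popped = [stack.pop(), stack.pop()]  # the most recently pushed item leaves first

# Queue: enqueue at the rear, dequeue from the front.
queue = deque()
for item in ("a", "b", "c"):
    queue.append(item)  # enqueue
dequeued = [queue.popleft(), queue.popleft()]  # the oldest item leaves first

print(popped)
print(dequeued)
```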
Question: What is a binary tree?
Answer: A binary tree is a hierarchical data structure in which each node has at most two children, referred to as the left child and the right child. Binary trees are commonly used for searching and sorting operations.
Question: What is the difference between a binary tree and a binary search tree (BST)?
Answer:
- Binary Tree: A hierarchical data structure where each node has at most two children.
- Binary Search Tree (BST): A binary tree with an ordering rule: for every node, all values in its left subtree are less than the node’s value, and all values in its right subtree are greater than the node’s value. This ordering is what makes efficient searching possible.
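A minimal BST sketch in Python shows how the ordering rule drives both insertion and search. The class and function names are illustrative, and this version simply ignores duplicate keys, which is one of several possible policies.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the (possibly new) root. Duplicates are ignored."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def contains(root, key):
    """Follow one branch per step: left for smaller keys, right for larger."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


def in_order(root):
    """In-order traversal of a BST yields the keys in sorted order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)


root = None
for key in (50, 30, 70, 20, 40, 60, 80):
    root = insert(root, key)

print(in_order(root))
print(contains(root, 60), contains(root, 65))
```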
Questions on Power BI
Question: What is Power BI?
Answer: Power BI is a business analytics tool by Microsoft used to visualize and analyze data from various sources. It allows users to create interactive reports, dashboards, and data visualizations to gain insights and make data-driven decisions.
Question: What are the components of Power BI?
Answer: The main components of Power BI include:
- Power BI Desktop: A desktop application used for creating reports and data models.
- Power BI Service: A cloud-based service for publishing, sharing, and collaborating on Power BI reports and dashboards.
- Power BI Mobile: Mobile apps for accessing and viewing Power BI content on smartphones and tablets.
Question: Explain the difference between Power BI Desktop and Power BI Service.
Answer:
Power BI Desktop: It is a desktop application used for creating reports and data models. Users can connect to various data sources, transform data, create visualizations, and publish reports to the Power BI Service.
Power BI Service: It is a cloud-based service where users can publish, share, and collaborate on Power BI reports and dashboards. It allows for centralized management of reports, data refresh scheduling, and sharing with stakeholders.
Question: How do you connect to data sources in Power BI?
Answer: Power BI supports connecting to various data sources, including databases (SQL Server, MySQL, etc.), Excel files, CSV files, SharePoint lists, online services (Google Analytics, Salesforce, etc.), and more. Users can connect to data sources using built-in connectors or by writing custom queries.
Question: What is a Power BI Gateway?
Answer: Power BI Gateway is a bridge that connects on-premises data sources to the Power BI Service in the cloud. It allows Power BI reports and dashboards to access data stored in on-premises databases securely. There are two types of gateways: On-premises data gateway and On-premises data gateway (personal mode).
Question: How do you create visualizations in Power BI?
Answer: In Power BI Desktop, users can create visualizations by dragging and dropping fields from the data model onto the canvas. They can choose from various visualization types such as bar charts, line charts, pie charts, maps, tables, and more. Users can customize visualizations by adjusting properties and formatting options.
Question: What is DAX (Data Analysis Expressions) in Power BI?
Answer: DAX is a formula language used in Power BI for creating calculated columns, measures, and calculated tables. It allows users to perform calculations, manipulate data, and create custom metrics based on data in the data model. DAX functions include aggregation functions, logical functions, statistical functions, and more.
Question: How do you share Power BI reports and dashboards with others?
Answer: Power BI reports and dashboards can be shared with others through the Power BI Service. Users can publish reports to the Power BI Service, create workspaces to collaborate with colleagues, and share reports with specific users or groups. Additionally, reports can be embedded into websites or shared as links.
Question: What is Power Query in Power BI?
Answer: Power Query is a data connectivity and data preparation tool in Power BI Desktop. It allows users to connect to and import data from various sources, perform data transformations (e.g., filtering, merging, pivoting), and create a clean and structured data model for analysis.
Question: How do you schedule data refresh in Power BI Service?
Answer: In Power BI Service, users can schedule data refresh for datasets that are published from Power BI Desktop. They can configure refresh schedules based on specific intervals (e.g., daily, weekly) and set up credentials for data source authentication. Data refresh ensures that reports and dashboards reflect the most up-to-date data.
Questions on Excel
Question: What is Excel and how is it used in business?
Answer: Excel is a spreadsheet software by Microsoft used for data analysis, calculation, visualization, and reporting. In business, Excel is widely used for financial modeling, budgeting, forecasting, data analysis, and creating reports.
Question: Explain the VLOOKUP function in Excel.
Answer: VLOOKUP is a function in Excel used to search for a value in the first column of a table array and return a value in the same row from a specified column. Its syntax is VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup]).
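The lookup logic can be mirrored in a few lines of Python. This is only an analogue of the exact-match case (range_lookup set to FALSE), not Excel's implementation. The vlookup function and the product table are hypothetical.

```python
def vlookup(lookup_value, table_array, col_index_num):
    """Exact-match analogue of VLOOKUP(lookup_value, table_array, col_index_num, FALSE)."""
    for row in table_array:
        if row[0] == lookup_value:  # search the first column only
            return row[col_index_num - 1]  # col_index_num is 1-based, as in Excel
    return None  # Excel would show #N/A here


# Hypothetical product table: code, name, price.
products = [
    ("P001", "Keyboard", 799.0),
    ("P002", "Mouse", 349.0),
    ("P003", "Monitor", 8999.0),
]

price = vlookup("P002", products, 3)
missing = vlookup("P999", products, 2)
print(price, missing)
```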
Question: How do you create a pivot table in Excel?
Answer: To create a pivot table in Excel, select the data range, go to the Insert tab, click on PivotTable, choose the data range, select where to place the pivot table (e.g., new worksheet), and then drag and drop fields into the Rows, Columns, and Values areas to summarize the data.
Question: What is conditional formatting in Excel?
Answer: Conditional formatting is a feature in Excel that allows users to apply formatting (e.g., colors, icons, data bars) to cells based on specific conditions or criteria. It helps highlight trends, patterns, and outliers in data visually.
Question: How do you protect cells in Excel?
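Answer: All cells in a worksheet are locked by default, but the lock only takes effect once the sheet is protected. To protect specific cells, first unlock the cells that should stay editable (Format Cells, Protection tab, clear the Locked checkbox). Then go to the Review tab, click on Protect Sheet, specify a password (optional), and choose what users are still allowed to do on the protected sheet.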
Question: What is the difference between CONCATENATE and TEXTJOIN functions in Excel?
Answer:
- CONCATENATE: Combines multiple strings or values into one text string. It requires specifying each value separately.
- TEXTJOIN: Combines multiple strings or values into one text string with a specified delimiter. It can handle ranges of cells and ignore empty cells.
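A rough Python analogue makes the contrast visible. The two functions below are illustrative stand-ins for the Excel functions, with None representing an empty cell.

```python
def concatenate(*values):
    """CONCATENATE analogue: every value is passed separately; no delimiter."""
    return "".join(str(v) for v in values)


def textjoin(delimiter, ignore_empty, values):
    """TEXTJOIN analogue: takes a whole range and a delimiter."""
    parts = ["" if v is None else str(v) for v in values]
    if ignore_empty:
        parts = [p for p in parts if p != ""]
    return delimiter.join(parts)


cells = ["Asha", None, "Pune"]  # the middle cell is empty

joined_manually = concatenate("Asha", " ", "Pune")
joined_skipping = textjoin(", ", True, cells)
joined_keeping = textjoin(", ", False, cells)
print(joined_manually)
print(joined_skipping)
print(joined_keeping)
```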
Question: Explain the difference between COUNT, COUNTA, and COUNTBLANK functions in Excel.
Answer:
- COUNT: Counts the number of cells in a range that contain numbers.
- COUNTA: Counts the number of cells in a range that are not empty (i.e., contain any value).
- COUNTBLANK: Counts the number of empty cells in a range.
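The three counting rules can be sketched in Python as follows. This is an approximation for illustration: empty cells are modelled as None, and the function names are hypothetical.

```python
def count(cells):
    """COUNT analogue: cells that hold numbers."""
    return sum(
        1 for c in cells
        if isinstance(c, (int, float)) and not isinstance(c, bool)
    )


def counta(cells):
    """COUNTA analogue: cells that are not empty, whatever they hold."""
    return sum(1 for c in cells if c is not None)


def countblank(cells):
    """COUNTBLANK analogue: cells that are empty."""
    return sum(1 for c in cells if c is None)


cells = [10, "north", None, 3.5, None, "south"]
counts = (count(cells), counta(cells), countblank(cells))
print(counts)
```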
Question: How do you create a chart in Excel?
Answer: To create a chart in Excel, select the data range, go to the Insert tab, choose the desired chart type (e.g., column chart, line chart), and customize the chart elements (e.g., axis labels, legend) as needed.
Questions on PL/SQL
Question: What is PL/SQL?
Answer: PL/SQL (Procedural Language/Structured Query Language) is Oracle Corporation’s procedural language extension for SQL. It lets users combine SQL statements with procedural constructs such as loops, conditional statements, and exception-handling blocks.
Question: What are the advantages of using PL/SQL?
Answer:
Tight integration with SQL.
Improved performance due to reduced network traffic.
Better security and encapsulation of code.
Enhanced productivity with reusable code blocks.
Support for procedural constructs like loops and conditional statements.
Question: Differentiate between SQL and PL/SQL.
Answer:
SQL (Structured Query Language) is a query language used to interact with databases.
PL/SQL (Procedural Language/Structured Query Language) is a procedural extension of SQL used to write program units such as functions, procedures, and triggers that embed SQL statements.
Question: What is a cursor in PL/SQL?
Answer: A cursor is a database object used to retrieve data row by row from a result set in a SQL query. Cursors are typically used in PL/SQL to process individual rows returned by a query, allowing for row-level operations and data manipulation.
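PL/SQL itself needs an Oracle database, so the sketch below uses a Python DB-API cursor on SQLite as a loose analogy for the same open, fetch, and close pattern. The employees table and the 10% raise rule are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary REAL);
    INSERT INTO employees VALUES ('Asha', 50000), ('Ravi', 42000), ('Meera', 61000);
""")

# "Open" the cursor by executing the query.
cursor = conn.execute("SELECT name, salary FROM employees ORDER BY name")

proposed_raises = []
row = cursor.fetchone()  # fetch the first row
while row is not None:  # loop until the result set is exhausted
    name, salary = row
    if salary < 55000:  # row-level logic applied to one row at a time
        proposed_raises.append((name, round(salary * 1.10, 2)))
    row = cursor.fetchone()  # fetch the next row

cursor.close()
print(proposed_raises)
```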
Question: Explain the difference between a function and a procedure in PL/SQL.
Answer:
- Function: A function is a PL/SQL block that returns a single value. It must return a value using the RETURN statement.
- Procedure: A procedure is a PL/SQL block that performs a specific task and does not return a value through a RETURN clause. It may have input (IN) and output (OUT) parameters, and OUT parameters can pass results back to the caller.
Question: What is an exception in PL/SQL?
Answer: An exception in PL/SQL is an error condition that disrupts the normal flow of a program. Exceptions can be raised explicitly using the RAISE statement or automatically by the database when errors occur (e.g., division by zero, constraint violations).
Question: How do you handle exceptions in PL/SQL?
Answer: Exceptions in PL/SQL are handled using exception handling blocks, which consist of the EXCEPTION keyword followed by one or more exception handlers. Handlers specify actions to be taken when specific exceptions occur, such as logging the error, rolling back transactions, or displaying error messages.
Question: What is a trigger in PL/SQL?
Answer: A trigger in PL/SQL is a stored program unit that automatically executes in response to certain events (e.g., INSERT, UPDATE, DELETE) occurring in a database table. Triggers are used to enforce data integrity, implement business rules, and automate tasks.
Question: Explain the difference between a BEFORE trigger and an AFTER trigger in PL/SQL.
Answer:
- BEFORE trigger: Executes before the triggering event (e.g., BEFORE INSERT, BEFORE UPDATE). It can be used to modify data values before they are written to the database.
- AFTER trigger: Executes after the triggering event (e.g., AFTER INSERT, AFTER UPDATE). It can be used to perform actions after data changes have been applied to the database.
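The timing difference can be tried out on SQLite through Python's sqlite3 module. This is SQLite trigger syntax, not PL/SQL, and the tables are hypothetical. One difference to keep in mind: SQLite's BEFORE triggers cannot change the incoming values, so this sketch uses the BEFORE trigger to validate and reject a change instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE audit_log (account_id INTEGER, old_balance REAL, new_balance REAL);

    -- BEFORE trigger: runs ahead of the update and can veto it.
    CREATE TRIGGER check_balance BEFORE UPDATE ON accounts
    WHEN NEW.balance < 0
    BEGIN
        SELECT RAISE(ABORT, 'balance cannot be negative');
    END;

    -- AFTER trigger: runs once the change has been applied and records it.
    CREATE TRIGGER log_balance AFTER UPDATE ON accounts
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;

    INSERT INTO accounts VALUES (1, 100.0);
""")

# A valid update passes the BEFORE check and is logged by the AFTER trigger.
conn.execute("UPDATE accounts SET balance = 80.0 WHERE id = 1")

# An invalid update is stopped by the BEFORE trigger, so nothing is logged.
try:
    conn.execute("UPDATE accounts SET balance = -5.0 WHERE id = 1")
    rejected = False
except sqlite3.DatabaseError:
    rejected = True

balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
audit = conn.execute("SELECT * FROM audit_log").fetchall()
print(rejected, balance, audit)
```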
Other Questions
Question: What is normalization?
Answer: Normalization is a database design process aimed at reducing redundancy and improving data integrity. It involves organizing tables to ensure each contains atomic values, adhering to rules called normal forms. By eliminating duplication and minimizing data anomalies, such as insertion, deletion, and update anomalies, normalization helps create a more efficient, flexible, and maintainable database structure.
Question: What are the use cases of machine learning?
Answer: Machine learning has a wide range of use cases across various industries and domains. Some common applications include:
- Predictive Analytics: Forecasting future outcomes based on historical data.
- Recommendation Systems: Offering personalized suggestions to users.
- Natural Language Processing: Analyzing and processing human language for tasks like sentiment analysis and chatbots.
- Computer Vision: Identifying objects and patterns in images and videos.
- Fraud Detection: Recognizing fraudulent activities by analyzing data patterns.
- Healthcare: Assisting in medical diagnosis, patient monitoring, and treatment recommendations.
- Autonomous Vehicles: Enabling self-driving cars and drones to navigate autonomously.
Question: Why are indexes used in databases?
Answer: Indexes are used in databases to improve the performance of data retrieval operations, particularly for queries involving large datasets. Here’s why indexes are essential:
- Fast Retrieval: They enable quick lookup of specific rows, enhancing query performance.
- Efficient Sorting: Indexes facilitate rapid sorting of data based on indexed columns, optimizing ORDER BY operations.
- Streamlined Joins: They expedite join operations by accelerating row matching between related tables.
- Data Integrity: Unique indexes enforce uniqueness constraints on columns, helping ensure data integrity.
- Query Optimization: Database query optimizers utilize indexes to generate efficient execution plans, enhancing overall system performance.
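One way to see an index at work is to compare the query plan before and after creating it. The sketch below does this on SQLite with `EXPLAIN QUERY PLAN`, using a hypothetical orders table and index name. The exact wording of the plan varies between SQLite versions, so the code only checks whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT amount FROM orders WHERE customer_id = 42"


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(query)  # no index yet: expect a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)  # the planner can now search via the index

uses_index_before = any("idx_orders_customer" in step for step in plan_before)
uses_index_after = any("idx_orders_customer" in step for step in plan_after)
print(plan_before)
print(plan_after)
```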
Question: What is unsupervised learning?
Answer: Unsupervised learning involves training models on unlabeled data to uncover hidden patterns or structures autonomously, without specific target variables. It aims to identify intrinsic data relationships and is commonly used for clustering and dimensionality reduction tasks. Unlike supervised learning, there’s no predefined output, and the algorithm must infer insights from the data independently.
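Clustering is the easiest of these tasks to show in code. Below is a minimal sketch of k-means on one-dimensional data, written in plain Python. The data points and the starting centroids are made up, and a real project would normally use a library implementation instead.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D data with caller-supplied starting centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# No labels are provided; the algorithm finds the two groups on its own.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```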
Conclusion
Excelling in data analysis is pivotal for driving business success in today’s competitive environment. By leveraging the insights gleaned from thorough data examination, companies like Tech Mahindra can make informed decisions, optimize processes, and gain a competitive edge in their respective industries. As you prepare for your interview, mastering these fundamental concepts and articulating them confidently will position you as a valuable asset to any data-driven organization. With diligent preparation and a solid understanding of data analysis principles, you’re well-equipped to ace your interview and embark on a rewarding career journey in the dynamic field of data analysis.