Novartis Top Data Analytics Interview Questions and Answers

March 15, 2024

Are you preparing for a data science or analytics interview at Novartis? Congratulations on landing an interview opportunity with one of the leading pharmaceutical companies in the world! To help you ace your interview, we have compiled a list of common interview questions and provided detailed answers tailored for Novartis.

Technical Interview Questions

Question: What techniques are used to evaluate logistic regression?

Answer: A logistic regression classifier is usually evaluated with metrics derived from its predictions on held-out data:

Confusion Matrix: Provides TP, TN, FP, FN values.
Accuracy: Measures overall correctness: (TP + TN) / (TP + TN + FP + FN).
Precision: Proportion of correct positive predictions: TP / (TP + FP).
Recall (Sensitivity): Proportion of actual positives correctly predicted: TP / (TP + FN).
F1 Score: Harmonic mean of precision and recall: 2 * (Precision * Recall) / (Precision + Recall).
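
The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function name classification_metrics and the confusion-matrix counts are made up for the example.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

With these counts accuracy is 0.85 while recall is 0.80, which shows why a single metric rarely tells the whole story.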

Question: What is your experience in combining several different data types into a machine learning model?

Answer: I have extensive experience in combining various data types into machine learning models:

Feature Engineering: Transforming diverse data types like numerical, categorical, and text into usable features.
Encoding: Utilizing techniques such as one-hot encoding for categorical variables.
Normalization: Scaling numerical features to ensure balanced contributions.
Text Processing: Converting text data into numerical representations using methods like TF-IDF or word embeddings.
Handling Missing Values: Imputation or treating missing values based on data type and distribution.
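
As a rough sketch of how these steps fit together, the snippet below imputes, scales, and one-hot encodes a tiny mixed-type dataset using only the Python standard library. The records and field names (age, region) are hypothetical; in practice a library pipeline would usually do this work.

```python
from statistics import mean

# Hypothetical records mixing numerical and categorical fields.
# None marks a missing value.
rows = [
    {"age": 34, "region": "EU"},
    {"age": None, "region": "US"},
    {"age": 50, "region": "EU"},
]

# Handling missing values: impute with the mean of the observed values.
observed = [r["age"] for r in rows if r["age"] is not None]
age_mean = mean(observed)
ages = [r["age"] if r["age"] is not None else age_mean for r in rows]

# Normalization: min-max scale the numerical feature to [0, 1].
lo, hi = min(ages), max(ages)
scaled = [(a - lo) / (hi - lo) for a in ages]

# Encoding: one indicator column per category, in sorted order.
categories = sorted({r["region"] for r in rows})
one_hot = [[1 if r["region"] == c else 0 for c in categories] for r in rows]

# Final feature matrix: scaled age followed by the category indicators.
features = [[s] + oh for s, oh in zip(scaled, one_hot)]
print(features)
```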

Question: How do you regularize logistic regression?

Answer: Regularizing Logistic Regression helps prevent overfitting and improve model generalization:

L1 Regularization (Lasso): Adds the sum of the absolute values of the coefficients to the cost function, encouraging sparsity.
L2 Regularization (Ridge): Adds the sum of the squared coefficients to the cost function, penalizing large coefficients.
Elastic Net Regularization: Combines L1 and L2 penalties, providing a balance between sparsity and robustness.
Hyperparameter Tuning: Adjusting the regularization strength (often written as lambda or alpha; scikit-learn's LogisticRegression exposes its inverse, C) to find optimal model performance, typically via cross-validation.
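
To make the effect of the penalty concrete, here is a minimal sketch of L2-regularized logistic regression fitted by plain gradient descent. It assumes a single feature and a hand-written toy dataset. The names fit_logistic and lam are illustrative and not taken from any library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lam=0.0, lr=0.1, steps=500):
    """Gradient descent on mean log-loss + (lam / 2) * w**2 (an L2 penalty).

    Single-feature model for brevity; the bias term b is not penalized.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Data gradient plus the gradient of the penalty term.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n + lam * w
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: one feature, binary labels.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

w_plain, _ = fit_logistic(xs, ys, lam=0.0)
w_ridge, _ = fit_logistic(xs, ys, lam=1.0)
print(w_plain, w_ridge)
```

The regularized coefficient comes out smaller than the unregularized one, which is the shrinkage the L2 penalty is meant to produce.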

Question: What are hyperparameters for random forest?

Answer: Hyperparameters for Random Forest include:

Number of Trees (n_estimators): Number of decision trees in the forest.
Maximum Depth (max_depth): Maximum depth of each tree to control overfitting.
Minimum Samples Split (min_samples_split): Minimum number of samples required to split an internal node.
Minimum Samples Leaf (min_samples_leaf): Minimum number of samples required at a leaf node.
Maximum Features (max_features): Number of features to consider when looking for the best split.
Bootstrap Sampling (bootstrap): Whether to use bootstrap samples when building trees.
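
The sketch below illustrates what two of these settings control, using only the standard library. It is not how any particular library implements a random forest: draw_tree_inputs is a hypothetical helper, and a real implementation draws the bootstrap sample once per tree and a fresh feature subset at every split.

```python
import random


def draw_tree_inputs(n_samples, n_features, max_features, bootstrap=True, seed=0):
    """Pick training rows for one tree and candidate features for one split."""
    rng = random.Random(seed)
    if bootstrap:
        # Bootstrap: sample row indices with replacement, same size as the data.
        rows = rng.choices(range(n_samples), k=n_samples)
    else:
        # Without bootstrap every tree sees all rows.
        rows = list(range(n_samples))
    # max_features: random subset of features, drawn without replacement.
    feats = rng.sample(range(n_features), k=max_features)
    return rows, feats


rows, feats = draw_tree_inputs(n_samples=10, n_features=8, max_features=3)
print(rows, feats)
```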

Question: What is the key difference between pharmaceutical logistics and FMCG logistics?

Answer:

Product Characteristics:

Pharmaceuticals: Pharmaceuticals are often high-value, sensitive to temperature and environmental conditions, and have strict regulatory requirements for storage and transportation.
FMCG: FMCG products are typically lower in value per unit, often non-perishable with longer shelf lives, and may not have stringent temperature or storage requirements.

Supply Chain Complexity:

Pharmaceuticals: The pharmaceutical supply chain is often complex due to the need for specialized handling, multiple stakeholders (manufacturers, distributors, pharmacies, hospitals), and global distribution networks.
FMCG: FMCG logistics focus on large-scale production, distribution to retailers, and managing high volumes of products with a simpler supply chain compared to pharmaceuticals.

Risk Management:

Pharmaceuticals: Due to the high value and sensitivity of pharmaceutical products, logistics operations need robust risk management strategies for product integrity, security, and compliance with quality standards.
FMCG: While risk management is also important in FMCG logistics, the focus is more on efficient inventory turnover, minimizing stockouts, and optimizing distribution networks for timely deliveries to retailers.

Question: What are the types of Machine Learning Algorithms?

Answer:

Supervised Learning:

Classification: Predicts categories like spam vs. non-spam emails or species of flowers from features.

Regression: Predicts continuous values like house prices or stock prices based on input features.

Unsupervised Learning:

Clustering: Groups data points based on similarities without predefined labels, like customer segmentation.

Dimensionality Reduction: Reduces the number of input features while retaining essential information, often used in visualizations.

Semi-Supervised Learning:

Uses a mix of labeled and unlabeled data for training, suitable when obtaining labeled data is costly or time-consuming.

Reinforcement Learning:

Learns through trial and error by interacting with an environment to achieve a specific goal, often seen in game-playing agents or robotics.

Deep Learning:

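Utilizes neural networks with multiple layers to learn complex patterns, beneficial for tasks like image recognition or natural language understanding. It is a family of models rather than a separate learning paradigm, and can be applied in supervised, unsupervised, or reinforcement settings.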

Question: What is the difference between Epi-based forecasting and non-Epi-based forecasting?

Answer:

Data Source:

Epi-based Forecasting: Relies on epidemiological data like disease incidence, prevalence, and transmission rates.
Non-Epi-based Forecasting: Uses non-epidemiological data such as sales history or economic indicators.

Methodology:

Epi-based Forecasting: Utilizes disease dynamics models like SIR or SEIR models.
Non-Epi-based Forecasting: Employs statistical methods or machine learning algorithms for predictions.

Purpose:

Epi-based Forecasting: Predicts disease outbreaks and patient numbers, guiding public health responses and estimates of the population that may need treatment.
Non-Epi-based Forecasting: Used for sales, financial, or demand forecasting.

Focus:

Epi-based Forecasting: Centers on infectious disease modeling and prevention strategies.
Non-Epi-based Forecasting: Focuses on business-related predictions such as sales trends or market demand.
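
For the SIR model mentioned under Methodology, a minimal discrete-time sketch is shown below. The function name simulate_sir and the parameter values (beta, gamma, population sizes) are illustrative assumptions, not calibrated to any real disease.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days):
    """Discrete-time SIR model with a one-day step.

    beta is the transmission rate and gamma the recovery rate, both per day.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / n   # susceptible -> infected
        new_recoveries = gamma * i          # infected -> recovered
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative values only.
history = simulate_sir(s0=990, i0=10, r0=0, beta=0.3, gamma=0.1, days=60)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print("peak infections on day", peak_day)
```

A non-Epi-based forecast would instead fit a statistical or machine learning model directly to a historical series such as sales.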

Question: What are the different types of databases?

Answer:

Relational Databases:

Organize data into tables with rows and columns, using SQL for querying.
Examples: MySQL, PostgreSQL, Oracle Database.

NoSQL Databases:

Designed for flexible, unstructured, or semi-structured data.
Types include document-oriented (MongoDB), key-value (Redis), and wide-column (Cassandra) databases.

Graph Databases:

Ideal for storing and querying relationships between entities using graph structures.
Examples: Neo4j, Amazon Neptune, JanusGraph.

Object-Oriented Databases:

Organize data as objects, allowing for complex data types and relationships.
Examples: db4o, ObjectDB.

Time-Series Databases:

Optimized for storing and querying time-stamped data like sensor readings or financial data.
Examples: InfluxDB, TimescaleDB, Prometheus.

SQL Interview Questions

Question: What is SQL and its main functions?

Answer: SQL (Structured Query Language) is a programming language designed for managing and querying relational databases. Its main functions include:

Retrieving data with SELECT statements.

Modifying data with INSERT, UPDATE, and DELETE statements.

Creating and modifying database schemas with CREATE, ALTER, and DROP statements.

Question: Explain the difference between SQL JOINs (INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN).

Answer:

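INNER JOIN: Returns only the rows that have a match in both tables.
LEFT JOIN: Returns all rows from the left table and the matching rows from the right table; where there is no match, the right-side columns are NULL.
RIGHT JOIN: Returns all rows from the right table and the matching rows from the left table; where there is no match, the left-side columns are NULL.
FULL JOIN: Returns all rows from both tables, combining them where they match and filling the missing side with NULL where they do not.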
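
The difference is easiest to see on a small example. The sketch below uses Python's built-in sqlite3 module with an in-memory database, purely so that it is self-contained. The tables patients and visits and their contents are hypothetical. Only INNER JOIN and LEFT JOIN are shown because older SQLite versions do not support RIGHT JOIN or FULL JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, patient_id INTEGER, site TEXT);
    INSERT INTO patients VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO visits VALUES (10, 1, 'Basel'), (11, 1, 'Boston'), (12, 2, 'Basel');
""")

# INNER JOIN: only patients that have at least one visit.
inner_rows = conn.execute("""
    SELECT p.name, v.site
    FROM patients p
    INNER JOIN visits v ON v.patient_id = p.id
    ORDER BY p.name, v.site
""").fetchall()

# LEFT JOIN: every patient; site is NULL (None in Python) when there is no visit.
left_rows = conn.execute("""
    SELECT p.name, v.site
    FROM patients p
    LEFT JOIN visits v ON v.patient_id = p.id
    ORDER BY p.name, v.site
""").fetchall()

print(inner_rows)
print(left_rows)
```

The patient without visits appears only in the LEFT JOIN result, paired with a NULL site.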

Question: What are the primary key and foreign key in SQL?

Answer:

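Primary Key: A column (or set of columns) that uniquely identifies each record in a table; its values must be distinct and cannot be NULL.
Foreign Key: A field in one table that refers to the primary key (or another unique key) in another table, establishing a link between the two tables and preventing references to rows that do not exist.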
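
A short sketch of both constraints in action, again using Python's built-in sqlite3 module for a self-contained demo. The tables products and orders are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON; most other databases enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific opt-in
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES products(id)
    )
""")
conn.execute("INSERT INTO products VALUES (1, 'tablet')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # valid reference

# Foreign key violation: product 999 does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (101, 999)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Primary key violation: id 1 is already taken.
try:
    conn.execute("INSERT INTO products VALUES (1, 'duplicate')")
    pk_rejected = False
except sqlite3.IntegrityError:
    pk_rejected = True

print(fk_rejected, pk_rejected)
```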

Question: Explain the difference between WHERE and HAVING clauses in SQL.

Answer:

WHERE: Used to filter rows before they are grouped or aggregated.
HAVING: Used to filter groups after they have been formed by GROUP BY.
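
Both clauses can appear in the same query. The sketch below runs one through Python's built-in sqlite3 module; the sales table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('EU', 'A', 100), ('EU', 'B', 50), ('US', 'A', 30),
        ('US', 'B', 20), ('APAC', 'A', 500);
""")

rows = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE product = 'A'          -- row filter, applied before grouping
    GROUP BY region
    HAVING SUM(amount) >= 100    -- group filter, applied after aggregation
    ORDER BY region
""").fetchall()

print(rows)
```

WHERE discards the product B rows first, and HAVING then drops the region whose remaining total is below 100.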

Question: What is a subquery in SQL?

Answer: A subquery is a query nested within another query.

It can be used to retrieve data needed for the main query, filter results, or perform calculations.
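
As a small illustration, the query below uses a subquery in the WHERE clause to find employees paid above the average. It is run through Python's built-in sqlite3 module, and the employees table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES ('Ana', 50), ('Ben', 70), ('Cy', 90);
""")

# The inner query computes the average; the outer query filters against it.
above_average = conn.execute("""
    SELECT name
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

print(above_average)
```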

Question: What is the difference between UNION and UNION ALL in SQL?

Answer:

UNION: Combines the result sets of two or more SELECT statements into a single result set, removing duplicates.
UNION ALL: Combines the result sets of two or more SELECT statements into a single result set, including all rows (with duplicates).
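
A minimal sketch of the difference, using Python's built-in sqlite3 module. The tables a and b are hypothetical and share one value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# UNION removes the duplicate value 2.
union_rows = conn.execute(
    "SELECT x FROM a UNION SELECT x FROM b ORDER BY x"
).fetchall()

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b ORDER BY x"
).fetchall()

print(union_rows)
print(union_all_rows)
```

Because UNION ALL skips the de-duplication step, it is usually the cheaper choice when duplicates are impossible or acceptable.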

Question: Explain the use of the GROUP BY clause in SQL.

Answer: The GROUP BY clause is used to group rows that have the same values into summary rows.

It is often used with aggregate functions like SUM, COUNT, and AVG to perform calculations on grouped data.

Question: What is the difference between a stored procedure and a function in SQL?

Answer:

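Stored Procedure: A named, often precompiled set of SQL statements that is stored in the database and can be executed with a single call; it can modify data and does not have to return a value.
Function: Returns a value (a scalar or, in some databases, a table) and can be used inside SQL statements such as SELECT.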

Question: How do you handle NULL values in SQL queries?

Answer: Use the IS NULL or IS NOT NULL operators to check for NULL values.

Use the COALESCE function to replace NULL values with a specified default value.
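
The sketch below shows both techniques, and also why a plain equality test does not work for NULL. It uses Python's built-in sqlite3 module, and the contacts table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (name TEXT, phone TEXT);
    INSERT INTO contacts VALUES ('Ana', '555-0100'), ('Ben', NULL);
""")

# IS NULL finds the rows with a missing phone number.
missing = conn.execute(
    "SELECT name FROM contacts WHERE phone IS NULL"
).fetchall()

# "= NULL" never matches, because any comparison with NULL is not true.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM contacts WHERE phone = NULL"
).fetchone()[0]

# COALESCE substitutes a default for NULL values.
filled = conn.execute(
    "SELECT name, COALESCE(phone, 'unknown') FROM contacts ORDER BY name"
).fetchall()

print(missing, equals_null, filled)
```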

Question: Explain the concept of ACID properties in the context of database transactions.

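Answer: ACID stands for Atomicity, Consistency, Isolation, and Durability.

Together these properties ensure that database transactions are processed reliably: a transaction either completes fully or not at all (atomicity), leaves the data in a valid state (consistency), does not interfere with concurrent transactions (isolation), and persists once committed (durability).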
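
Atomicity is the easiest of these properties to demonstrate. In the sketch below, a transfer between two hypothetical accounts fails halfway because of a CHECK constraint, and the whole transaction is rolled back. It uses Python's built-in sqlite3 module, where a connection used as a context manager commits on success and rolls back if an exception escapes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        name TEXT PRIMARY KEY,
        balance INTEGER CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO accounts VALUES ('a', 50), ('b', 0)")
conn.commit()

try:
    with conn:  # one transaction: commit on success, rollback on error
        conn.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'b'")
        # This debit would make the balance negative and violates the CHECK.
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'a'")
except sqlite3.IntegrityError:
    pass  # the credit to 'b' has been rolled back as well

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

Neither update survives, so the accounts are left exactly as they were before the failed transfer.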

Technical Interview Topics

Machine learning
Excel-based questions
SQL-based questions
Advanced SAS questions

General Interview Questions

Question: When did a customer change your approach?

Question: Do you have relevant experience?

Question: Questions about your resume and projects.

Question: What are your strengths and weaknesses?

Conclusion

Preparation is key to succeeding in a data science or analytics interview at Novartis. By familiarizing yourself with these interview questions and crafting thoughtful answers, you can showcase your skills, knowledge, and problem-solving abilities to the hiring team.

Remember to also demonstrate your understanding of the pharmaceutical industry, its challenges, and the importance of data-driven decision-making. Good luck with your interview at Novartis, and we hope this guide helps you in your preparation journey!
