Tough SQL Problems: Comprehensive Guide with Explanations for Interviews
SQL (Structured Query Language) is a critical skill for data professionals, including data analysts, data scientists, and database administrators. In interviews, SQL questions can range from basic queries to complex problems that test your understanding of database concepts and your ability to write efficient queries. This comprehensive guide covers some tough SQL problems, their solutions, and detailed explanations to help you prepare for your next interview.
1. Finding the Nth Highest Salary
One of the classic SQL problems is finding the Nth highest salary from a table of employees.
Problem
Given a table Employees with columns id and salary, write a query to find the Nth highest salary.
Solution
SELECT DISTINCT salary
FROM Employees
ORDER BY salary DESC
LIMIT 1 OFFSET N-1;
Explanation
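This query sorts the salaries in descending order with ORDER BY. DISTINCT removes duplicate salaries so that tied values count once. OFFSET skips the first N-1 rows and LIMIT 1 returns the next one, which is the Nth highest salary. N is a placeholder: substitute the actual offset (for example, OFFSET 2 for the third highest), because MySQL does not accept an expression such as N-1 in LIMIT or OFFSET. If the table has fewer than N distinct salaries, the query returns no rows.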
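To try the query on a small data set, the sketch below runs it in SQLite through Python's sqlite3 module, passing the offset as a bound parameter. The helper name nth_highest_salary and the sample rows are made up for illustration, and SQLite is only a stand-in for whichever database the interview uses.

```python
import sqlite3

def nth_highest_salary(conn, n):
    """Return the n-th highest distinct salary, or None if there is none."""
    row = conn.execute(
        """
        SELECT DISTINCT salary
        FROM Employees
        ORDER BY salary DESC
        LIMIT 1 OFFSET ?
        """,
        (n - 1,),  # the offset is computed in Python, not in SQL
    ).fetchone()
    return row[0] if row else None

# Hypothetical sample data: 300 appears twice but counts once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (id INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees (salary) VALUES (?)",
    [(300,), (200,), (300,), (100,)],
)

print(nth_highest_salary(conn, 2))  # 200
print(nth_highest_salary(conn, 4))  # None (only three distinct salaries)
```

Binding the offset as a parameter sidesteps the question of whether the database accepts arithmetic inside OFFSET.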
2. Finding Duplicates in a Table
Another common problem is identifying duplicate records in a table.
Problem
Given a table Users with columns id and email, write a query to find duplicate email addresses.
Solution
SELECT email, COUNT(*)
FROM Users
GROUP BY email
HAVING COUNT(*) > 1;
Explanation
This query groups the records by the email column and counts the number of occurrences of each email. The HAVING clause filters the results to include only those groups with a count greater than one, indicating duplicate email addresses.
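As a quick check of the GROUP BY and HAVING combination, the following sketch runs the query against a few made-up rows in an in-memory SQLite database. The addresses, the occurrences alias, and the ORDER BY are illustrative additions, not part of the original solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO Users (email) VALUES (?)",
    [
        ("a@example.com",),
        ("b@example.com",),
        ("a@example.com",),
        ("c@example.com",),
        ("a@example.com",),
    ],
)

duplicates = conn.execute(
    """
    SELECT email, COUNT(*) AS occurrences
    FROM Users
    GROUP BY email
    HAVING COUNT(*) > 1
    ORDER BY email
    """
).fetchall()

print(duplicates)  # [('a@example.com', 3)]
```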
3. Finding Employees with Salaries Greater Than Their Managers
This problem involves a self-join.
Problem
Given a table Employees with columns id, name, salary, and manager_id, write a query to find employees whose salary is greater than their manager's salary.
Solution
SELECT e1.name
FROM Employees e1
JOIN Employees e2 ON e1.manager_id = e2.id
WHERE e1.salary > e2.salary;
Explanation
This query uses a self-join to compare each employee's salary with their manager's salary. The JOIN clause joins the table Employees with itself based on the manager_id and id columns. The WHERE clause filters the results to include only those employees whose salary is greater than their manager's salary.
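The self-join can be exercised with a tiny hierarchy, as in the sketch below (SQLite via Python's sqlite3; the names and salaries are invented). Note that the employee with no manager never appears in the result, because the inner join finds no matching manager row for a NULL manager_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", 100, None),  # top of the hierarchy, no manager
        (2, "Bob", 120, 1),     # earns more than Ann
        (3, "Cy", 90, 1),
        (4, "Dee", 95, 3),      # earns more than Cy
    ],
)

rows = conn.execute(
    """
    SELECT e1.name
    FROM Employees e1
    JOIN Employees e2 ON e1.manager_id = e2.id
    WHERE e1.salary > e2.salary
    ORDER BY e1.name
    """
).fetchall()
higher_paid = [name for (name,) in rows]

print(higher_paid)  # ['Bob', 'Dee']
```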
4. Finding the Second Highest Salary Without Using LIMIT
Finding the second highest salary can also be done using a subquery.
Problem
Given a table Employees with columns id and salary, write a query to find the second highest salary without using the LIMIT clause.
Solution
SELECT MAX(salary)
FROM Employees
WHERE salary < (SELECT MAX(salary) FROM Employees);
Explanation
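This query uses a subquery to find the maximum salary; the outer query then takes the maximum of the salaries strictly below it, which is the second highest distinct salary. If no salary is lower than the maximum (for example, when every employee earns the same amount), the query returns NULL.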
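The edge case is easy to see by running the query twice, once with a clear runner-up and once with only tied salaries. The sketch below does this in SQLite through Python's sqlite3; the helper second_highest and the salary lists are hypothetical.

```python
import sqlite3

QUERY = """
SELECT MAX(salary)
FROM Employees
WHERE salary < (SELECT MAX(salary) FROM Employees)
"""

def second_highest(salaries):
    """Load the given salaries into a fresh table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employees (id INTEGER PRIMARY KEY, salary INTEGER)")
    conn.executemany(
        "INSERT INTO Employees (salary) VALUES (?)",
        [(s,) for s in salaries],
    )
    # An aggregate without GROUP BY always yields one row; SQL NULL maps to None.
    return conn.execute(QUERY).fetchone()[0]

print(second_highest([100, 300, 300, 200]))  # 200
print(second_highest([300, 300]))            # None
```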
5. Ranking Employees by Salary
Ranking employees by their salary is a common problem that can be solved using window functions.
Problem
Given a table Employees with columns id, name, and salary, write a query to rank employees by their salary.
Solution
SELECT id, name, salary,
RANK() OVER (ORDER BY salary DESC) as salary_rank
FROM Employees;
Explanation
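This query uses the RANK() window function to assign a rank to each employee based on their salary in descending order. The OVER clause specifies the ordering of the rows. Employees with equal salaries receive the same rank, and RANK() leaves a gap after a tie (1, 1, 3); use DENSE_RANK() instead if the ranks should be consecutive (1, 1, 2).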
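To see how ties are numbered, the sketch below computes RANK() next to DENSE_RANK() on four invented rows. It runs in SQLite through Python's sqlite3 and assumes SQLite 3.25 or newer, the first release with window functions; the dense_salary_rank column is added here purely for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees (name, salary) VALUES (?, ?)",
    [("Ann", 300), ("Bob", 300), ("Cy", 200), ("Dee", 100)],
)

ranked = conn.execute(
    """
    SELECT name, salary,
           RANK() OVER (ORDER BY salary DESC) AS salary_rank,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_salary_rank
    FROM Employees
    ORDER BY salary DESC, name
    """
).fetchall()

for row in ranked:
    print(row)
# ('Ann', 300, 1, 1)
# ('Bob', 300, 1, 1)
# ('Cy', 200, 3, 2)
# ('Dee', 100, 4, 3)
```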
6. Finding the Department with the Highest Average Salary
This problem involves grouping data and calculating averages.
Problem
Given a table Employees with columns id, name, salary, and department_id, and a table Departments with columns id and name, write a query to find the department with the highest average salary.
Solution
SELECT d.name
FROM Departments d
JOIN Employees e ON d.id = e.department_id
GROUP BY d.id, d.name
ORDER BY AVG(e.salary) DESC
LIMIT 1;
Explanation
This query joins the Departments and Employees tables on department_id. It groups the rows by department (by id as well as name, so that two departments that happen to share a name are not merged) and calculates the average salary for each group. Finally, it orders the groups by average salary in descending order and keeps the first row, which is the department with the highest average salary. If several departments tie for the highest average, LIMIT 1 returns only one of them.
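The choice of grouping columns matters when department names are not unique. The sketch below is a contrived case (two departments both called Sales, invented salaries) run in SQLite through Python's sqlite3: grouping by name alone averages the two Sales departments together, while grouping by id and name keeps them apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Departments (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE Employees ("
    "id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, department_id INTEGER)"
)
conn.executemany(
    "INSERT INTO Departments VALUES (?, ?)",
    [(1, "Sales"), (2, "Sales"), (3, "Engineering")],
)
conn.executemany(
    "INSERT INTO Employees (name, salary, department_id) VALUES (?, ?, ?)",
    [("Ann", 100, 1), ("Bob", 300, 2), ("Cy", 250, 3)],
)

# Grouping by name merges both Sales departments: average (100 + 300) / 2 = 200.
by_name = conn.execute(
    """
    SELECT d.name
    FROM Departments d
    JOIN Employees e ON d.id = e.department_id
    GROUP BY d.name
    ORDER BY AVG(e.salary) DESC
    LIMIT 1
    """
).fetchone()

# Grouping by id keeps them separate: department 2 averages 300.
by_id = conn.execute(
    """
    SELECT d.id, d.name
    FROM Departments d
    JOIN Employees e ON d.id = e.department_id
    GROUP BY d.id, d.name
    ORDER BY AVG(e.salary) DESC
    LIMIT 1
    """
).fetchone()

print(by_name)  # ('Engineering',)
print(by_id)    # (2, 'Sales')
```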
7. Finding Consecutive Days of Attendance
This problem involves using window functions to identify patterns in data.
Problem
Given a table Attendance with columns employee_id and date, write a query to find all employees who have attended for three consecutive days or more.
Solution
WITH RankedAttendance AS (
SELECT employee_id, date,
ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY date) as row_num
FROM Attendance
)
SELECT employee_id, MIN(date) as start_date, MAX(date) as end_date, COUNT(*) as consecutive_days
FROM RankedAttendance
GROUP BY employee_id, DATEADD(DAY, -row_num, date)
HAVING COUNT(*) >= 3;
Explanation
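This query first uses a CTE (Common Table Expression) to assign a row number to each attendance record for each employee, ordered by date. It then subtracts the row number from the date: within a run of consecutive days both values increase by one per row, so the result is the same for every row in the run, and grouping by it isolates each run. The HAVING clause keeps only runs of three or more days. The query assumes at most one row per employee per day, since duplicate dates would break the numbering. DATEADD is SQL Server syntax; in MySQL the equivalent expression is DATE_SUB(date, INTERVAL row_num DAY).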
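The grouping trick can be checked on a handful of dates. The sketch below is an adaptation for SQLite (run through Python's sqlite3, version 3.25 or newer assumed for window functions): SQLite has no DATEADD, so the subtraction is written with its date() function and a '-N days' modifier. The attendance rows are invented, with dates stored as ISO-8601 text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Attendance (employee_id INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO Attendance VALUES (?, ?)",
    [
        # Employee 1: a three-day run, then a gap.
        (1, "2024-01-01"), (1, "2024-01-02"), (1, "2024-01-03"), (1, "2024-01-05"),
        # Employee 2: never more than two days in a row.
        (2, "2024-01-01"), (2, "2024-01-03"), (2, "2024-01-04"),
    ],
)

streaks = conn.execute(
    """
    WITH RankedAttendance AS (
        SELECT employee_id, date,
               ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY date) AS row_num
        FROM Attendance
    )
    SELECT employee_id, MIN(date) AS start_date, MAX(date) AS end_date,
           COUNT(*) AS consecutive_days
    FROM RankedAttendance
    -- date minus row number is constant within a run of consecutive days
    GROUP BY employee_id, date(date, '-' || row_num || ' days')
    HAVING COUNT(*) >= 3
    """
).fetchall()

print(streaks)  # [(1, '2024-01-01', '2024-01-03', 3)]
```

For employee 1 the first three rows all map to 2023-12-31, while the row for 2024-01-05 maps to 2024-01-01 and therefore falls into a separate group.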
8. Finding Top N Records for Each Group
This problem involves using window functions to rank records within groups.
Problem
Given a table Sales with columns salesperson_id, date, and amount, write a query to find the top 3 sales amounts for each salesperson.
Solution
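WITH RankedSales AS (
SELECT salesperson_id, date, amount,
ROW_NUMBER() OVER (PARTITION BY salesperson_id ORDER BY amount DESC) as sales_rank
FROM Sales
)
SELECT salesperson_id, date, amount
FROM RankedSales
WHERE sales_rank <= 3;
Explanation
This query first uses a CTE (Common Table Expression) to number each salesperson's sales records, ordered by the sales amount in descending order. It then filters the results to include only the top 3 rows for each salesperson. Because ROW_NUMBER() numbers tied amounts arbitrarily, at most three rows are returned per salesperson even when amounts repeat; use DENSE_RANK() to keep the three highest distinct amounts instead. The alias is named sales_rank rather than rank because RANK is a reserved word in MySQL 8.0 and later.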
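A small run makes the per-group numbering concrete. The sketch below uses SQLite through Python's sqlite3 (3.25 or newer assumed) with invented sales rows; the alias sales_rank is used instead of rank, which is a reserved word in MySQL 8.0 and later, and the final ORDER BY is added only to make the printed result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (salesperson_id INTEGER, date TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 500), (1, "2024-01-02", 200),
        (1, "2024-01-03", 400), (1, "2024-01-04", 300),
        (2, "2024-01-01", 50), (2, "2024-01-02", 70),  # fewer than three sales
    ],
)

top_sales = conn.execute(
    """
    WITH RankedSales AS (
        SELECT salesperson_id, date, amount,
               ROW_NUMBER() OVER (
                   PARTITION BY salesperson_id ORDER BY amount DESC
               ) AS sales_rank
        FROM Sales
    )
    SELECT salesperson_id, amount
    FROM RankedSales
    WHERE sales_rank <= 3
    ORDER BY salesperson_id, amount DESC
    """
).fetchall()

print(top_sales)  # [(1, 500), (1, 400), (1, 300), (2, 70), (2, 50)]
```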
9. Finding Employees Who Never Received a Bonus
This problem involves an anti-join: a LEFT JOIN filtered down to the rows that have no match.
Problem
Given a table Employees with columns id and name, and a table Bonuses with columns employee_id and bonus, write a query to find all employees who never received a bonus.
Solution
SELECT e.name
FROM Employees e
LEFT JOIN Bonuses b ON e.id = b.employee_id
WHERE b.employee_id IS NULL;
Explanation
This query uses a left join to include all employees and any matching records from the Bonuses table. The WHERE clause filters the results to include only those employees who do not have a matching record in the Bonuses table, indicating that they never received a bonus.
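Interviewers sometimes ask for an alternative formulation, so the sketch below runs the LEFT JOIN version next to a NOT EXISTS version and checks that they agree. It uses SQLite through Python's sqlite3 with made-up rows; the NOT EXISTS query is an addition for comparison, not part of the original solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE Bonuses (employee_id INTEGER, bonus INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Cy")],
)
# Ann received two bonuses; Bob and Cy received none.
conn.executemany("INSERT INTO Bonuses VALUES (?, ?)", [(1, 100), (1, 50)])

left_join = conn.execute(
    """
    SELECT e.name
    FROM Employees e
    LEFT JOIN Bonuses b ON e.id = b.employee_id
    WHERE b.employee_id IS NULL
    ORDER BY e.name
    """
).fetchall()

not_exists = conn.execute(
    """
    SELECT e.name
    FROM Employees e
    WHERE NOT EXISTS (SELECT 1 FROM Bonuses b WHERE b.employee_id = e.id)
    ORDER BY e.name
    """
).fetchall()

print(left_join)                # [('Bob',), ('Cy',)]
print(left_join == not_exists)  # True
```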
10. Finding Employees with the Same Salary
This problem involves identifying records with duplicate values.
Problem
Given a table Employees with columns id, name, and salary, write a query to find all employees who have the same salary as another employee.
Solution
SELECT DISTINCT e1.id, e1.name, e1.salary
FROM Employees e1
JOIN Employees e2 ON e1.salary = e2.salary AND e1.id <> e2.id;
Explanation
This query uses a self-join to compare each employee's salary with the salaries of other employees. The JOIN clause matches employees with the same salary and different IDs. An employee who shares a salary with several colleagues matches once per colleague, so DISTINCT (with id included, to keep apart two employees who share both a name and a salary) reduces the output to one row per employee.
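A self-join on salary returns an employee once per matching colleague, so three employees on the same salary produce six rows. The sketch below (SQLite via Python's sqlite3, invented data) compares the plain join with a SELECT DISTINCT variant that includes the id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "Ann", 100), (2, "Bob", 100), (3, "Cy", 100), (4, "Dee", 200)],
)

# Plain self-join: each of the three tied employees matches the other two.
raw = conn.execute(
    """
    SELECT e1.name, e1.salary
    FROM Employees e1
    JOIN Employees e2 ON e1.salary = e2.salary AND e1.id <> e2.id
    ORDER BY e1.id
    """
).fetchall()

# DISTINCT collapses the repeated matches to one row per employee.
deduped = conn.execute(
    """
    SELECT DISTINCT e1.id, e1.name, e1.salary
    FROM Employees e1
    JOIN Employees e2 ON e1.salary = e2.salary AND e1.id <> e2.id
    ORDER BY e1.id
    """
).fetchall()

print(len(raw))  # 6
print(deduped)   # [(1, 'Ann', 100), (2, 'Bob', 100), (3, 'Cy', 100)]
```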
Conclusion
SQL is a powerful language for managing and querying relational databases. Mastering these tough SQL problems and understanding their solutions will help you perform well in interviews and improve your ability to write efficient queries. Practice these problems regularly, and you'll be well-prepared for any SQL challenge you encounter.