๐Ÿ“๐Ÿ”: "A Deep Dive in How Slow SELECT * is


I have referred to this video by Sir Hussein Nasser:


Let's deep dive in, here the speaker explores the performance implications of using the "**SELECT " syntax in SQL queries.

Lets Go The Rock GIF - Lets Go The Rock Rock Go GIFs

While "SELECT " is a convenient way to retrieve all columns from a table without specifying each one individually, it has been widely criticized for its potential impact on query speed. In this article, we'll delve into the reasons behind the perceived slowness of "SELECT," shedding light on various factors that affect query performance.

Exploring the Technical Stack: To comprehend the performance implications, it's essential to grasp the underlying concepts in the technical stack. The speaker highlights the role of pages in row store databases and the fixed size of these pages. In row store databases, rows are stored in pages, which act as containers for the data. Each page typically consists of headers and content, which primarily includes rows or columns depending on the implementation.

Reading and Retrieving Data: When executing a query with "SELECT *," the database engine performs a full table scan, reading pages one by one to retrieve the desired data. The speaker explains the process of reading pages from disk, converting byte positions to pages, and the mechanics of page retrieval. Once the page containing the requested data is fetched, the database must deserialize the raw bytes into usable data structures, allocating additional memory.

Factors Affecting Performance: The video highlights several reasons why "SELECT *" queries can be slow. Firstly, the presence of index-only scans can significantly impact performance. While scanning an index may be faster, if the query requires additional columns not present in the index, the database must perform additional lookups, leading to slower execution.

Furthermore, not all columns are stored in-line within the pages. If a column contains large data, such as text, JSON, or blobs, databases often store them in separate tables, referencing the main table with a smaller pointer. Retrieving such columns requires additional CPU processing, memory allocation, and potential network transfers, depending on the architecture.

Additionally, the video highlights the impact of network latency and transmission size. Large result sets from the database can introduce network transmission delays, particularly when the data exceeds the maximum transmission unit (MTU). The serialization and deserialization process, along with the database's networking protocol, further contribute to the overall query execution time.

Optimizing Performance: To mitigate the performance issues associated with "SELECT," the speaker suggests alternative approaches. Considering the specific columns required for the query and avoiding unnecessary retrieval of large columns can significantly improve performance. Additionally, offloading processing tasks to the client-side and implementing efficient serialization techniques can help minimize the impact of "SELECT ."

Conclusion: While "SELECT *" provides a convenient way to retrieve all columns in a SQL query, its usage can have performance implications. Understanding the intricacies of the technical stack, the impact of data storage, network communication, and query optimization can help developers make informed decisions when designing database queries. By being mindful of the potential drawbacks, it is possible to strike a balance between convenience and performance in SQL operations.


"Signing Off with a Final Farewell! ๐ŸŒŸ Keep Learning, Keep Exploring! Goodbye for Now! ๐Ÿ‘‹โœจ"

You can connect with me on Twitter: https://twitter.com/aksingh_2000

LinkedIn: https://www.linkedin.com/in/ankitkumarsingh3009/

Goodbye Farewell GIF - Goodbye Farewell Bye Bye GIFs

ย