How to Accelerate SQL Server Data Fetching Using Apache Arrow in mssql-python
Introduction
Fetching a million rows from SQL Server into a Polars DataFrame used to mean creating a million Python objects, putting heavy pressure on the allocator and garbage collector, and then discarding them all once the DataFrame was built. That overhead is now gone. The mssql-python driver, thanks to a community contribution by Felix Graßl (@ffelixg), can fetch SQL Server data directly as Apache Arrow structures. This guide walks you through using Arrow support in mssql-python to make your data pipelines faster and more memory-efficient, whether you work with Polars, Pandas, DuckDB, or any other Arrow-native library.

What You Need
- Python 3.8+ installed on your system.
- mssql-python driver (version 1.2.0 or later) with Arrow support. Install via
pip install "mssql-python[arrow]"
- A target library that can consume Arrow data: Polars, Pandas (with ArrowDtype), DuckDB, or similar.
- SQL Server database with appropriate credentials and network access.
Step-by-Step Guide
Step 1: Install Required Packages
First, ensure you have the latest mssql-python package with Arrow support. Open your terminal and run:
pip install "mssql-python[arrow]" polars
This installs both the driver and Polars as an example consumer. The quotes keep shells such as zsh from interpreting the square brackets. For Pandas or DuckDB, adjust accordingly.
Step 2: Import Libraries
Create a Python script and import the necessary modules:
import mssql_python
import polars as pl
from mssql_python import connect
The mssql-python package is imported as mssql_python and provides the driver; polars will be used to consume the Arrow data it returns.
Step 3: Establish a Connection
Use your SQL Server credentials to create a connection. Replace placeholders with your actual server, database, username, and password:
# connect() takes an ODBC-style connection string
connection = connect(
    "SERVER=your_server.database.windows.net;"
    "DATABASE=your_database;"
    "UID=your_username;"
    "PWD=your_password;"
)
cursor = connection.cursor()
Step 4: Execute a Query and Fetch as Arrow
Execute a SELECT query. The key new method is fetcharrow(), which returns an Arrow Table directly, without creating a Python object for each row:
cursor.execute("SELECT TOP 1000000 * FROM large_table")
arrow_table = cursor.fetcharrow()
This call runs entirely in C++, writing values into contiguous, typed Arrow buffers. Nulls are tracked with a compact bitmap rather than a None object per cell.
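To make that buffer layout concrete, here is a minimal pure-Python sketch of an Arrow-style nullable integer column: one contiguous typed values buffer plus a validity bitmap holding one bit per row. It uses only the standard library and illustrates the idea; it is not the driver's or pyarrow's actual implementation, and build_column and get_value are hypothetical helper names.

```python
from array import array


def build_column(values):
    """Pack a list of ints/None into (values buffer, validity bitmap)."""
    # Contiguous 64-bit integers; null slots hold a placeholder 0.
    data = array("q", (0 if v is None else v for v in values))
    # One validity bit per row, rounded up to whole bytes.
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            # Arrow numbers bits from the least significant bit of each byte.
            bitmap[i // 8] |= 1 << (i % 8)
    return data, bitmap


def get_value(data, bitmap, i):
    """Return row i, or None when its validity bit is 0."""
    is_valid = (bitmap[i // 8] >> (i % 8)) & 1
    return data[i] if is_valid else None


data, bitmap = build_column([10, None, 30, None, 50])
print([get_value(data, bitmap, i) for i in range(5)])
print(len(bitmap), "byte(s) of validity bits for", len(data), "rows")
```

The point of the layout is that null tracking costs one bit per row, while the values themselves stay in a single typed buffer that native code can scan without touching Python objects.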
Step 5: Convert to Your DataFrame Library
Because Arrow defines a stable ABI (the Arrow C Data Interface), handing the table to another Arrow-native library is typically zero-copy and close to instantaneous:

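- To Polars:
df = pl.from_arrow(arrow_table)
- To Pandas (after import pandas as pd):
df = arrow_table.to_pandas(types_mapper=pd.ArrowDtype)
- To DuckDB (after import duckdb; the query refers to the local arrow_table variable by name):
duckdb.sql("SELECT * FROM arrow_table")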
No serialization, no copies, no re-parsing—just a pointer exchange.
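The "pointer exchange" idea can be illustrated with the standard library alone. In the sketch below, a memoryview gives a consumer access to a producer's buffer without copying any bytes; the Arrow C Data Interface does something analogous between libraries, with additional metadata for schema and lifetime. This is an analogy under that assumption, not a description of what pl.from_arrow does internally.

```python
from array import array

# Producer: one contiguous buffer of 64-bit integers.
producer_buffer = array("q", [1, 2, 3, 4])

# Consumer: a view over the same memory; no bytes are copied.
consumer_view = memoryview(producer_buffer)

# For contrast, an explicit copy allocates new memory.
copied = array("q", producer_buffer)

# Change the producer's data after the hand-off.
producer_buffer[0] = 99

print("shared view sees:", consumer_view[0])
print("independent copy sees:", copied[0])
```

The shared view reflects the change because both names refer to the same memory, while the copy does not. That is the difference between handing over a pointer and serializing data.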
Step 6: Perform Operations Without Python Overhead
Once data is in an Arrow-native DataFrame, operations like filters, joins, and aggregations run in native code directly on the columnar buffers. For example, in Polars:
result = df.filter(pl.col("datetime_col") > pl.datetime(2023, 1, 1)).group_by("category").agg(pl.col("value").sum())
Comparing against pl.datetime(...) rather than a string keeps the comparison typed. No intermediate Python objects are materialized at any stage, which is the foundation for high-throughput pipelines.
Tips and Best Practices
- Speed gains are most noticeable with temporal types (like DATETIME and DATETIMEOFFSET) because per-value Python-side conversions are eliminated entirely.
- Memory usage drops dramatically: a column of one million integers becomes a single contiguous C array, not a million individual Python objects. This matters most for large datasets.
- Seamless interoperability with other Arrow-native tools (Polars, Pandas with ArrowDtype, DuckDB, and even tools like Hugging Face datasets) means you can mix and match libraries without conversion costs.
- Ensure you're using a recent version of mssql-python (1.2.0+) to have the fetcharrow() method available.
- For best results, avoid fetching rows individually; always use bulk fetch methods like fetcharrow() when working with Arrow.
- Monitor your garbage collector; with Arrow, you'll see far less GC pressure, making your overall application more predictable.
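The memory claim can be checked on a small scale with the standard library. This sketch uses tracemalloc to compare the peak memory of a list of Python int objects with that of a contiguous 64-bit array holding the same values. The array stands in for an Arrow buffer as a rough analogy, peak_bytes is a hypothetical helper, and exact numbers vary by Python version and platform.

```python
import tracemalloc
from array import array

N = 100_000


def peak_bytes(build):
    """Run build() and return (peak traced bytes, result)."""
    tracemalloc.start()
    result = build()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, result


# Offsetting by 1000 avoids CPython's cache of small integers,
# so every list element is a separately allocated int object.
list_peak, as_list = peak_bytes(lambda: [i + 1000 for i in range(N)])

# The array stores raw 8-byte values with no per-element object.
array_peak, as_array = peak_bytes(
    lambda: array("q", (i + 1000 for i in range(N)))
)

print(f"list of int objects: {list_peak:,} bytes")
print(f"contiguous array:    {array_peak:,} bytes")
```

Running the same kind of measurement around a real fetch is one way to confirm whether Arrow-based fetching reduces allocation in your own pipeline.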