Introduction to Polars

Useful links: Getting Started: https://docs.pola.rs/user-guide/getting-started/

Migrating from Pandas: https://docs.pola.rs/user-guide/migration/pandas/

Why Polars? Parallel execution across CPU cores; lazy evaluation with query optimization; libraries for Python, Rust, and R; GPU support is upcoming.

import polars as pl
from sklearn.datasets import load_iris

Loading Datasets into Polars DataFrames

iris = load_iris()
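# load_iris() returns a dict-like Bunch rather than a table, so the columns
# are built explicitly instead of calling pl.DataFrame(iris).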
df = pl.DataFrame({
    'sepal_length': iris.data[:, 0],
    'sepal_width': iris.data[:, 1],
    'petal_length': iris.data[:, 2],
    'petal_width': iris.data[:, 3],
    'species': iris.target
})
print(df)
print(df.schema)
print(df.describe())
print(df.columns)
print(df.shape)

Other methods of reading files in Polars

See the I/O section of the user guide for reading CSV, Parquet, JSON, and other formats: https://docs.pola.rs/user-guide/io/

Column and Row Functions

# Selecting columns
# df.select("species")
# Selecting rows based on values
setosa_df = df.filter(pl.col("species") == 0)
print(setosa_df.describe())
# Multiple filter conditions (combined with a logical AND)
print(
    df.filter(
        pl.col("species") == 2,
        pl.col("petal_length") < 5,
    )
)
# Mean petal length computed by hand as sum / count
print(setosa_df.select(pl.col("petal_length").sum() / pl.col("petal_length").count()))
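# Column-wise mean of several columns at once
print(
    df.select(
        pl.col("petal_width", "sepal_width").mean()
    )
)
# Row-wise mean: mean(axis=1) does not work, use mean_horizontal() instead
print(
    df.select(
        pl.col("sepal_length", "sepal_width")
    ).mean_horizontal()
)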
# Map the integer species codes to readable names in a new column
iris_df1 = df.with_columns(
    species_names=pl.col("species").replace_strict([0, 1, 2], ["Setosa", "Versicolor", "Virginica"])
)
print(iris_df1)

Group By Examples

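# group_by() alone returns a GroupBy object; an aggregation is needed to get a DataFrame
print(df.group_by(pl.col("species")))
# Mean of every column per species name
print(iris_df1.group_by(pl.col("species_names")).mean())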

Visualization

Polars works well with popular visualization libraries: https://docs.pola.rs/user-guide/misc/visualization/

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
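# Pass each column as a Series (df["col"]) rather than a one-column DataFrame
ax.scatter(
    x=df["sepal_width"],
    y=df["sepal_length"],
    c=df["species"],
)
ax.set_xlabel("sepal_width")
ax.set_ylabel("sepal_length")
plt.show()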