Introduction to Polars#
Useful links: Getting Started: https://docs.pola.rs/user-guide/getting-started/
Migrating from Pandas: https://docs.pola.rs/user-guide/migration/pandas/
Why Polars? Parallel execution across CPU cores; lazy evaluation; libraries for Python, Rust, and R; GPU support coming soon.
import polars as pl
from sklearn.datasets import load_iris
Loading Datasets into Polars DataFrames#
iris = load_iris()
#df = pl.DataFrame(iris)
df = pl.DataFrame({
'sepal_length': iris.data[:, 0],
'sepal_width': iris.data[:, 1],
'petal_length': iris.data[:, 2],
'petal_width': iris.data[:, 3],
'species': iris.target
})
print(df)
print(df.schema)
print(df.describe())
print(df.columns)
print(df.shape)
Other methods of reading files in Polars#
Column and Row Functions#
#Selecting columns
#df.select("species")
# Selecting Rows based on values
setosa_df = df.filter(pl.col("species") == 0)
print(setosa_df.describe())
# Multiple selection filters (predicates are combined with AND)
print(df.filter(
    pl.col("species") == 2,
    pl.col("petal_length") < 5,
))
# Mean petal length computed by hand as sum / count
setosa_df.select(pl.col("petal_length").sum() / pl.col("petal_length").count())
# Column-wise mean of several columns at once
df.select(
    pl.col("petal_width", "sepal_width").mean()
)
# mean(axis=1) does not work; mean_horizontal() computes the row-wise mean instead
df.select(
    pl.col("sepal_length", "sepal_width")
).mean_horizontal()
# Map the integer species codes to readable names in a new column
iris_df1 = df.with_columns(
    species_names=pl.col("species").replace_strict([0, 1, 2], ["Setosa", "Versicolor", "Virginica"])
)
print(iris_df1)
Group By Examples#
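# group_by() on its own returns a GroupBy object, not a DataFrame
print(df.group_by(pl.col("species")))
# Applying an aggregation such as mean() produces one row per group
iris_df1.group_by(pl.col("species_names")).mean()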
Visualization#
Polars works well with popular visualization libraries: https://docs.pola.rs/user-guide/misc/visualization/
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# Pass Series (df["col"]) rather than one-column DataFrames
ax.scatter(
    x=df["sepal_width"],
    y=df["sepal_length"],
    c=df["species"],
)
ax.set_xlabel("sepal_width")
ax.set_ylabel("sepal_length")