Run a Custom Function on Every Row in a DataFrame

The pandas apply method runs a function on every value of a Series or on every row (or column) of a DataFrame.

To explore the basics of this function, we'll look at a DataFrame of triangles:

import numpy as np
import pandas as pd

# Creates a DataFrame with 'side 1', 'side 2', and 'side 3' columns
df = pd.DataFrame([
  {'side 1': 4, 'side 2': 121, 'side 3': 25},
  {'side 1': 9, 'side 2': 36, 'side 3': 100},
  {'side 1': 100, 'side 2': 4, 'side 3': 49},
  {'side 1': 36, 'side 2': 25, 'side 3': 64},
])
df
   side 1  side 2  side 3
0       4     121      25
1       9      36     100
2     100       4      49
3      36      25      64
Creating a DataFrame of Triangles

Using apply on a Series (using only one column of data)

The numpy library provides many mathematical functions that can be used with apply, including a function np.sqrt that returns the square root of an input.

Because df["side 1"] is a Series, calling apply on it with np.sqrt gives us the square root of the first side of each triangle, which we can store in a new column:

import numpy as np

# Create a new column to store the square root of `side 1`:
df["sqrt1"] = df["side 1"].apply(np.sqrt)
   side 1  side 2  side 3  sqrt1
0       4     121      25    2.0
1       9      36     100    3.0
2     100       4      49   10.0
3      36      25      64    6.0
Adding a new column with the square root of "side 1".
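As a quick check on what apply does with a single column, the sketch below compares it with passing the whole Series to np.sqrt directly. The `triangles` DataFrame and the variable names are stand-ins introduced for this illustration, using only the `side 1` values from above; for a NumPy function like this, both forms should give the same values.

```python
import numpy as np
import pandas as pd

# Stand-in data: just the `side 1` column from the triangles above
triangles = pd.DataFrame({"side 1": [4, 9, 100, 36]})

# Use apply to take the square root of the values in the column
via_apply = triangles["side 1"].apply(np.sqrt)

# Pass the whole Series to np.sqrt in a single call
vectorized = np.sqrt(triangles["side 1"])

print(via_apply.tolist())            # [2.0, 3.0, 10.0, 6.0]
print(via_apply.equals(vectorized))  # True
```

When a NumPy function already accepts a whole Series, calling it directly is usually the simpler choice; apply is most useful for functions that only know how to handle one value at a time.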

Creating Your Own Function for apply on a Series

When working with one column of data, any function that takes a single value as its parameter can be used with apply. Here is a custom function that returns "large" if the input is 80 or larger and "small" otherwise, which we then pass to apply on the "side 1" column:

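def isLarge(value):
  # Return "large" for values of 80 or more, and "small" for everything else
  if value >= 80:
    return "large"
  return "small"

# Using our custom `isLarge` function, we add a new "side 1 size" column to our DataFrame:
df["side 1 size"] = df["side 1"].apply(isLarge)
   side 1  side 2  side 3  sqrt1  side 1 size
0       4     121      25    2.0        small
1       9      36     100    3.0        small
2     100       4      49   10.0        large
3      36      25      64    6.0        small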
Custom isLarge function with apply on a single column
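If the cutoff should not be hard-coded, one option is to give the function a second parameter and let apply forward it. The sketch below assumes a hypothetical `size_label` function and a stand-alone `side_1` Series, neither of which is part of the example above; it relies on Series.apply passing extra keyword arguments through to the function.

```python
import pandas as pd

def size_label(value, threshold):
    """Return "large" if value is at least threshold, otherwise "small"."""
    return "large" if value >= threshold else "small"

# Stand-in for the `side 1` column used above
side_1 = pd.Series([4, 9, 100, 36], name="side 1")

# Keyword arguments given to apply are forwarded to size_label on each call
labels = side_1.apply(size_label, threshold=80)

print(labels.tolist())  # ['small', 'small', 'large', 'small']
```

Changing `threshold` at the call site then relabels the column without editing the function itself.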

Using apply on a DataFrame

Instead of using apply on a single column (a Series), we can also use apply on the whole DataFrame.

The default axis for applying the function is axis = 0 (applying the function to each column). To apply the function to each row, we specify axis = 1.

For example, let's find the perimeter of each triangle:

# Summing the columns of each row to find the perimeter
df[["side 1", "side 2", "side 3"]].apply(np.sum, axis = 1)
0    150
1    145
2    153
3    125
dtype: int64
Find the perimeter from a DataFrame of sides of a triangle
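To make the difference between the two axes concrete, here is a small sketch that runs the same function both ways. The `sides` DataFrame repeats the three side columns from above under a new, illustrative name so the block can run on its own.

```python
import numpy as np
import pandas as pd

# The three side columns from the triangles DataFrame
sides = pd.DataFrame({
    "side 1": [4, 9, 100, 36],
    "side 2": [121, 36, 4, 25],
    "side 3": [25, 100, 49, 64],
})

# axis=0 (the default): the function receives each column,
# so the result has one entry per column
column_totals = sides.apply(np.sum, axis=0)

# axis=1: the function receives each row,
# so the result has one entry per row (the perimeter)
perimeters = sides.apply(np.sum, axis=1)

print(column_totals)
print(perimeters)
```

The result of axis=0 is indexed by column name, while the result of axis=1 is indexed by the original row labels, which is what makes it easy to assign back as a new column.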

Creating Your Own Function for use with apply on a DataFrame

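Similar to using apply on a single column, we can create a custom function for use on a whole DataFrame. With axis=1, our custom function receives a full row of data each time it is called instead of a single value, and individual values can be read from the row by column name. For example, to find the area of each triangle from its three sides (using Heron's formula):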

def findArea(row):
  s = (row["side 1"] + row["side 2"] + row["side 3"]) / 2
  return (s * (s - row["side 1"]) * (s - row["side 2"]) * (s - row["side 3"])) ** 0.5

df.apply(findArea, axis=1)
0     399.499687
1    1281.770626
2    1300.648276
3     786.486451
dtype: float64
Finding the area of a triangle using a custom Python function.
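One way to sanity-check a row-wise function like this is to run it on triangles whose areas are easy to work out by hand. The sketch below uses a hypothetical `heron_area` function (the same formula with the sides unpacked into local names) and a made-up `right_triangles` DataFrame of right triangles, where the area is simply half the product of the two shorter sides. Note that Heron's formula only gives a real number when the three sides satisfy the triangle inequality, which these do.

```python
import pandas as pd

def heron_area(row):
    """Area of a triangle from its three side lengths (Heron's formula)."""
    a, b, c = row["side 1"], row["side 2"], row["side 3"]
    s = (a + b + c) / 2  # semi-perimeter
    return (s * (s - a) * (s - b) * (s - c)) ** 0.5

# Illustrative data: three right triangles (3-4-5, 5-12-13, 6-8-10)
right_triangles = pd.DataFrame({
    "side 1": [3, 5, 6],
    "side 2": [4, 12, 8],
    "side 3": [5, 13, 10],
})

# axis=1 passes each row to heron_area, giving one area per triangle
areas = right_triangles.apply(heron_area, axis=1)

print(areas)
```

The areas should come out to 6, 30, and 24, matching half the product of the two legs in each case.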

Pandas Documentation

See the official pandas documentation for DataFrame.apply and Series.apply for the full list of parameters and options.