Run a Custom Function on Every Row in a DataFrame
We can use the apply function to run a function on every row of a DataFrame.
To explore the basics of this function, we'll look at a DataFrame of triangles:
import numpy as np
import pandas as pd
# Creates a DataFrame with 'side 1', 'side 2', and 'side 3' columns
df = pd.DataFrame([
    {'side 1': 4, 'side 2': 121, 'side 3': 25},
    {'side 1': 9, 'side 2': 36, 'side 3': 100},
    {'side 1': 100, 'side 2': 4, 'side 3': 49},
    {'side 1': 36, 'side 2': 25, 'side 3': 64},
])
df
| | side 1 | side 2 | side 3 |
|---|---|---|---|
| 0 | 4 | 121 | 25 |
| 1 | 9 | 36 | 100 |
| 2 | 100 | 4 | 49 |
| 3 | 36 | 25 | 64 |
Using apply on a Series (using only one column of data)
The numpy library provides many mathematical functions that can be used with apply, including np.sqrt, which returns the square root of its input.
Using apply and np.sqrt, we can create a new column containing the square root of the first side of each triangle:
import numpy as np
# Create a new column to store the square root of `side 1`:
df["sqrt1"] = df["side 1"].apply(np.sqrt)
df
| | side 1 | side 2 | side 3 | sqrt1 |
|---|---|---|---|---|
| 0 | 4 | 121 | 25 | 2.0 |
| 1 | 9 | 36 | 100 | 3.0 |
| 2 | 100 | 4 | 49 | 10.0 |
| 3 | 36 | 25 | 64 | 6.0 |
Creating Your Own Function for apply on a Series
When working with a single column of data, any function that takes one parameter (the value from each row) can be used with apply. Here is a custom function that returns "large" if the input is 80 or larger and "small" otherwise, which we then pass to apply:
def isLarge(value):
    # Values of 80 or more count as "large"; everything else is "small"
    if value >= 80:
        return "large"
    else:
        return "small"
# Using our custom `isLarge` function, we add a new "side 1 size" column to our DataFrame:
df["side 1 size"] = df["side 1"].apply(isLarge)
df
| | side 1 | side 2 | side 3 | sqrt1 | side 1 size |
|---|---|---|---|---|---|
| 0 | 4 | 121 | 25 | 2.0 | small |
| 1 | 9 | 36 | 100 | 3.0 | small |
| 2 | 100 | 4 | 49 | 10.0 | large |
| 3 | 36 | 25 | 64 | 6.0 | small |
Using apply on a DataFrame
Instead of using apply on a single column (a Series), we can also use apply on the whole DataFrame.
The default axis for applying the function is axis=0 (applying the function to each column). To apply the function to each row, we specify axis=1.
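To make the difference between the two axes concrete, here is a minimal sketch that rebuilds the triangle DataFrame and takes the maximum along each axis. The variable names `col_max` and `row_max` and the lambda functions are illustrative choices, not part of the original example.

```python
import pandas as pd

# Rebuild the triangle DataFrame so this example runs on its own
df = pd.DataFrame([
    {'side 1': 4, 'side 2': 121, 'side 3': 25},
    {'side 1': 9, 'side 2': 36, 'side 3': 100},
    {'side 1': 100, 'side 2': 4, 'side 3': 49},
    {'side 1': 36, 'side 2': 25, 'side 3': 64},
])

# Default (axis=0): the function receives each column as a Series,
# so the result has one value per column
col_max = df.apply(lambda col: col.max())

# axis=1: the function receives each row as a Series,
# so the result has one value per row
row_max = df.apply(lambda row: row.max(), axis=1)

print(col_max)
print(row_max)
```

With axis=0 the result is indexed by column name; with axis=1 it is indexed by the DataFrame's row labels.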
For example, let's find the perimeter of each triangle:
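One possible way to do this is sketched below. It assumes the perimeter is simply the sum of the three side columns; the `perimeter` column name and the lambda are illustrative choices. The side columns are selected explicitly so that any other columns (such as `sqrt1`) would not be included in the sum.

```python
import pandas as pd

# Rebuild the triangle DataFrame so this example runs on its own
df = pd.DataFrame([
    {'side 1': 4, 'side 2': 121, 'side 3': 25},
    {'side 1': 9, 'side 2': 36, 'side 3': 100},
    {'side 1': 100, 'side 2': 4, 'side 3': 49},
    {'side 1': 36, 'side 2': 25, 'side 3': 64},
])

# Keep only the three side columns, then sum across each row (axis=1)
sides = df[["side 1", "side 2", "side 3"]]
df["perimeter"] = sides.apply(lambda row: row.sum(), axis=1)

print(df)
```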
Creating Your Own Function for use with apply on a DataFrame
Similar to using apply on a single column, we can create a custom function. Our custom function will receive a row of data every time it is called instead of a single value. For example, to find the area of the triangle:
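A minimal sketch of such a row-wise function is shown below, assuming Heron's formula is an acceptable way to compute the area. The function name `triangle_area`, the `triangles` DataFrame, and its side lengths are hypothetical: the side lengths in the DataFrame above do not satisfy the triangle inequality, so this example uses its own small dataset with two valid triangles and one invalid one.

```python
import math
import pandas as pd

# Hypothetical side lengths: the first two rows are valid triangles,
# the last row violates the triangle inequality (1 + 2 < 10)
triangles = pd.DataFrame([
    {"side 1": 3, "side 2": 4, "side 3": 5},
    {"side 1": 5, "side 2": 5, "side 3": 6},
    {"side 1": 1, "side 2": 2, "side 3": 10},
])

def triangle_area(row):
    # `row` is a Series holding one row of the DataFrame
    a, b, c = row["side 1"], row["side 2"], row["side 3"]
    s = (a + b + c) / 2  # semi-perimeter
    product = s * (s - a) * (s - b) * (s - c)
    # A non-positive product means the sides cannot form
    # a non-degenerate triangle, so no area is reported
    if product <= 0:
        return float("nan")
    return math.sqrt(product)

# axis=1 passes each row to triangle_area
triangles["area"] = triangles.apply(triangle_area, axis=1)

print(triangles)
```

Returning NaN for invalid rows keeps the new column numeric, which makes it easy to filter those rows out later with `dropna`.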
Pandas Documentation
See the full pandas documentation for the apply function.