Slice Objects and DataFrames
When working with data from a pandas DataFrame, oftentimes we want to select a range of cells rather than specific ones. To do this, we can use slice objects.
A slice object indicates how an object should be sliced by specifying:
start - where the slice should begin
stop - where the slice should end
step - the width of the slice (e.g. slicing every second item, every third item, etc.)
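As a minimal sketch of these three values, Python's built-in slice stores them as attributes that can be read back. The variable names below are only illustrative.

```python
# Build a slice object and inspect the three values it carries.
every_third = slice(1, 8, 3)

print(every_third.start)  # 1
print(every_third.stop)   # 8
print(every_third.step)   # 3

# The same object can slice any sequence, such as a plain list.
numbers = list(range(10))
selected = numbers[every_third]
print(selected)  # [1, 4, 7]
```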
To explore ways of slicing a DataFrame with slice objects, let's look at a DataFrame of instruments:
import pandas as pd
# Creating a DataFrame with "instrument", "family", and "material" columns
df = pd.DataFrame([
{"instrument": "harp", "family": "strings", "material": "wood"},
{"instrument": "violin", "family": "strings", "material": "spruce"},
{"instrument": "guitar", "family": "strings", "material": "mahogany"},
{"instrument": "clarinet", "family": "woodwind", "material": "mpingo"},
{"instrument": "recorder", "family": "woodwind", "material": "boxwood"},
{"instrument": "flute", "family": "woodwind", "material": "silver"},
{"instrument": "xylophone", "family": "percussion", "material": "wood"},
{"instrument": "marimba", "family": "percussion", "material": "rosewood"},
{"instrument": "trombone", "family": "brass", "material": "brass"},
{"instrument": "trumpet", "family": "brass", "material": "brass"}])
df
| | instrument | family | material |
---|---|---|---|
0 | harp | strings | wood |
1 | violin | strings | spruce |
2 | guitar | strings | mahogany |
3 | clarinet | woodwind | mpingo |
4 | recorder | woodwind | boxwood |
5 | flute | woodwind | silver |
6 | xylophone | percussion | wood |
7 | marimba | percussion | rosewood |
8 | trombone | brass | brass |
9 | trumpet | brass | brass |
Using Python's slice Function Within a DataFrame
One way to generate a slice object is with the slice function. There are three possible parameters: start, stop, and step. They follow the format:
slice(start, stop, step)
If not specified, start is treated as 0 and step is treated as 1. However, the stop parameter doesn't have a default value, so a value must be specified for the slice function to run. When only one argument is given, as in slice(2), it is taken as the stop value.
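A quick check in plain Python suggests how these defaults behave in practice. Strictly speaking, an omitted start or step is stored as None, and the object being sliced interprets None as 0 or 1. The indices method resolves them for a sequence of a given length (10 here, matching the number of rows in df).

```python
first_two = slice(2)

# Omitted values are stored as None rather than as 0 and 1.
print(first_two.start, first_two.stop, first_two.step)  # None 2 None

# indices(length) resolves the defaults for a sequence of that length.
resolved = first_two.indices(10)
print(resolved)  # (0, 2, 1)

# Calling slice() with no arguments fails because stop is required.
missing_stop_error = None
try:
    slice()
except TypeError as error:
    missing_stop_error = error
    print("TypeError:", error)
```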
To slice a DataFrame, enclose the slice object in brackets and place it after the DataFrame name:
df[slice(2)]
| | instrument | family | material |
---|---|---|---|
0 | harp | strings | wood |
1 | violin | strings | spruce |
Remember that in almost all cases in Python, the stop value is not inclusive. Since we specified our stop value as 2, the output includes rows at index 0 and 1, but not the row at index 2.
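The "almost" matters in pandas. As a small sketch on a made-up three-row frame (small is a hypothetical name, not part of the example above), a slice placed directly in brackets excludes the stop position, whereas label-based slicing with .loc includes the stop label.

```python
import pandas as pd

small = pd.DataFrame({"instrument": ["harp", "violin", "guitar"]})

# Plain bracket slicing is positional: the stop value is excluded.
by_position = small[slice(2)]
print(by_position["instrument"].tolist())  # ['harp', 'violin']

# .loc slices by label, and the stop label is included.
by_label = small.loc[0:2]
print(by_label["instrument"].tolist())  # ['harp', 'violin', 'guitar']
```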
Using the slice Function With All Three Parameters
Let's say we are interested in learning how to play an instrument, but have no prior musical experience. To start, we only want to look at every other non-string instrument in our DataFrame of instruments.
The first non-string instrument in df appears in the row at index 3 (our start value). We want to view every instrument up to and including the row at index 9, so our stop value is 10. And, since we want to view every other instrument, our step value is 2.
df[slice(3, 10, 2)]
| | instrument | family | material |
---|---|---|---|
3 | clarinet | woodwind | mpingo |
5 | flute | woodwind | silver |
7 | marimba | percussion | rosewood |
9 | trumpet | brass | brass |
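To double-check which rows a given start, stop, and step will pick, one option is to expand the slice into explicit positions with range. This is only a sketch in plain Python; the row count of 10 is taken from the instruments DataFrame, and the variable names are illustrative.

```python
non_string_every_other = slice(3, 10, 2)

# Resolve the slice against 10 rows, then list the positions it covers.
start, stop, step = non_string_every_other.indices(10)
positions = list(range(start, stop, step))
print(positions)  # [3, 5, 7, 9]

# A stop of 9 would leave out the last row, because stop is exclusive.
positions_stop_9 = list(range(*slice(3, 9, 2).indices(10)))
print(positions_stop_9)  # [3, 5, 7]
```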
Using Slice Syntax Within a DataFrame
The more common way to generate a slice object is to use indexing syntax, which follows the format:
x[start:stop:step]
(where x can be a DataFrame, a list, etc.).
Indexing syntax is similar to the slice function. However, one notable difference is that all three parameters are optional.
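One way to see that the colon syntax really does build a slice object is a tiny class whose `__getitem__` simply returns whatever it receives. ShowKey is a hypothetical helper written for this illustration, not part of pandas.

```python
class ShowKey:
    """Hypothetical helper: returns the key passed inside the brackets."""

    def __getitem__(self, key):
        return key


x = ShowKey()

print(x[3:10:2])  # slice(3, 10, 2)
print(x[:2])      # slice(None, 2, None)
print(x[:])       # slice(None, None, None)
```

Any parameter left out of the brackets arrives as None, which the sliced object then replaces with its own default.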
Using All Default Values in a DataFrame slice
Unlike the slice function, we don't have to specify any parameters when using indexing syntax. Omitting every parameter is called an empty slice. Inputting an empty slice will output the entire DataFrame:
df[:]
| | instrument | family | material |
---|---|---|---|
0 | harp | strings | wood |
1 | violin | strings | spruce |
2 | guitar | strings | mahogany |
3 | clarinet | woodwind | mpingo |
4 | recorder | woodwind | boxwood |
5 | flute | woodwind | silver |
6 | xylophone | percussion | wood |
7 | marimba | percussion | rosewood |
8 | trombone | brass | brass |
9 | trumpet | brass | brass |
Specifying One Parameter in a DataFrame slice
Remember, the slice function requires at least the stop parameter.
With the indexing syntax, we have the option to specify a single parameter, and it can be either the start, stop, or step. The generic format is:
df[start:stop], when one : is used
df[start:stop:step], when two : are used
The default values remain the same (start=0 by default, step=1 by default), and the default value of stop is the length of the sliced DataFrame. Examples of one parameter:
# start = 7
# default values for `stop` and `step = 1`
# => slice contains only index 7 and larger:
df[7:]
| | instrument | family | material |
---|---|---|---|
7 | marimba | percussion | rosewood |
8 | trombone | brass | brass |
9 | trumpet | brass | brass |
# stop = 3
# default values for `start = 0` and `step = 1`
# => slice contains only index 0, 1, and 2:
df[:3]
| | instrument | family | material |
---|---|---|---|
0 | harp | strings | wood |
1 | violin | strings | spruce |
2 | guitar | strings | mahogany |
# step = 4
# default values for `start = 0` and `stop`
# => displays every 4th index, starting at 0: 0, 4, and 8
df[::4]
| | instrument | family | material |
---|---|---|---|
0 | harp | strings | wood |
4 | recorder | woodwind | boxwood |
8 | trombone | brass | brass |
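Each of these shorthand forms still corresponds to a slice call, with None standing in for any omitted value. A minimal sketch using a plain list of the instrument names (a list stands in for the DataFrame here for brevity):

```python
instruments = ["harp", "violin", "guitar", "clarinet", "recorder",
               "flute", "xylophone", "marimba", "trombone", "trumpet"]

# start only: [7:] corresponds to slice(7, None)
from_seven = instruments[slice(7, None)]
print(from_seven)  # ['marimba', 'trombone', 'trumpet']

# stop only: [:3] corresponds to slice(3)
first_three = instruments[slice(3)]
print(first_three)  # ['harp', 'violin', 'guitar']

# step only: [::4] corresponds to slice(None, None, 4)
every_fourth = instruments[slice(None, None, 4)]
print(every_fourth)  # ['harp', 'recorder', 'trombone']
```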
Specifying Multiple Parameters in a DataFrame slice
Let's say that we are interested in learning one instrument from each family:
# start = 2, step = 2
# default value for `stop`
# => every 2nd index, starting at index 2:
df[2::2]
| | instrument | family | material |
---|---|---|---|
2 | guitar | strings | mahogany |
4 | recorder | woodwind | boxwood |
6 | xylophone | percussion | wood |
8 | trombone | brass | brass |
After some consideration, we conclude that practicing four instruments would take too many hours. Instead, we decide to narrow down our options to either a string instrument or a percussion instrument. To do this, we can combine two different slices of our DataFrame using concatenation:
# Concatenating two slices together
pd.concat([df[:3], df[6:8]])
| | instrument | family | material |
---|---|---|---|
0 | harp | strings | wood |
1 | violin | strings | spruce |
2 | guitar | strings | mahogany |
6 | xylophone | percussion | wood |
7 | marimba | percussion | rosewood |
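Note that the concatenated result keeps the original row labels (0, 1, 2, 6, 7). If consecutive labels are preferred, pd.concat accepts ignore_index=True. A short sketch on a cut-down copy of the data (instruments_df is a hypothetical name, and only the instrument column is kept to save space):

```python
import pandas as pd

instruments_df = pd.DataFrame({"instrument": [
    "harp", "violin", "guitar", "clarinet", "recorder",
    "flute", "xylophone", "marimba", "trombone", "trumpet"]})

# Original labels are preserved by default.
kept = pd.concat([instruments_df[:3], instruments_df[6:8]])
print(kept.index.tolist())  # [0, 1, 2, 6, 7]

# ignore_index=True renumbers the rows from 0.
renumbered = pd.concat([instruments_df[:3], instruments_df[6:8]],
                       ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```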
Using Negative Numbers in a slice
When generating slice objects, we can use negative numbers. For DataFrames, a negative number counts back from the last row, so -1 refers to the last row, -2 to the second-to-last row, and so on.
For example, say we want to look at the last row of our DataFrame:
# This is equivalent to `df[9:]`
df[-1:]
| | instrument | family | material |
---|---|---|---|
9 | trumpet | brass | brass |
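The equivalence noted in the comment can be checked in plain Python: for a sequence of length 10, the indices method converts a negative start into the matching position. The sketch below uses a list of ten integers as a stand-in for the row positions.

```python
rows = list(range(10))  # stand-in for row positions 0..9

# A start of -1 resolves to position 9 when there are 10 rows.
resolved_last = slice(-1, None).indices(10)
print(resolved_last)  # (9, 10, 1)

last_row = rows[-1:]
last_three = rows[-3:]
all_but_last = rows[:-1]

print(last_row)      # [9]
print(last_three)    # [7, 8, 9]
print(all_but_last)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```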
Pandas Documentation
The pandas documentation includes a 44-page guide covering all topics related to indexing and selecting data.