Slice Objects and DataFrames


When working with data from a pandas DataFrame, oftentimes we want to select a range of cells rather than specific ones. To do this, we can use slice objects.

A slice object indicates how an object should be sliced by specifying:

  1. start - where the slice should begin
  2. stop - where the slice should end
  3. step - the interval between selected items (e.g. every second item, every third item, etc.)
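As a minimal sketch of these three values, the snippet below builds a slice object and applies it to a plain Python list. The `letters` list is a hypothetical stand-in used only for illustration; it is not part of the instruments example.

```python
# A slice object simply stores the three values it was created with.
s = slice(1, 8, 3)
print(s.start, s.stop, s.step)  # 1 8 3

# Applying it to a list selects the items at positions 1, 4, and 7.
letters = list("abcdefghij")
selected = letters[s]
print(selected)  # ['b', 'e', 'h']
```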

To explore ways of slicing a DataFrame with slice objects, let's look at a DataFrame of instruments:

import pandas as pd

# Creating a DataFrame with "instrument", "family", and "material" columns
df = pd.DataFrame([
    {"instrument": "harp", "family": "strings", "material": "wood"}, 
    {"instrument": "violin", "family": "strings", "material": "spruce"}, 
    {"instrument": "guitar", "family": "strings", "material": "mahogany"},
    {"instrument": "clarinet", "family": "woodwind", "material": "mpingo"}, 
    {"instrument": "recorder", "family": "woodwind", "material": "boxwood"}, 
    {"instrument": "flute", "family": "woodwind", "material": "silver"},
    {"instrument": "xylophone", "family": "percussion", "material": "wood"}, 
    {"instrument": "marimba", "family": "percussion", "material": "rosewood"}, 
    {"instrument": "trombone", "family": "brass", "material": "brass"}, 
    {"instrument": "trumpet", "family": "brass", "material": "brass"}])
df
   instrument      family  material
0        harp     strings      wood
1      violin     strings    spruce
2      guitar     strings  mahogany
3    clarinet    woodwind    mpingo
4    recorder    woodwind   boxwood
5       flute    woodwind    silver
6   xylophone  percussion      wood
7     marimba  percussion  rosewood
8    trombone       brass     brass
9     trumpet       brass     brass
Creating a DataFrame of Instruments

Using Python's slice Function Within A DataFrame

One way to generate a slice object is with Python's built-in slice function. There are three possible parameters: start, stop, and step. They follow the format:
    slice(stop)
    slice(start, stop, step)

If not specified, the start and step parameters are set to None, which is treated as 0 and 1 respectively when the slice is applied. However, the stop parameter doesn't have any default value, so a value must be specified for the slice function to run:

slice(2)
slice(None, 2, None)
Creating a Slice Object With the Slice Function
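The following sketch shows how the built-in slice function treats one, two, and three arguments, and what happens when none are given. The variable names are chosen here only for illustration.

```python
# One argument is treated as stop; start and step are left as None.
only_stop = slice(2)
# Two arguments are read as start and stop.
start_stop = slice(3, 10)
# Three arguments are read as start, stop, and step.
all_three = slice(3, 10, 2)

print(only_stop)   # slice(None, 2, None)
print(start_stop)  # slice(3, 10, None)
print(all_three)   # slice(3, 10, 2)

# Calling slice() with no arguments raises a TypeError.
try:
    slice()
except TypeError as err:
    no_args_error = err
    print("TypeError:", err)
```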

To slice a DataFrame, enclose the slice object in brackets and place it after the DataFrame name:

df[slice(2)]
   instrument   family  material
0        harp  strings      wood
1      violin  strings    spruce
Slicing a DataFrame with a Simple Slice Object. With start=0, stop=2, and step=1, the first two rows are sliced out.

Remember that in almost all cases in Python, the stop value is not inclusive. Since we specified our stop value as 2, the output includes rows at index 0 and 1, but not the row at index 2.
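To make the "almost all cases" caveat concrete, here is a small sketch on a shortened, hypothetical version of the instruments DataFrame. It contrasts positional slicing, where the stop value is excluded, with label-based slicing through `.loc`, which is one case where pandas includes the stop label. `.loc` is not otherwise covered in this article and appears only as a contrast.

```python
import pandas as pd

# Shortened stand-in for the instruments DataFrame (illustration only).
df = pd.DataFrame({"instrument": ["harp", "violin", "guitar", "clarinet"]})

# Positional slicing excludes the stop value: rows 0 and 1 only.
by_position = df[slice(2)]
same_with_iloc = df.iloc[0:2]

# Label-based slicing with .loc includes the stop label: rows 0, 1, and 2.
by_label = df.loc[0:2]

print(len(by_position), len(same_with_iloc), len(by_label))  # 2 2 3
```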

Using the slice Function With All Three Parameters

Let's say we are interested in learning how to play an instrument, but have no prior musical experience. To start, we only want to look at every other non-string instrument in our DataFrame of instruments.

The first non-string instrument in df appears in the row at index 3 (our start value). We want to view every instrument up to and including the row at index 9, and because the stop value is not inclusive, our stop value is 10. And, since we want to view every other instrument, our step value is 2.

df[slice(3, 10, 2)]
   instrument      family  material
3    clarinet    woodwind    mpingo
5       flute    woodwind    silver
7     marimba  percussion  rosewood
9     trumpet       brass     brass
Slicing DataFrame With All Three Parameters
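As a quick check on why the stop value is 10 rather than 9, the sketch below compares the two choices. It rebuilds only the instrument column, which is a simplification of the DataFrame used in this article.

```python
import pandas as pd

instruments = ["harp", "violin", "guitar", "clarinet", "recorder",
               "flute", "xylophone", "marimba", "trombone", "trumpet"]
df = pd.DataFrame({"instrument": instruments})

# stop=10 reaches the row at index 9, because stop itself is excluded.
stop_ten = df[slice(3, 10, 2)]["instrument"].tolist()
# stop=9 ends one row earlier, so the trumpet is dropped.
stop_nine = df[slice(3, 9, 2)]["instrument"].tolist()

print(stop_ten)   # ['clarinet', 'flute', 'marimba', 'trumpet']
print(stop_nine)  # ['clarinet', 'flute', 'marimba']
```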

Using Slice Syntax Within a DataFrame

The more common way to generate a slice object is to use indexing syntax, which follows the format:
x[start:stop:step] (where x can be a DataFrame, a list, etc.).

Indexing syntax is similar to the slice function. However, one notable difference is that all three parameters are optional.
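One way to see that indexing syntax produces a slice object is to pass it to a tiny class that returns whatever it receives inside the brackets. `ShowKey` is a hypothetical helper written for this sketch; it is not part of pandas or the standard library.

```python
class ShowKey:
    """Hypothetical helper: returns whatever is passed inside the brackets."""

    def __getitem__(self, key):
        return key


show = ShowKey()

# Python converts the colon notation into a slice object before calling __getitem__.
print(show[3:10:2])  # slice(3, 10, 2)
print(show[:])       # slice(None, None, None)
print(show[7:])      # slice(7, None, None)
print(show[::4])     # slice(None, None, 4)
```

Any parameter left out of the colon notation arrives as None, which is why all three can be omitted.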

Using All Default Values in a DataFrame slice

Unlike the slice function, we don't have to specify any parameters when using indexing syntax. Not including any parameter is called an empty slice. Inputting an empty slice will output the entire DataFrame:

df[:]
   instrument      family  material
0        harp     strings      wood
1      violin     strings    spruce
2      guitar     strings  mahogany
3    clarinet    woodwind    mpingo
4    recorder    woodwind   boxwood
5       flute    woodwind    silver
6   xylophone  percussion      wood
7     marimba  percussion  rosewood
8    trombone       brass     brass
9     trumpet       brass     brass
An Empty Slice Example

Specifying One Parameter in a DataFrame slice

Remember, the slice function requires at least the stop parameter.
With the indexing syntax, we have the option to specify a single parameter, and it can be the start, the stop, or the step. The generic format is:
    df[start:stop], when one : is used
    df[start:stop:step], when two : are used

The default values remain the same (start=0 and step=1), and the default value of stop is the length of the DataFrame being sliced. Examples with one parameter:

# start=7
# default values for `stop` and `step` (step=1)
# => slice contains only index 7 and larger:
df[7:]
   instrument      family  material
7     marimba  percussion  rosewood
8    trombone       brass     brass
9     trumpet       brass     brass
Specifying Only The Start Parameter

# stop=3
# default values for `start` (0) and `step` (1)
# => slice contains only index 0, 1, and 2:
df[:3]
   instrument   family  material
0        harp  strings      wood
1      violin  strings    spruce
2      guitar  strings  mahogany
Specifying Only The Stop Parameter

# step=4
# default values for `start` (0) and `stop`
# => displays every 4th index, starting at 0: 0, 4, and 8
df[::4]
   instrument    family  material
0        harp   strings      wood
4    recorder  woodwind   boxwood
8    trombone     brass     brass
Specifying Only The Step Parameter
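The default values used in the three examples above can be inspected with the `indices` method of Python's built-in slice objects, which fills in every missing value for a given length. This is a sketch of how plain Python resolves the defaults; it is an assumption that this mirrors what pandas does internally for these simple cases.

```python
n_rows = 10  # number of rows in the instruments DataFrame

# The three single-parameter slices, written as explicit slice objects.
examples = {
    "df[7:]": slice(7, None),
    "df[:3]": slice(None, 3),
    "df[::4]": slice(None, None, 4),
}

resolved = {}
for label, s in examples.items():
    # indices() replaces every None with a concrete value for this length.
    start, stop, step = s.indices(n_rows)
    resolved[label] = list(range(start, stop, step))
    print(label, (start, stop, step), resolved[label])
```

The resolved positions are 7 to 9, 0 to 2, and 0, 4, 8, matching the rows shown in the outputs.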

Specifying Multiple Parameters in a DataFrame slice

Let's say that we are interested in learning one instrument from each family:

# start=2, step=2
# default value for `stop`
# => every 2nd index, starting at index 2:
df[2::2]
   instrument      family  material
2      guitar     strings  mahogany
4    recorder    woodwind   boxwood
6   xylophone  percussion      wood
8    trombone       brass     brass
Slicing Every Other Row

After some consideration, we conclude that practicing four instruments would take too many hours. Instead, we decide to narrow down our options to either a string instrument or a percussion instrument. To do this, we can combine two different slices of our DataFrame using concatenation:

# Concatenating two slices together
pd.concat([df[:3], df[6:8]])
   instrument      family  material
0        harp     strings      wood
1      violin     strings    spruce
2      guitar     strings  mahogany
6   xylophone  percussion      wood
7     marimba  percussion  rosewood
Using a Combination of Slice Objects
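Note that the concatenated result keeps the original index labels (0, 1, 2, 6, 7). If a fresh 0-based index is preferred, `pd.concat` accepts an `ignore_index` argument. The sketch below uses only the instrument column as a simplified stand-in for the full DataFrame.

```python
import pandas as pd

instruments = ["harp", "violin", "guitar", "clarinet", "recorder",
               "flute", "xylophone", "marimba", "trombone", "trumpet"]
df = pd.DataFrame({"instrument": instruments})

# By default, each slice keeps the index labels it had in df.
combined = pd.concat([df[:3], df[6:8]])
print(combined.index.tolist())  # [0, 1, 2, 6, 7]

# ignore_index=True renumbers the rows of the result from 0.
renumbered = pd.concat([df[:3], df[6:8]], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```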

Using Negative Numbers in a slice

When generating slice objects, we can use negative numbers. A negative number counts backward from the end of the DataFrame, so -1 refers to the last row, -2 to the second-to-last row, and so on.

For example, say we want to look at the last row of our DataFrame:

# This is equivalent to `df[9:]`
df[-1:]
   instrument  family  material
9     trumpet   brass     brass
Slices with Negative Numbers
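A few more negative-number slices, sketched on the instrument column only (a simplified stand-in for the full DataFrame). The last line uses Python's built-in `indices` method to show how a negative start is resolved for ten rows.

```python
import pandas as pd

instruments = ["harp", "violin", "guitar", "clarinet", "recorder",
               "flute", "xylophone", "marimba", "trombone", "trumpet"]
df = pd.DataFrame({"instrument": instruments})

last_three = df[-3:]      # rows at index 7, 8, and 9
all_but_last = df[:-1]    # rows at index 0 through 8
reversed_rows = df[::-1]  # a negative step walks from the last row to the first

print(last_three["instrument"].tolist())  # ['marimba', 'trombone', 'trumpet']
print(len(all_but_last))                  # 9
print(reversed_rows.index.tolist())       # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

# Python's own resolution of a negative start for a length of 10:
print(slice(-1, None).indices(len(df)))   # (9, 10, 1)
```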

Pandas Documentation

The pandas documentation includes a 44-page guide covering all topics related to indexing and selecting data.