Slice Objects and DataFrames


When working with data from a pandas DataFrame, we often want to select a range of rows rather than individual ones. To do this, we can use slice objects.

Slice objects indicate how an object should be sliced by specifying:

  1. start - where the slice should begin
  2. stop - where the slice should end
  3. step - the interval between selected items (e.g. every second item, every third item, etc.)

To explore ways of slicing a DataFrame with slice objects, let's look at a DataFrame of instruments:

```
   instrument      family  material
0        harp     strings      wood
1      violin     strings    spruce
2      guitar     strings  mahogany
3    clarinet    woodwind    mpingo
4    recorder    woodwind   boxwood
5       flute    woodwind    silver
6   xylophone  percussion      wood
7     marimba  percussion  rosewood
8    trombone       brass     brass
9     trumpet       brass     brass
```
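Here is a minimal sketch that builds this DataFrame with pandas. It assumes the DataFrame is stored in a variable named df, which is the name the rest of this tutorial uses. Each tuple is one row, and pandas assigns the default integer index 0 through 9:

```python
import pandas as pd

# Each tuple is one row: (instrument, family, material).
rows = [
    ("harp", "strings", "wood"),
    ("violin", "strings", "spruce"),
    ("guitar", "strings", "mahogany"),
    ("clarinet", "woodwind", "mpingo"),
    ("recorder", "woodwind", "boxwood"),
    ("flute", "woodwind", "silver"),
    ("xylophone", "percussion", "wood"),
    ("marimba", "percussion", "rosewood"),
    ("trombone", "brass", "brass"),
    ("trumpet", "brass", "brass"),
]
df = pd.DataFrame(rows, columns=["instrument", "family", "material"])

print(df)
```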

Using Python's slice Function Within a DataFrame

One way to generate a slice object is with Python's built-in slice function. It takes up to three parameters (start, stop, and step) and can be called in any of these forms:
    slice(stop)
    slice(start, stop)
    slice(start, stop, step)

When only one argument is passed, it is used as stop. Any parameter that isn't specified is stored as None. When slicing, a step of None means 1, and with a positive step a start of None means the beginning (index 0). The stop parameter has no default, so at least one argument must be passed for the slice function to run. For example, slice(2) produces this slice object:

```
slice(None, 2, None)
```
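The same behavior can be checked in plain Python. This sketch uses the built-in slice.indices(length) method, which resolves None values for a sequence of the given length. The length of 10 is chosen only to match the instruments DataFrame:

```python
# slice(stop): only stop is given, so start and step are stored as None.
s = slice(2)
print(s)                        # slice(None, 2, None)
print(s.start, s.stop, s.step)  # None 2 None

# For a sequence of length 10, None resolves to start=0 and step=1.
print(s.indices(10))            # (0, 2, 1)

# The same slice object selects the first two items of a list.
print(list(range(10))[s])       # [0, 1]

# slice() with no arguments raises TypeError because stop is required.
try:
    slice()
except TypeError as err:
    print("TypeError:", err)
```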

To slice a DataFrame, enclose the slice object in brackets and place it after the DataFrame name:

```
   instrument   family  material
0        harp  strings      wood
1      violin  strings    spruce
```
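Here is a minimal sketch of this selection, assuming the instruments DataFrame is named df. It rebuilds df first so the snippet runs on its own. The variable name first_two is illustrative:

```python
import pandas as pd

rows = [
    ("harp", "strings", "wood"), ("violin", "strings", "spruce"),
    ("guitar", "strings", "mahogany"), ("clarinet", "woodwind", "mpingo"),
    ("recorder", "woodwind", "boxwood"), ("flute", "woodwind", "silver"),
    ("xylophone", "percussion", "wood"), ("marimba", "percussion", "rosewood"),
    ("trombone", "brass", "brass"), ("trumpet", "brass", "brass"),
]
df = pd.DataFrame(rows, columns=["instrument", "family", "material"])

# Put the slice object in brackets after the DataFrame name.
first_two = df[slice(2)]
print(first_two)

# stop is exclusive: rows 0 and 1 are returned, row 2 is not.
print(list(first_two.index))  # [0, 1]
```

Writing df[:2] hands pandas the same slice(None, 2, None) object, so both forms return identical rows.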

Remember that in almost all cases in Python, the stop value is not inclusive. Since we specified our stop value as 2, the output includes rows at index 0 and 1, but not the row at index 2.

Using the slice Function With All Three Parameters

Let's say we are interested in learning how to play an instrument, but have no prior musical experience. To start, we only want to look at every other non-string instrument in our DataFrame of instruments:

The first non-string instrument in df appears in the row at index 3, so our start value is 3. We want to view every instrument up to and including the row at index 9 (the last row), so our stop value is 10. Since we want to view every other instrument, our step value is 2:

```
   instrument      family  material
3    clarinet    woodwind    mpingo
5       flute    woodwind    silver
7     marimba  percussion  rosewood
9     trumpet       brass     brass
```
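Here is a sketch of the three-parameter call. It rebuilds df so it runs on its own, and every_other is a hypothetical variable name:

```python
import pandas as pd

rows = [
    ("harp", "strings", "wood"), ("violin", "strings", "spruce"),
    ("guitar", "strings", "mahogany"), ("clarinet", "woodwind", "mpingo"),
    ("recorder", "woodwind", "boxwood"), ("flute", "woodwind", "silver"),
    ("xylophone", "percussion", "wood"), ("marimba", "percussion", "rosewood"),
    ("trombone", "brass", "brass"), ("trumpet", "brass", "brass"),
]
df = pd.DataFrame(rows, columns=["instrument", "family", "material"])

# start=3: first non-string instrument
# stop=10: one past the last row (index 9), since stop is exclusive
# step=2: every other row
every_other = df[slice(3, 10, 2)]
print(every_other)
```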

Using Slice Syntax Within a DataFrame

The more common way to generate a slice object is with indexing syntax, which follows the format:
    x[start:stop:step]
where x can be a DataFrame, a list, a string, etc. Python converts the start:stop:step expression inside the brackets into a slice object.

Indexing syntax works like the slice function, with one notable difference: all three parameters are optional.
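The sketch below shows that indexing syntax really does produce a slice object. It uses a small hypothetical class, ShowKey, whose __getitem__ method returns whatever key Python passes to it. Plain Python is enough here, because a DataFrame receives the same slice object:

```python
# Hypothetical helper: __getitem__ returns the key unchanged, so we can
# inspect what x[start:stop:step] turns into.
class ShowKey:
    def __getitem__(self, key):
        return key


k = ShowKey()
print(k[3:10:2])  # slice(3, 10, 2)
print(k[:2])      # slice(None, 2, None)
print(k[::])      # slice(None, None, None)

# On an ordinary list, the bracket form and slice() select the same items.
letters = list("abcdefghij")
print(letters[3:10:2] == letters[slice(3, 10, 2)])  # True
```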

Using All Default Values in a DataFrame slice

Unlike the slice function, indexing syntax doesn't require any parameters. Leaving all of them out, as in df[:], is called an empty slice, and it returns the entire DataFrame:

```
   instrument      family  material
0        harp     strings      wood
1      violin     strings    spruce
2      guitar     strings  mahogany
3    clarinet    woodwind    mpingo
4    recorder    woodwind   boxwood
5       flute    woodwind    silver
6   xylophone  percussion      wood
7     marimba  percussion  rosewood
8    trombone       brass     brass
9     trumpet       brass     brass
```
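Here is a minimal sketch of the empty slice on a rebuilt df. The variable name everything is illustrative:

```python
import pandas as pd

rows = [
    ("harp", "strings", "wood"), ("violin", "strings", "spruce"),
    ("guitar", "strings", "mahogany"), ("clarinet", "woodwind", "mpingo"),
    ("recorder", "woodwind", "boxwood"), ("flute", "woodwind", "silver"),
    ("xylophone", "percussion", "wood"), ("marimba", "percussion", "rosewood"),
    ("trombone", "brass", "brass"), ("trumpet", "brass", "brass"),
]
df = pd.DataFrame(rows, columns=["instrument", "family", "material"])

# No start, stop, or step: every row is returned in its original order.
everything = df[:]
print(everything)
print(everything.equals(df))  # True
```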

Specifying One Parameter in a DataFrame slice

Remember, the slice function requires at least the stop parameter. With indexing syntax, we can specify just a single parameter, and it can be start, stop, or step. The generic format is:
    df[start:stop], when one : is used
    df[start:stop:step], when two : are used

The defaults remain the same: start is 0, step is 1, and stop is the length of the DataFrame being sliced. Here is one example for each parameter:

Specifying only start (7), which returns every row from index 7 to the end:

```
   instrument      family  material
7     marimba  percussion  rosewood
8    trombone       brass     brass
9     trumpet       brass     brass
```

Specifying only stop (3), which returns the rows before index 3:

```
   instrument   family  material
0        harp  strings      wood
1      violin  strings    spruce
2      guitar  strings  mahogany
```

Specifying only step (4), which returns every fourth row starting from index 0:

```
   instrument    family  material
0        harp   strings      wood
4    recorder  woodwind   boxwood
8    trombone     brass     brass
```
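Here is a sketch that reproduces the three single-parameter examples above. It assumes the instruments DataFrame is named df and that the slices are df[7:], df[:3] and df[::4], which match the rows shown. The variable names are illustrative:

```python
import pandas as pd

rows = [
    ("harp", "strings", "wood"), ("violin", "strings", "spruce"),
    ("guitar", "strings", "mahogany"), ("clarinet", "woodwind", "mpingo"),
    ("recorder", "woodwind", "boxwood"), ("flute", "woodwind", "silver"),
    ("xylophone", "percussion", "wood"), ("marimba", "percussion", "rosewood"),
    ("trombone", "brass", "brass"), ("trumpet", "brass", "brass"),
]
df = pd.DataFrame(rows, columns=["instrument", "family", "material"])

start_only = df[7:]   # stop defaults to len(df), step to 1
stop_only = df[:3]    # start defaults to 0, step to 1
step_only = df[::4]   # start defaults to 0, stop to len(df)

print(start_only)
print(stop_only)
print(step_only)
```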

Specifying Multiple Parameters in a DataFrame slice

Let's say that we are interested in learning one instrument from each family:

```
   instrument      family  material
2      guitar     strings  mahogany
4    recorder    woodwind   boxwood
6   xylophone  percussion      wood
8    trombone       brass     brass
```
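One slice that selects exactly these rows starts at index 2 (guitar) and takes every second row to the end. This sketch assumes the df from earlier, and one_per_family is a hypothetical name. df[2:10:2] or df[2:9:2] would select the same rows:

```python
import pandas as pd

rows = [
    ("harp", "strings", "wood"), ("violin", "strings", "spruce"),
    ("guitar", "strings", "mahogany"), ("clarinet", "woodwind", "mpingo"),
    ("recorder", "woodwind", "boxwood"), ("flute", "woodwind", "silver"),
    ("xylophone", "percussion", "wood"), ("marimba", "percussion", "rosewood"),
    ("trombone", "brass", "brass"), ("trumpet", "brass", "brass"),
]
df = pd.DataFrame(rows, columns=["instrument", "family", "material"])

# start=2, no stop (run to the end), step=2: rows 2, 4, 6, 8
one_per_family = df[2::2]
print(one_per_family)
```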

After some consideration, we conclude that learning four instruments would take too many hours of practice. Instead, we decide to narrow down our options to either a string instrument or a percussion instrument. To view both groups together, we can combine two different slices of our DataFrame using concatenation:

```
   instrument      family  material
0        harp     strings      wood
1      violin     strings    spruce
2      guitar     strings  mahogany
6   xylophone  percussion      wood
7     marimba  percussion  rosewood
```
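Here is a sketch of the concatenation using pandas.concat, one common way to combine DataFrames. It assumes the string instruments occupy rows 0 to 2 and the percussion instruments occupy rows 6 and 7, as in the table above. The names strings, percussion and shortlist are illustrative:

```python
import pandas as pd

rows = [
    ("harp", "strings", "wood"), ("violin", "strings", "spruce"),
    ("guitar", "strings", "mahogany"), ("clarinet", "woodwind", "mpingo"),
    ("recorder", "woodwind", "boxwood"), ("flute", "woodwind", "silver"),
    ("xylophone", "percussion", "wood"), ("marimba", "percussion", "rosewood"),
    ("trombone", "brass", "brass"), ("trumpet", "brass", "brass"),
]
df = pd.DataFrame(rows, columns=["instrument", "family", "material"])

strings = df[:3]       # rows 0, 1, 2
percussion = df[6:8]   # rows 6, 7 (stop 8 is exclusive)

# Stack the two slices vertically into one DataFrame.
shortlist = pd.concat([strings, percussion])
print(shortlist)
```

By default, pd.concat keeps each row's original index label, which is why the result jumps from 2 to 6.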

Using Negative Numbers in a slice

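When generating slice objects, we can use negative numbers. A negative start or stop counts positions back from the end of the DataFrame, so -1 refers to the last row, -2 to the second-to-last row, and so on. A negative step reverses direction, walking from the bottom of the DataFrame upward.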

For example, say we want to look at the last row of our DataFrame:

```
   instrument  family  material
9     trumpet   brass     brass
```
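Here is a sketch of negative slicing on a rebuilt df. df[-1:] selects the last row as above. df[-3:] and df[::-1] are extra illustrations of a negative start and a negative step, and the variable names are illustrative:

```python
import pandas as pd

rows = [
    ("harp", "strings", "wood"), ("violin", "strings", "spruce"),
    ("guitar", "strings", "mahogany"), ("clarinet", "woodwind", "mpingo"),
    ("recorder", "woodwind", "boxwood"), ("flute", "woodwind", "silver"),
    ("xylophone", "percussion", "wood"), ("marimba", "percussion", "rosewood"),
    ("trombone", "brass", "brass"), ("trumpet", "brass", "brass"),
]
df = pd.DataFrame(rows, columns=["instrument", "family", "material"])

last_row = df[-1:]         # start at the last row, run to the end
last_three = df[-3:]       # rows 7, 8 and 9
reversed_rows = df[::-1]   # negative step: every row, bottom to top

print(last_row)
print(last_three)
print(reversed_rows)
```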

Pandas Documentation

The pandas documentation includes a 44-page guide covering all topics related to indexing and selecting data.