Slice Objects and DataFrames

When working with data from a pandas DataFrame, we often want to select a range of rows rather than individual ones. To do this, we can use slice objects.

A slice object indicates how an object should be sliced by specifying:

start - where the slice should begin
stop - where the slice should end
step - the interval between selected items (e.g. every second item, every third item, etc.)
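
As a quick illustration of these three values, the sketch below builds a slice object with Python's built-in slice function and reads the values back. The variable names `every_other` and `letters` are made up for this example; `start`, `stop`, and `step` are the attribute names Python itself uses.

```python
# A slice object stores its three values as read-only attributes.
every_other = slice(1, 6, 2)

print(every_other.start)  # 1
print(every_other.stop)   # 6
print(every_other.step)   # 2

# Applying the slice to a list picks positions 1, 3, and 5.
letters = ["a", "b", "c", "d", "e", "f", "g"]
print(letters[every_other])  # ['b', 'd', 'f']
```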

To explore ways of slicing a DataFrame with slice objects, let's look at a DataFrame of instruments:

```python
import pandas as pd

# Creating a DataFrame with "instrument", "family", and "material" columns
df = pd.DataFrame([
    {"instrument": "harp", "family": "strings", "material": "wood"},
    {"instrument": "violin", "family": "strings", "material": "spruce"},
    {"instrument": "guitar", "family": "strings", "material": "mahogany"},
    {"instrument": "clarinet", "family": "woodwind", "material": "mpingo"},
    {"instrument": "recorder", "family": "woodwind", "material": "boxwood"},
    {"instrument": "flute", "family": "woodwind", "material": "silver"},
    {"instrument": "xylophone", "family": "percussion", "material": "wood"},
    {"instrument": "marimba", "family": "percussion", "material": "rosewood"},
    {"instrument": "trombone", "family": "brass", "material": "brass"},
    {"instrument": "trumpet", "family": "brass", "material": "brass"}])
df
```

Output:

```
   instrument      family  material
0        harp     strings      wood
1      violin     strings    spruce
2      guitar     strings  mahogany
3    clarinet    woodwind    mpingo
4    recorder    woodwind   boxwood
5       flute    woodwind    silver
6   xylophone  percussion      wood
7     marimba  percussion  rosewood
8    trombone       brass     brass
9     trumpet       brass     brass
```

Using Python's `slice` Function Within A DataFrame

One way to generate a slice object is with Python's built-in slice function. It accepts up to three positional arguments: start, stop, and step. They follow one of these formats:
slice(stop)
slice(start, stop)
slice(start, stop, step)

If start or step is not specified, it is stored as None. When the slice is applied, a start of None is treated as 0 and a step of None is treated as 1 (for a positive step). However, the stop parameter doesn't have any default value, so a value must be specified for the slice function to run:

```python
slice(2)
```

Output:

```
slice(None, 2, None)
```

To slice the rows of a DataFrame, enclose the slice object in square brackets and place it after the DataFrame name:

```python
# df is the instruments DataFrame created earlier
df[slice(2)]
```

Output:

```
   instrument   family  material
0        harp  strings      wood
1      violin  strings    spruce
```

Remember that in almost all cases in Python, the stop value is not inclusive. Since we specified our stop value as 2, the output includes rows at index 0 and 1, but not the row at index 2.
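
The "almost all cases" wording matters in pandas. As a minimal sketch, assuming a small made-up DataFrame named `small` with the default integer index: positional slicing (plain brackets or `.iloc`) excludes the stop value, while label-based slicing with `.loc` includes it.

```python
import pandas as pd

# A small, made-up DataFrame with the default integer index 0..3.
small = pd.DataFrame({"instrument": ["harp", "violin", "guitar", "clarinet"]})

# Positional slicing: the stop value (2) is excluded.
print(list(small[0:2].index))       # [0, 1]
print(list(small.iloc[0:2].index))  # [0, 1]

# Label-based slicing with .loc: the stop label (2) is included.
print(list(small.loc[0:2].index))   # [0, 1, 2]
```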

Using the `slice` Function With All Three Parameters

Let's say we are interested in learning how to play an instrument, but have no prior musical experience. To start, we only want to look at every other non-string instrument in our DataFrame of instruments.

The first non-string instrument in df appears in the row at index 3 (our start value). We want to view every other instrument up to and including the row at index 9. Because the stop value is not inclusive, our stop value is 10. And, since we want to view every other instrument, our step value is 2.

```python
# df is the instruments DataFrame created earlier
df[slice(3, 10, 2)]
```

Output:

```
   instrument      family  material
3    clarinet    woodwind    mpingo
5       flute    woodwind    silver
7     marimba  percussion  rosewood
9     trumpet       brass     brass
```
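
The positions picked by `slice(3, 10, 2)` can also be checked without pandas. The sketch below uses the slice object's `indices` method together with `range`; the names `every_other_non_string` and `n_rows` are made up for this example, and `n_rows` is simply the row count of the instruments DataFrame.

```python
# The same slice used above, applied to the positions 0..9 of a 10-row table.
every_other_non_string = slice(3, 10, 2)
n_rows = 10  # number of rows in the instruments DataFrame

# slice.indices(length) resolves the slice to concrete (start, stop, step) values.
start, stop, step = every_other_non_string.indices(n_rows)
print(start, stop, step)               # 3 10 2

# range() with those values lists every selected position.
print(list(range(start, stop, step)))  # [3, 5, 7, 9]

# A stop of 9 would drop the last row, because the stop value is exclusive.
print(list(range(*slice(3, 9, 2).indices(n_rows))))  # [3, 5, 7]
```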

Using Slice Syntax Within a DataFrame

The more common way to generate a slice object is to use indexing syntax, which follows the format:
x[start:stop:step] (where x can be a DataFrame, a list, etc.).

Indexing syntax works like the slice function, with one notable difference: all three parameters are optional.
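
To see that the two forms really produce the same slice objects, here is a small sketch. `ShowKey` is a hypothetical helper class written only for this illustration; it returns whatever Python passes in through the square brackets.

```python
class ShowKey:
    """Hypothetical helper: returns whatever is placed inside the brackets."""

    def __getitem__(self, key):
        return key


probe = ShowKey()

print(probe[2:8:3])  # slice(2, 8, 3)
print(probe[:5])     # slice(None, 5, None)
print(probe[::2])    # slice(None, None, 2)
print(probe[:])      # slice(None, None, None)

# Both spellings therefore select the same items.
numbers = list(range(10))
print(numbers[2:8:3] == numbers[slice(2, 8, 3)])  # True
```

Omitted parameters show up as None in the slice object, which is how Python records that a default should be used.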

Using All Default Values in a DataFrame slice

Unlike with the slice function, we don't have to specify any parameters when using indexing syntax. Leaving out every parameter, as in df[:], is called an empty slice. An empty slice returns the entire DataFrame:

```python
# df is the instruments DataFrame created earlier
df[:]
```

Output:

```
   instrument      family  material
0        harp     strings      wood
1      violin     strings    spruce
2      guitar     strings  mahogany
3    clarinet    woodwind    mpingo
4    recorder    woodwind   boxwood
5       flute    woodwind    silver
6   xylophone  percussion      wood
7     marimba  percussion  rosewood
8    trombone       brass     brass
9     trumpet       brass     brass
```

Specifying One Parameter in a DataFrame slice

Remember, the slice function requires at least the stop parameter.
With the indexing syntax, we have the option to specify a single parameter, and it can be the start, the stop, or the step. The generic format is:
df[start:stop], when one : is used
df[start:stop:step], when two : are used

The default values remain the same (start is 0 and step is 1), and the default value of stop is the number of rows in the DataFrame being sliced. Examples with one parameter:

```python
# df is the instruments DataFrame created earlier
# start = 7
# default values for `stop` and `step` (step = 1)
# => the slice contains the rows at index 7 and above:
df[7:]
```

Output:

```
   instrument      family  material
7     marimba  percussion  rosewood
8    trombone       brass     brass
9     trumpet       brass     brass
```

```python
# df is the instruments DataFrame created earlier
# stop = 3
# default values for `start` (0) and `step` (1)
# => the slice contains only the rows at index 0, 1, and 2:
df[:3]
```

Output:

```
   instrument   family  material
0        harp  strings      wood
1      violin  strings    spruce
2      guitar  strings  mahogany
```

```python
# df is the instruments DataFrame created earlier
# step = 4
# default values for `start` (0) and `stop`
# => every 4th row, starting at index 0: 0, 4, and 8
df[::4]
```

Output:

```
   instrument    family  material
0        harp   strings      wood
4    recorder  woodwind   boxwood
8    trombone     brass     brass
```

Specifying Multiple Parameters in a DataFrame slice

Let's say that we are interested in learning one instrument from each family:

```python
# df is the instruments DataFrame created earlier
# start = 2, step = 2
# default value for `stop`
# => every 2nd row, starting at index 2:
df[2::2]
```

Output:

```
   instrument      family  material
2      guitar     strings  mahogany
4    recorder    woodwind   boxwood
6   xylophone  percussion      wood
8    trombone       brass     brass
```

After some consideration, we conclude that learning four instruments would take too many hours of practice. Instead, we decide to narrow down our options to either a string instrument or a percussion instrument. To do this, we can combine two different slices of our DataFrame using concatenation:

```python
# pd and df come from the setup code shown earlier
# Concatenating two slices together
pd.concat([df[:3], df[6:8]])
```

Output:

```
   instrument      family  material
0        harp     strings      wood
1      violin     strings    spruce
2      guitar     strings  mahogany
6   xylophone  percussion      wood
7     marimba  percussion  rosewood
```
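
Notice that the concatenated rows keep their original index labels (0, 1, 2, 6, 7). If consecutive numbering is preferred, `pd.concat` accepts `ignore_index=True`. The sketch below shows both behaviors on a shortened, made-up stand-in for the instruments table named `df_small`.

```python
import pandas as pd

# A shortened, made-up stand-in for the instruments DataFrame.
df_small = pd.DataFrame({
    "instrument": ["harp", "violin", "clarinet", "xylophone", "marimba"],
    "family": ["strings", "strings", "woodwind", "percussion", "percussion"],
})

# Default behavior: each row keeps the index label it had in df_small.
kept = pd.concat([df_small[:2], df_small[3:5]])
print(list(kept.index))        # [0, 1, 3, 4]

# ignore_index=True discards the old labels and numbers the rows from 0.
renumbered = pd.concat([df_small[:2], df_small[3:5]], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3]
```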

Using Negative Numbers in a slice

When generating slice objects, we can use negative numbers. For DataFrames, a negative number counts positions backward from the last row, so -1 refers to the last row, -2 to the row before it, and so on.

For example, say we want to look at the last row of our DataFrame:

```python
# df is the instruments DataFrame created earlier
# This is equivalent to `df[9:]`
df[-1:]
```

Output:

```
   instrument  family  material
9     trumpet   brass     brass
```
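
Negative numbers can appear in any of the three positions. The following sketch uses a made-up five-row DataFrame named `tail_demo` to show a negative start, a negative stop, and a negative step.

```python
import pandas as pd

# A made-up five-row DataFrame standing in for the instruments table.
tail_demo = pd.DataFrame(
    {"instrument": ["harp", "violin", "guitar", "flute", "trumpet"]}
)

# Negative start: the last two rows.
print(list(tail_demo[-2:]["instrument"]))   # ['flute', 'trumpet']

# Negative stop: everything except the last two rows.
print(list(tail_demo[:-2]["instrument"]))   # ['harp', 'violin', 'guitar']

# Negative step: all rows in reverse order.
print(list(tail_demo[::-1]["instrument"]))  # ['trumpet', 'flute', 'guitar', 'violin', 'harp']
```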

Pandas Documentation

The pandas documentation includes a 44-page guide covering topics related to indexing and selecting data.