Interactive & Scrollable Choropleth Maps in Python with GeoPandas and Folium
by Caitlin Huxley
Choropleth maps are a powerful visualization tool for displaying geospatial data and are often used to show how a particular variable is distributed across a region. In this tutorial, I will walk you through the steps of creating interactive choropleth maps in Python using GeoPandas and Folium.
This is going to be a 2-in-1 walkthrough of how to use older precinct-level shapefiles and their election results to calculate what percentage of the vote each candidate would have received in today's districts AND how to turn that into a sexy-looking choropleth map like the one below:
You can scroll in and out and hover over a congressional district to see a pop-up tooltip. I've recently moved to Pennsylvania, so I chose this state. (I also made State Senate and State House versions.)
For the purposes of this tutorial, I've only included the 2020 Trump/Biden results and colored the map based on those numbers, but you can add in anything you have the precinct-level results for (as long as you have the shape files to go along with them). Luckily, Harvard has made available the precinct-level shape files for most states here. Now let's get started.
Step 1: Import necessary libraries
To get started, we need to import the necessary libraries. In this case, we will be using Pandas, GeoPandas, and Folium, plus a colormap class from Branca.
- Pandas is a powerful library for data manipulation and analysis in Python, using a DataFrame - like an Excel sheet, but way more powerful.
- GeoPandas is an extension of Pandas that allows for working with geospatial data. Note that it can take some extra effort to install, and it is usually easiest to install it in a dedicated environment rather than your root (base) environment.
- Folium is a library for creating interactive maps in Python. It is built on top of the popular mapping library Leaflet.js and allows for the easy creation of map visualizations with various base maps, markers, and other features.
- The LinearColormap class from the Branca library (branca.colormap) is used to create custom color maps for our Folium maps.
Step 2: Read in shapefiles and convert the CRS
Next, we need to read in the necessary shapefiles.
Step 3: Calculate relative areas of precincts
Next, convert their coordinate reference system (CRS) to EPSG 3857 (measured in meters - important because we will be figuring out the percentage of the 2020 precincts which overlap each 2022 district).
Note that it is important to ensure that all of the data we are using has the same CRS, as this allows us to accurately compare and analyze the spatial data. So we will have to convert the rest later.
Then we can calculate the total area, in square meters, of each precinct and store that as a new column in the Precincts20 GeoDataFrame (we'll call a GeoDataFrame a GDF from here on).
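As a rough sketch, the area step can be a single assignment. The toy precincts and the PrecinctArea column name are my own placeholders, and assigning the column directly is assumed to be equivalent to computing it separately and joining it back.

```python
import geopandas as gpd
from shapely.geometry import box

# Two toy precincts already in EPSG:3857 (units are meters).
Precincts20 = gpd.GeoDataFrame(
    {"PRECINCT": ["P1", "P2"]},
    geometry=[box(0, 0, 1000, 1000), box(1000, 0, 3000, 1000)],
    crs="EPSG:3857",
)

# Total area of each precinct in square meters, stored as a new column.
# "PrecinctArea" is a hypothetical column name.
Precincts20["PrecinctArea"] = Precincts20.geometry.area
```

Keep in mind that EPSG 3857 exaggerates areas the farther you get from the equator. We only ever use these areas as ratios within the same small region, so the distortion should largely cancel out.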
Step 4: Loop through each type of district, cleaning up a bit
Here we will loop through each of the districts (Congressional, State Senate, and State House) and calculate precinct-level data for each district.
To begin, we define a list called DistsList containing the district GDFs and their corresponding names. Then begin a for loop.
Note that DistsList is a list of lists, so on the first pass Dists is actually [CDDists, 'CD'] - the GDF itself, plus the short version of the name we'd like to use when saving files later. Most of the rest of the tutorial will use Dists[0] for everything (remember that the numbering system in Python starts at 0, not 1, so 0 is the 1st item in the list).
Inside the for loop, we'll convert the CRS of the district GDF to EPSG 3857, as we did above for the Precincts20 GDF. Then we trim it down to include only the relevant columns (in this case, the 'DISTRICT' column and the vote totals for each candidate).
I also ran into an issue where the District shape files already included a column named NAME, which was horribly inconvenient. So, I had to include a line checking if it existed and if it did, naming it something else.
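The housekeeping inside the outer loop could look something like this. Only CDDists, DistsList, Dists, DISTRICT, and NAME come from the text above; SSDists, SHDists, the 'SS'/'SH' labels, the EXTRA column, and the DIST_NAME replacement are placeholders I invented. The toy layers carry no vote columns, so the trim here keeps just the district ID, the renamed column, and the geometry.

```python
import geopandas as gpd
from shapely.geometry import box

def toy_districts(name_col=False):
    """Build a tiny stand-in district layer in longitude/latitude."""
    data = {"DISTRICT": ["1", "2"], "EXTRA": ["a", "b"]}
    if name_col:
        data["NAME"] = ["First", "Second"]
    geoms = [box(-77.0, 40.0, -76.5, 40.5), box(-76.5, 40.0, -76.0, 40.5)]
    return gpd.GeoDataFrame(data, geometry=geoms, crs="EPSG:4326")

CDDists, SSDists, SHDists = toy_districts(True), toy_districts(), toy_districts()

# Each entry pairs a district GDF with the short name used in file names.
DistsList = [[CDDists, "CD"], [SSDists, "SS"], [SHDists, "SH"]]

cleaned = {}
for Dists in DistsList:
    gdf = Dists[0].to_crs(epsg=3857)           # match the precinct CRS
    if "NAME" in gdf.columns:                   # avoid a clash with other NAME columns
        gdf = gdf.rename(columns={"NAME": "DIST_NAME"})   # hypothetical new name
    keep = [c for c in ["DISTRICT", "DIST_NAME", "geometry"] if c in gdf.columns]
    Dists[0] = gdf[keep]                        # trim to the relevant columns
    cleaned[Dists[1]] = list(Dists[0].columns)
```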
Then we'll start ANOTHER for loop, this time looping through for each unique district number in the DISTRICT column of that Dists[0] GDF.
Step 5: Nested For Loop through the district, calculating the area of each precinct WITHIN that district
Once this bit of housekeeping is done, we'll create a nested for loop to iterate through each of the districts in Congress (then in House and Senate) and calculate the percentage of each precinct that falls into each district.
We'll create a new GDF called DistFocus20, which uses the overlay() function from the GeoPandas library to calculate the intersection of the current district (Dists[0]) with Precincts20. This should now only contain the data for the precincts within the current district. The how parameter specifies that we want to calculate the intersection of the two GDFs, and the keep_geom_type parameter specifies that we want to keep the geometry type of the original.
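A small sketch of the overlay call on toy data follows. The geometries, vote numbers, and the step of filtering to a single district before the overlay are my assumptions; the overlay() arguments are the ones named above.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy data in EPSG:3857: P1 lies wholly inside district 1, P2 straddles its edge.
Precincts20 = gpd.GeoDataFrame(
    {"PRECINCT": ["P1", "P2"], "Biden": [100, 200], "Trump": [80, 120]},
    geometry=[box(0, 0, 1000, 1000), box(1000, 0, 3000, 1000)],
    crs="EPSG:3857",
)
CDDists = gpd.GeoDataFrame(
    {"DISTRICT": ["1", "2"]},
    geometry=[box(0, 0, 2000, 1000), box(2000, 0, 4000, 1000)],
    crs="EPSG:3857",
)

i = "1"                                          # current district number
one_district = CDDists[CDDists["DISTRICT"] == i]

# Keep only the parts of each precinct that fall inside this district.
DistFocus20 = gpd.overlay(
    one_district, Precincts20, how="intersection", keep_geom_type=True
)

piece_areas = [round(a) for a in DistFocus20.sort_values("PRECINCT").geometry.area]
```

Here P2 is cut in half by the district boundary, so only the half inside district 1 survives the overlay.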
We can calculate the area, in square meters, of each piece of a precinct that falls inside the district, join that new column back in to DistFocus20, and then merge the Precincts20 data (including each precinct's total area) into that.
Now that we have both the area of the precinct as a whole and the area of the part inside the district, we can calculate the relative area of each precinct within that district. "DistFocus20[Dists[1] + '_' + i + '%']" looks funky, but if you read slowly, it's easy enough to see how Python iteratively creates new columns for each of "CD_1%", "CD_2%", "CD_3%", etc. and assigns to it the area of each precinct that lies inside the district, divided by that precinct's total area. This is the share of the precinct that overlaps the district, which is what we need to split up its votes.
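To make the column naming concrete, here is a pandas-only sketch. It assumes the share is the part of a precinct inside the district divided by the precinct's full area (the ratio the vote split in the next paragraph needs), and the PieceArea and PrecinctArea column names are hypothetical. Note that i has to be a string for the concatenation to work.

```python
import pandas as pd

Dists = [None, "CD"]      # placeholder for [district GDF, short name]
i = "1"                   # current district number, as a string

# Stand-in for DistFocus20 after the overlay and merge: one row per precinct piece.
DistFocus20 = pd.DataFrame({
    "PRECINCT": ["P1", "P2"],
    "PieceArea": [1_000_000.0, 1_000_000.0],     # area inside district 1
    "PrecinctArea": [1_000_000.0, 2_000_000.0],  # area of the whole precinct
})

share_col = Dists[1] + "_" + i + "%"             # builds the name "CD_1%"
DistFocus20[share_col] = DistFocus20["PieceArea"] / DistFocus20["PrecinctArea"]
```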
Similarly, we can then iterate through the list of candidates for president in PA (just Biden, Trump & Jorgensen), using another for loop to iterate through the votes20 list and calculate the votes for each candidate within the current district. This is done by multiplying the relative precinct area within the district (DistFocus20[Dists[1] + '_' + i + '%'] - or "CD_1%") by the total votes for the candidate in the precinct (DistFocus20[v]).
The resulting values are added as new columns and represent the relative vote totals each 2020 candidate received in the new 2022 districts. Amazing!
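The vote split itself is just a multiplication per candidate. In this sketch the numbers and the "CD_1_" prefix for the new columns are made up; votes20 and the "CD_1%" share column follow the text.

```python
import pandas as pd

votes20 = ["Biden", "Trump", "Jorgensen"]   # candidate vote columns
share_col = "CD_1%"                          # share of each precinct in district 1

DistFocus20 = pd.DataFrame({
    "PRECINCT": ["P1", "P2"],
    share_col: [1.0, 0.5],
    "Biden": [100, 200],
    "Trump": [80, 120],
    "Jorgensen": [4, 6],
})

for v in votes20:
    # Estimated votes inside district 1; "CD_1_<name>" is a hypothetical naming scheme.
    DistFocus20["CD_1_" + v] = DistFocus20[share_col] * DistFocus20[v]

district_totals = {v: DistFocus20["CD_1_" + v].sum() for v in votes20}
```

This kind of area weighting quietly assumes voters are spread evenly across each precinct, so treat the results as estimates rather than exact counts.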
Step 6: Working with precinct-level vote totals for each district
Now that we've got a raw number, we need to pretty it up, so it can be presented in our tooltip and used as a scale to color our Choropleths.
- The first line of code defines a variable called PresTotal20 that represents the total number of votes for the presidential race within the current district.
- The second and third lines define variables called PresRep20 and PresDem20, which represent the total number of votes for the Republican and Democratic candidates, respectively.
- The fourth through sixth lines calculate the percentage of votes received by the Republican and Democratic candidates, respectively. This is done by dividing the total number of votes for each candidate by the total number of votes in the district. Then subtract the Republican vote percentage from the Democratic vote percentage to get the Spread. This Spread is the column we'll be using to color our map.
- Lines 12-20 create a smaller results dataframe with just our topline data that we can merge to the district list later.
- Lines 22-25 turn them into some good-looking strings instead of endless digits. (25.37% instead of 0.25368583181351..., etc.) We'll use these later in the tooltip.
- Finally, we concatenate those results back into the FinalResults DataFrame.
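The bullets above boil down to something like the sketch below, shown for a single district with toy vote totals. PresTotal20, PresRep20, PresDem20, FinalResults, DISTRICT, and 2020Pres_Spread follow the text; every other name (PresOth20, Results, the display column names) is a placeholder of mine.

```python
import pandas as pd

i = "1"                                              # current district number
FinalResults = pd.DataFrame()                        # filled one district at a time

# Estimated votes in this district (toy numbers).
PresDem20, PresRep20, PresOth20 = 200.0, 140.0, 7.0
PresTotal20 = PresDem20 + PresRep20 + PresOth20

PresDemPct = PresDem20 / PresTotal20
PresRepPct = PresRep20 / PresTotal20
Spread = PresDemPct - PresRepPct                     # positive = more Democratic

# Topline results for this district, with display strings for the tooltip.
Results = pd.DataFrame({
    "DISTRICT": [i],
    "2020Pres_Spread": [Spread],
    "2020Pres_Dem%": [f"{PresDemPct:.2%}"],          # "57.64%"
    "2020Pres_Rep%": [f"{PresRepPct:.2%}"],          # "40.35%"
    "2020Pres_Spread%": [f"{Spread:+.2%}"],          # "+17.29%"
})

FinalResults = pd.concat([FinalResults, Results], ignore_index=True)
```

Keeping the numeric Spread alongside the formatted strings matters: the map colors need the number, while the tooltip only needs the text.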
Step 7: Merging the district results into the main district data
Now that we've iterated over every district #, and we're back in the first for loop, we just merge the FinalResults data to the main Dists[0] GDF.
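A sketch of that merge on toy data is below. The choice of a left merge on the DISTRICT column is my assumption; it keeps every district shape even if a result row is missing.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

CDDists = gpd.GeoDataFrame(
    {"DISTRICT": ["1", "2"]},
    geometry=[box(0, 0, 2000, 1000), box(2000, 0, 4000, 1000)],
    crs="EPSG:3857",
)
FinalResults = pd.DataFrame({
    "DISTRICT": ["1", "2"],
    "2020Pres_Spread": [0.17, -0.08],    # toy values
})
Dists = [CDDists, "CD"]

# Attach the per-district results to the district shapes.
Dists[0] = Dists[0].merge(FinalResults, on="DISTRICT", how="left")
```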
Now we can make some maps!
Step 8: Makin maps!
Now, when I was first learning this, I used this tutorial here. It's a good place to start, but it didn't really give me everything I needed.
We'll create a new Folium map called FMap, pick the center of our state (you can just right-click in Google Maps to get these coordinates), and set our zoom to 7 (guess and check works best here, sorry). Then we create a base map to put our colors and shapes on, add a tile set (OpenStreetMap is free and Google Maps is not, so we'll use OpenStreetMap), and then add that base_map to FMap.
So we can use them in our choropleths later, we'll save the geometry column (which contains the polygons) as a GeoJSON representation of the district boundaries for each district. We set the DISTRICT column as the index, so it knows what we're referencing when we try to color it later.
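Here is a sketch of building that GeoJSON string with toy districts. I've put the toy data in EPSG:4326 because Leaflet expects longitude/latitude; if your GDF is still in EPSG 3857, you would probably want to call to_crs(epsg=4326) on it first.

```python
import json

import geopandas as gpd
from shapely.geometry import box

CDDists = gpd.GeoDataFrame(
    {"DISTRICT": ["1", "2"]},
    geometry=[box(-77.0, 40.0, -76.5, 40.5), box(-76.5, 40.0, -76.0, 40.5)],
    crs="EPSG:4326",   # assumption: longitude/latitude for web maps
)

# Index by district so each GeoJSON feature carries the district as its "id".
geo = CDDists.set_index("DISTRICT")["geometry"].to_json()

# Peek at the ids that ended up on the features.
feature_ids = [feature["id"] for feature in json.loads(geo)["features"]]
```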
Step 9: ColorMapping
Here, we are creating a colormap to use in our choropleth map of the districts. A colormap is a sequence of colors that is used to represent the values of a continuous variable in a graph or map. To create the colormap, we are using the LinearColormap() constructor from the branca.colormap module, and we pass it three arguments: colors, vmin, and vmax.
The colors argument is a list of colors that will be used in the colormap. In this case, we are using a list with two colors: 'red' and 'blue'. These colors will be used to represent the minimum and maximum values of the continuous variable we are mapping.
The vmin and vmax arguments specify the minimum and maximum values of the continuous variable that we want to map. In this case, we are using the min() and max() methods of the Dists[0] DataFrame to determine the minimum and maximum values of the p[0] column.
Step 10: Choropleths
- The geo_data argument is a GeoJSON representation of the geographical features that we want to map. We'll assign the geo variable we created earlier.
- The data argument is a Pandas DataFrame that contains the data that we want to visualize on the map. We are obviously using the Dists[0] DataFrame, which contains the district boundaries and our election results we appended.
- The columns argument is a list of the column names in the data DataFrame that we want to use to create the choropleth map. We are using the DISTRICT column (which specifies the district ID) and the 2020Pres_Spread column (which contains the values of the continuous variable that we want to map).
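- The key_on argument specifies which field of each GeoJSON feature in geo_data is matched against the first column listed in columns (here, the district ID). It has to agree with how we built geo earlier, where the district became the index, so do not touch it unless you also change that step.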
- The fill_opacity and line_opacity arguments specify the transparency of the fill and line styles of the districts on the map. In this case, we are setting both values to 0 to make the districts fully transparent.
- The smooth_factor argument specifies how much the district boundaries are simplified (smoothed) when drawn on the map. In this case, we are setting the value to 0 to disable smoothing.
Step 11: Hack in our colormap (plus a little mystery)
This section changes some of the styles from the CPLTH above so that you can actually use a colormap. If you skip this part of the code, your shapes will be invisible.
Full disclosure: it was 4am when I wrote this section, and I'm not 100% sure how it works. Sorry, but unless you REALLY know what you're doing, just don't play with this section, except to change the colors maybe. If you think you know how it works, please feel free to shoot me a line and explain it.
Anyway... we just want to define the style function we want for our choropleths. The weight property specifies the thickness of the district boundaries, the color property specifies the color of the district boundaries, the fillColor property specifies the fill color of the district, and the fillOpacity property specifies the transparency of the fill color.
The second line of code uses the GeoJson() constructor from the folium module to create a GeoJSON object that represents the district boundaries. The third line of code iterates over the children of the CPLTH object (which represents the choropleth map) and removes any child whose name starts with the string "color_map". This is done to remove the color bar legend that the choropleth adds automatically, as it is not needed for this map.
Step 12: The hover effect, and a mouseover tooltip
This portion of the code is responsible for adding a hover function to the map. When the user hovers their cursor over a district on the map, a pop-up window will appear displaying the district number, the percentage of votes for each political party, and the spread between the two parties.
To do this, the code first defines the style for the overlay layer, which we set to be completely transparent, so it does not interfere with the appearance of the map. Next, we define the highlight function, which sets the fill color and opacity to be partially transparent white.
Finally, the code creates a GeoJSON tooltip using the defined style function and highlight function. The tooltip is given fields to display, which are just the strings we created earlier, along with aliases for those fields, and is added as a child to the map. The map is then told to keep the tooltip in front of other layers so that it is always visible to the user.
Final Step: Save it!
This should be self-explanatory at this point, but this is all you have to do to save it.