Chapter 3 Visualizations
3.1 Traffic camera locations
Plotting data on a map is easier than you’d think. If you have shapefiles, matplotlib
can be used to visualize geographical data.
Alternatively, you can download the desired map from openstreetmap.org as an image. On the map, zoom in and out until the view
shows the desired bounding box, then take note of its coordinates [left, right, bottom, top].
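The bounding box can be kept in one place and reused, for example to check that a point falls inside the downloaded map before plotting it. The following is a minimal sketch using the same coordinates as the plotting code in this section; the helper name `in_bbox` and the two sample points are hypothetical.

```python
# Bounding box in the order [left, right, bottom, top],
# i.e. (min lon, max lon, min lat, max lat)
BBOX = (103.5846, 104.0419, 1.2002, 1.4829)

def in_bbox(lon, lat, bbox=BBOX):
    """Return True if the point (lon, lat) lies inside the bounding box."""
    left, right, bottom, top = bbox
    return left <= lon <= right and bottom <= lat <= top

# Hypothetical points: one inside the map area, one far outside it
points = [(103.8198, 1.3521), (100.5018, 13.7563)]
inside = [in_bbox(lon, lat) for lon, lat in points]
print(inside)
```

Points that fail this check would be drawn outside the map image, so filtering them out first keeps the axes limited to the map.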
Let's now import the required libraries as well as the dataset containing the geographical location (lon, lat) of each traffic camera.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
# Read csv file from Dropbox as Pandas DataFrame
csv = 'https://www.dropbox.com/s/xfazz0mu1z38buz/Singapore_cameras.csv?dl=1'
cameras = pd.read_csv(csv,delimiter=',')
# Read the map of Singapore
SG = plt.imread('./Singapore_map.png')
Our goal is to plot all cameras on the map of Singapore. We are going to label camera 1701
only, to see exactly where it is located.
# Convert camera_id values to strings
cameras['camera_id'] = cameras['camera_id'].astype(str)
cameras.loc[cameras['camera_id'] == '1701', 'labels'] = '1701'
cameras.loc[cameras['camera_id'] != '1701', 'labels'] = ''
# Open figure + axis
fig, ax = plt.subplots()
# Add the map of Singapore
# Use the extent keyword of imshow. The order of the arguments is [left, right, bottom, top]
ax.imshow(SG, zorder=0, extent=[103.5846, 104.0419, 1.2002, 1.4829])
# plot
ax.scatter(x=cameras['lon'],y=cameras['lat'], color='red')
# set labels
ax.set_xlabel('lon')
ax.set_ylabel('lat')
# annotate points in axis
for idx, row in cameras.iterrows():
    ax.annotate(row['labels'], (row['lon'], row['lat']))
# force matplotlib to draw the graph
plt.show()
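To see what the extent keyword does, it can help to work out by hand where a coordinate lands on the image. The sketch below assumes the image is stretched linearly between the extent values and that row 0 is the top of the image (the imshow default); the function name `lonlat_to_pixel` and the image size are hypothetical. A linear mapping is only an approximation of the map projection, although the difference should be small for an area this close to the equator.

```python
def lonlat_to_pixel(lon, lat, extent, width, height):
    """Map (lon, lat) to (column, row) of an image drawn with the given extent.

    extent is [left, right, bottom, top]; row 0 is the top of the image.
    """
    left, right, bottom, top = extent
    col = (lon - left) / (right - left) * width
    row = (top - lat) / (top - bottom) * height
    return col, row

extent = [103.5846, 104.0419, 1.2002, 1.4829]
width, height = 1000, 600  # hypothetical image size in pixels

# The corners of the bounding box map to the corners of the image
top_left = lonlat_to_pixel(extent[0], extent[3], extent, width, height)
bottom_right = lonlat_to_pixel(extent[1], extent[2], extent, width, height)
print(top_left, bottom_right)
```

If the extent does not match the area actually shown in the image, every camera is shifted by the same mapping error, which is why the coordinates noted at download time matter.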
3.2 Total number of vehicles per hour captured by a single camera
Let's plot the total number of vehicles per hour captured by camera 1701. To do so, we have processed
a file with images retrieved every 5 minutes on Wed, November 13, 2019.
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
# Read csv file from Dropbox as Pandas DataFrame
csv = 'https://www.dropbox.com/s/qqncxtpnv83r8ex/1701Wednesday.csv?dl=1'
df = pd.read_csv(csv, delimiter=',')
We need to select only objects that are of class 3 (car), 6 (truck) or 8 (bus). We take advantage of
the datetime support in pandas, which converts timestamps into datetime format so we can extract days, hours, minutes, etc.
# Keep only objects that are vehicles (classes 3, 6 and 8)
df = df.loc[df['Detection_Class'].isin([3, 6, 8])].copy()
# Convert to datetime object
df['Timestamp_sg_time'] = pd.to_datetime(df['Timestamp_sg_time'])
df['hour'] = df['Timestamp_sg_time'].dt.hour  # Extract hour of the day
df['ones'] = 1  # Ones to sum up the total number of vehicles
df_plot = df[['hour', 'ones']]
# Aggregate by hour of the day to get the total number of vehicles
df_plot = df_plot.groupby(['hour']).agg(vehicles=('ones', 'sum')).reset_index()
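The filter-then-count logic above can also be followed step by step with the standard library alone. This is only an illustrative sketch: the sample detections and the timestamp format are made up, and the real file may be laid out differently.

```python
from collections import Counter
from datetime import datetime

VEHICLE_CLASSES = {3, 6, 8}  # classes treated as vehicles in this section

# Hypothetical detections: (timestamp, detection class)
detections = [
    ("2019-11-13 08:00:05", 3),
    ("2019-11-13 08:05:02", 3),
    ("2019-11-13 08:05:02", 1),  # not a vehicle class, ignored
    ("2019-11-13 09:10:00", 8),
    ("2019-11-13 09:15:00", 6),
    ("2019-11-13 09:20:00", 3),
]

# Count one vehicle per detection, keyed by the hour of its timestamp
vehicles_per_hour = Counter(
    datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour
    for ts, cls in detections
    if cls in VEHICLE_CLASSES
)
print(sorted(vehicles_per_hour.items()))  # [(8, 2), (9, 3)]
```

The Counter plays the role of the column of ones summed per group: each vehicle detection adds 1 to its hour.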
We plot the data by running the following lines:
plt.plot(df_plot['hour'], df_plot['vehicles'], "-o")
plt.title("Camera '1701' on Wed 13, November 2019")
plt.xlabel("Hour")
plt.ylabel("Total number of vehicles")
plt.show()
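One thing to keep in mind is that a group-by only returns hours with at least one detection, so the line would be drawn straight across any hour that is missing. A minimal sketch of filling such gaps with zeros, using hypothetical hourly totals:

```python
# Hypothetical hourly totals; hours not listed had no detections
vehicles_per_hour = {7: 120, 8: 310, 10: 95}

hours = list(range(24))  # every hour of the day, 0 to 23
vehicles = [vehicles_per_hour.get(h, 0) for h in hours]

print(vehicles[7:11])  # [120, 310, 0, 95]
```

With a DataFrame, a similar result could be obtained by setting hour as the index and calling reindex(range(24), fill_value=0) before plotting.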
3.3 Advanced visualizations
The following map was created in Kepler.
It displays the cumulative number of vehicles (represented by the size of the circles) captured by all cameras on Wed 13,
November 2019. The labels correspond to the column row_number
of the dataset containing the location of the traffic cameras.
Kepler is Uber's answer to geospatial analytics. Open-sourced in 2018, it was built with the end user in mind:
it offers a drag-and-drop interface and can analyze millions of points in an instant.
The following plot was created in R. It shows the total number of vehicles between Mon 11 - Sun 17, November 2019
captured by camera 1701. We have dropped objects with accuracy less than 0.75.
Interestingly, traffic patterns do not change much between weekends and weekdays.
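The plot itself was produced in R, but the filtering step described above (dropping objects with accuracy below 0.75) can be sketched in Python as well. The field names `cls` and `score` and the sample values are assumptions made for illustration; the actual dataset may name its columns differently.

```python
MIN_SCORE = 0.75             # threshold stated in the text
VEHICLE_CLASSES = {3, 6, 8}  # classes treated as vehicles in section 3.2

# Hypothetical detections; "cls" and "score" are assumed field names
detections = [
    {"cls": 3, "score": 0.91},
    {"cls": 3, "score": 0.60},  # dropped: below the threshold
    {"cls": 8, "score": 0.75},  # kept: not less than 0.75
    {"cls": 1, "score": 0.99},  # dropped: not a vehicle class
]

kept = [
    d for d in detections
    if d["cls"] in VEHICLE_CLASSES and d["score"] >= MIN_SCORE
]
print(len(kept))  # 2
```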
Lastly, let's plot the total number of vehicles captured by camera 1701
every Wednesday between November 2019 and July 2020.
We have dropped objects with accuracy less than 0.75.
We can notice a decrease in traffic flow during the 2020 Singapore circuit breaker measures (3 April - 1 June, 2020).
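To compare the circuit breaker period with the rest of the series, one first needs to know which Wednesdays fall inside it. The sketch below lists every Wednesday in the period covered by the plot and flags those within the dates given above. The helper name `wednesdays` is hypothetical, and using the first and last day of the month as range boundaries is an assumption.

```python
from datetime import date, timedelta

def wednesdays(start, end):
    """Yield every Wednesday between start and end, inclusive."""
    # date.weekday() returns 0 for Monday, so Wednesday is 2
    day = start + timedelta(days=(2 - start.weekday()) % 7)
    while day <= end:
        yield day
        day += timedelta(days=7)

# Circuit breaker dates as given in the text
CB_START, CB_END = date(2020, 4, 3), date(2020, 6, 1)

all_wed = list(wednesdays(date(2019, 11, 1), date(2020, 7, 31)))
cb_wed = [d for d in all_wed if CB_START <= d <= CB_END]

print(len(all_wed), len(cb_wed))  # 39 8
```

Daily totals for the dates in `cb_wed` could then be averaged and set against the remaining Wednesdays to put a number on the drop.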
You can try to run the code in the following Notebook.