Shift Down



The shift() method for a pandas series helps shift values in a column up or down. This is similar to using the SQL window functions for LAG() and LEAD(). You can learn about these SQL window functions via Mode's SQL tutorial.

In this tutorial, I'll walk through an example of using the shift() pandas series method for analyzing bike rides.

In this example, I assume a service in which I lend out a single bike for people to ride throughout the day in San Francisco. Each record in the dataset is the start and end time for a ride.

The code below creates a list of start and end times of rides.

Create a pandas dataframe given the lists ride_start_times and ride_end_times.
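A minimal sketch of these two steps. The original construction code is not shown here, so the timestamps are simply copied as strings from the table that follows; parsing them with pd.to_datetime is my assumption about how the columns were typed.

```python
import pandas as pd

# Start and end times of each ride, copied from the table in this tutorial.
ride_start_times = [
    "2019-04-21 21:23:29.711347", "2019-04-21 22:43:29.711347",
    "2019-04-21 23:07:29.711347", "2019-04-22 01:00:29.711347",
    "2019-04-22 02:14:29.711347", "2019-04-22 03:45:29.711347",
]
ride_end_times = [
    "2019-04-21 21:41:29.711347", "2019-04-21 22:51:29.711347",
    "2019-04-21 23:23:29.711347", "2019-04-22 01:19:29.711347",
    "2019-04-22 02:20:29.711347", "2019-04-22 03:55:29.711347",
]

# Build the dataframe and parse the strings into datetime64 columns.
df_bike_rides = pd.DataFrame({
    "ride_start_time": pd.to_datetime(ride_start_times),
    "ride_end_time": pd.to_datetime(ride_end_times),
})
print(df_bike_rides)
```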

   ride_start_time             ride_end_time
0  2019-04-21 21:23:29.711347  2019-04-21 21:41:29.711347
1  2019-04-21 22:43:29.711347  2019-04-21 22:51:29.711347
2  2019-04-21 23:07:29.711347  2019-04-21 23:23:29.711347
3  2019-04-22 01:00:29.711347  2019-04-22 01:19:29.711347
4  2019-04-22 02:14:29.711347  2019-04-22 02:20:29.711347
5  2019-04-22 03:45:29.711347  2019-04-22 03:55:29.711347

Find the Mean Duration of Time, in Minutes, Bike is Left Idle Between Rides

For example, the first ride ended around 9:41 PM and the bike was next used for a ride at 10:43 PM. I want to calculate a new column that states there were approximately 62 minutes, equivalent to approximately 3720 seconds, of idle time between these rides.

I want to compare two columns within the same row. To make that possible, I use the shift() method to create a new column in df_bike_rides, previous_ride_end_time, that holds the values of ride_end_time shifted down one period.
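A short sketch of the shift, with the ride times copied from the table above. The first row has no earlier ride, so pandas fills it with NaT.

```python
import pandas as pd

df_bike_rides = pd.DataFrame({
    "ride_start_time": pd.to_datetime([
        "2019-04-21 21:23:29.711347", "2019-04-21 22:43:29.711347",
        "2019-04-21 23:07:29.711347", "2019-04-22 01:00:29.711347",
        "2019-04-22 02:14:29.711347", "2019-04-22 03:45:29.711347",
    ]),
    "ride_end_time": pd.to_datetime([
        "2019-04-21 21:41:29.711347", "2019-04-21 22:51:29.711347",
        "2019-04-21 23:23:29.711347", "2019-04-22 01:19:29.711347",
        "2019-04-22 02:20:29.711347", "2019-04-22 03:55:29.711347",
    ]),
})

# shift(1) moves every value down one row, so each row now also
# carries the end time of the ride before it.
df_bike_rides["previous_ride_end_time"] = df_bike_rides["ride_end_time"].shift(1)
print(df_bike_rides)
```

Passing a negative period, such as shift(-1), moves values up instead, which corresponds to LEAD() rather than LAG().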

   ride_start_time             ride_end_time               previous_ride_end_time
0  2019-04-21 21:23:29.711347  2019-04-21 21:41:29.711347  NaT
1  2019-04-21 22:43:29.711347  2019-04-21 22:51:29.711347  2019-04-21 21:41:29.711347
2  2019-04-21 23:07:29.711347  2019-04-21 23:23:29.711347  2019-04-21 22:51:29.711347
3  2019-04-22 01:00:29.711347  2019-04-22 01:19:29.711347  2019-04-21 23:23:29.711347
4  2019-04-22 02:14:29.711347  2019-04-22 02:20:29.711347  2019-04-22 01:19:29.711347
5  2019-04-22 03:45:29.711347  2019-04-22 03:55:29.711347  2019-04-22 02:20:29.711347

The shift() method for a pandas series is similar to a window function in SQL using LAG() and LEAD(). In SQL, the same operation could be written roughly as LAG(ride_end_time, 1) OVER (ORDER BY ride_start_time) AS previous_ride_end_time.

Given this new column for previous_ride_end_time, I can subtract the previous ride's end time from each ride's start time. The result is the duration the bike was idle between rides.
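A sketch of the subtraction, again with the ride times copied from the tables above. Subtracting two datetime columns yields a timedelta column; pandas prints these as "0 days 01:02:00" rather than the shorter form used in the table.

```python
import pandas as pd

df_bike_rides = pd.DataFrame({
    "ride_start_time": pd.to_datetime([
        "2019-04-21 21:23:29.711347", "2019-04-21 22:43:29.711347",
        "2019-04-21 23:07:29.711347", "2019-04-22 01:00:29.711347",
        "2019-04-22 02:14:29.711347", "2019-04-22 03:45:29.711347",
    ]),
    "ride_end_time": pd.to_datetime([
        "2019-04-21 21:41:29.711347", "2019-04-21 22:51:29.711347",
        "2019-04-21 23:23:29.711347", "2019-04-22 01:19:29.711347",
        "2019-04-22 02:20:29.711347", "2019-04-22 03:55:29.711347",
    ]),
})
df_bike_rides["previous_ride_end_time"] = df_bike_rides["ride_end_time"].shift(1)

# Idle time = when this ride started minus when the previous ride ended.
# The first row stays NaT because it has no previous ride.
df_bike_rides["duration_bike_idle_between_rides"] = (
    df_bike_rides["ride_start_time"] - df_bike_rides["previous_ride_end_time"]
)
print(df_bike_rides["duration_bike_idle_between_rides"])
```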

   ride_start_time             ride_end_time               previous_ride_end_time      duration_bike_idle_between_rides
0  2019-04-21 21:23:29.711347  2019-04-21 21:41:29.711347  NaT                         NaT
1  2019-04-21 22:43:29.711347  2019-04-21 22:51:29.711347  2019-04-21 21:41:29.711347  01:02:00
2  2019-04-21 23:07:29.711347  2019-04-21 23:23:29.711347  2019-04-21 22:51:29.711347  00:16:00
3  2019-04-22 01:00:29.711347  2019-04-22 01:19:29.711347  2019-04-21 23:23:29.711347  01:37:00
4  2019-04-22 02:14:29.711347  2019-04-22 02:20:29.711347  2019-04-22 01:19:29.711347  00:55:00
5  2019-04-22 03:45:29.711347  2019-04-22 03:55:29.711347  2019-04-22 02:20:29.711347  01:25:00

The new column duration_bike_idle_between_rides shows the duration of idle bike time between rides in the format HH:MM:SS. The value 01:02:00 means 1 hour and 2 minutes. Below, I convert that timedelta into a single numerical value in minutes. I use the dt accessor and the total_seconds() method to calculate the total seconds the bike is idle between rides, then divide this value by 60 to get minutes.
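A sketch of the conversion. To keep it short, the idle durations are typed in directly from the table above instead of being recomputed from the ride times.

```python
import pandas as pd

# Idle durations from the table above; the first ride has none (NaT).
df_bike_rides = pd.DataFrame({
    "duration_bike_idle_between_rides": pd.to_timedelta(
        [None, "01:02:00", "00:16:00", "01:37:00", "00:55:00", "01:25:00"]
    ),
})

# .dt exposes timedelta methods on a Series; total_seconds() returns floats,
# and dividing by 60 turns seconds into minutes. NaT becomes NaN.
df_bike_rides["minutes_bike_idle_between_rides"] = (
    df_bike_rides["duration_bike_idle_between_rides"].dt.total_seconds() / 60
)
print(df_bike_rides)
```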

View df_bike_rides below with a new column for the minutes_bike_idle_between_rides.

   ride_start_time             ride_end_time               previous_ride_end_time      duration_bike_idle_between_rides  minutes_bike_idle_between_rides
0  2019-04-21 21:23:29.711347  2019-04-21 21:41:29.711347  NaT                         NaT                               NaN
1  2019-04-21 22:43:29.711347  2019-04-21 22:51:29.711347  2019-04-21 21:41:29.711347  01:02:00                          62.0
2  2019-04-21 23:07:29.711347  2019-04-21 23:23:29.711347  2019-04-21 22:51:29.711347  00:16:00                          16.0
3  2019-04-22 01:00:29.711347  2019-04-22 01:19:29.711347  2019-04-21 23:23:29.711347  01:37:00                          97.0
4  2019-04-22 02:14:29.711347  2019-04-22 02:20:29.711347  2019-04-22 01:19:29.711347  00:55:00                          55.0
5  2019-04-22 03:45:29.711347  2019-04-22 03:55:29.711347  2019-04-22 02:20:29.711347  01:25:00                          85.0

I calculate the mean minutes_bike_idle_between_rides value as 63 minutes.
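A quick check of that figure, using the minute values from the table above. Series.mean() skips NaN by default, so the first ride does not affect the result.

```python
import pandas as pd

# Idle minutes from the table above; NaN marks the first ride.
minutes_bike_idle_between_rides = pd.Series(
    [float("nan"), 62.0, 16.0, 97.0, 55.0, 85.0]
)

# (62 + 16 + 97 + 55 + 85) / 5 rides with a predecessor
mean_idle_minutes = minutes_bike_idle_between_rides.mean()
print(mean_idle_minutes)
```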

Example 2: Duration Idle Time Between Bike Rides Per Unique Bike

This example below is similar to the one above. However, I assume I now operate a fleet of 2 bikes and rent them out for people to ride to specific stations in the city of San Francisco.

Below I create a pandas dataframe with details on bike ride times, the bike id and the start and end station.
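A minimal sketch of this dataframe. The values are copied from the table that follows; the name df_bike_sharing is the one used later in this tutorial.

```python
import pandas as pd

df_bike_sharing = pd.DataFrame({
    "ride_start_time": pd.to_datetime([
        "2019-04-21 21:23:29.711347", "2019-04-21 22:43:29.711347",
        "2019-04-21 23:07:29.711347", "2019-04-22 01:00:29.711347",
        "2019-04-22 02:14:29.711347", "2019-04-22 03:45:29.711347",
    ]),
    "ride_end_time": pd.to_datetime([
        "2019-04-21 21:41:29.711347", "2019-04-21 22:51:29.711347",
        "2019-04-21 23:23:29.711347", "2019-04-22 01:19:29.711347",
        "2019-04-22 02:20:29.711347", "2019-04-22 03:55:29.711347",
    ]),
    # Two bikes in the fleet, with ids 1 and 22.
    "bike_id": [1, 22, 1, 1, 22, 22],
    "start_station": ["21st & Folsom", "21st & Folsom", "4th & King",
                      "24th & Valencia", "4th & King", "16th and Mission"],
    "end_station": ["4th & King", "4th & King", "24th & Valencia",
                    "Embarcadero & Market", "16th and Mission", "4th & King"],
})
print(df_bike_sharing)
```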

   ride_start_time             ride_end_time               bike_id  start_station     end_station
0  2019-04-21 21:23:29.711347  2019-04-21 21:41:29.711347  1        21st & Folsom     4th & King
1  2019-04-21 22:43:29.711347  2019-04-21 22:51:29.711347  22       21st & Folsom     4th & King
2  2019-04-21 23:07:29.711347  2019-04-21 23:23:29.711347  1        4th & King        24th & Valencia
3  2019-04-22 01:00:29.711347  2019-04-22 01:19:29.711347  1        24th & Valencia   Embarcadero & Market
4  2019-04-22 02:14:29.711347  2019-04-22 02:20:29.711347  22       4th & King        16th and Mission
5  2019-04-22 03:45:29.711347  2019-04-22 03:55:29.711347  22       16th and Mission  4th & King

I sort df_bike_sharing first by the bike_id column and then the ride_start_time column.
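A sketch of the sort. The station columns are left out here for brevity; they do not affect the ordering.

```python
import pandas as pd

df_bike_sharing = pd.DataFrame({
    "ride_start_time": pd.to_datetime([
        "2019-04-21 21:23:29.711347", "2019-04-21 22:43:29.711347",
        "2019-04-21 23:07:29.711347", "2019-04-22 01:00:29.711347",
        "2019-04-22 02:14:29.711347", "2019-04-22 03:45:29.711347",
    ]),
    "bike_id": [1, 22, 1, 1, 22, 22],
})

# Group each bike's rides together, then order them chronologically.
# The original index labels are kept, so they appear out of order.
df_bike_sharing = df_bike_sharing.sort_values(by=["bike_id", "ride_start_time"])
print(df_bike_sharing)
```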

   ride_start_time             ride_end_time               bike_id  start_station     end_station
0  2019-04-21 21:23:29.711347  2019-04-21 21:41:29.711347  1        21st & Folsom     4th & King
2  2019-04-21 23:07:29.711347  2019-04-21 23:23:29.711347  1        4th & King        24th & Valencia
3  2019-04-22 01:00:29.711347  2019-04-22 01:19:29.711347  1        24th & Valencia   Embarcadero & Market
1  2019-04-21 22:43:29.711347  2019-04-21 22:51:29.711347  22       21st & Folsom     4th & King
4  2019-04-22 02:14:29.711347  2019-04-22 02:20:29.711347  22       4th & King        16th and Mission
5  2019-04-22 03:45:29.711347  2019-04-22 03:55:29.711347  22       16th and Mission  4th & King

For the bike_id column, I shift down values by 1 to create a new column called previous_bike_id. I do this so I can easily compare each ride's bike id to the bike id in the previous row and identify where one bike's rides end and the next bike's rides begin.
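A sketch of this shift on the sorted data. Only the bike_id column is needed, so the other columns are left out.

```python
import pandas as pd

# bike_id in sorted order, keeping the original index labels.
df_bike_sharing = pd.DataFrame(
    {"bike_id": [1, 1, 1, 22, 22, 22]},
    index=[0, 2, 3, 1, 4, 5],
)

# The first row has no predecessor, so it becomes NaN. Because NaN is a
# float, pandas converts the whole shifted column to floats (1.0, 22.0).
df_bike_sharing["previous_bike_id"] = df_bike_sharing["bike_id"].shift(1)
print(df_bike_sharing)
```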

This is similar to LAG() in SQL.

   ride_start_time             ride_end_time               bike_id  start_station     end_station           previous_bike_id
0  2019-04-21 21:23:29.711347  2019-04-21 21:41:29.711347  1        21st & Folsom     4th & King            NaN
2  2019-04-21 23:07:29.711347  2019-04-21 23:23:29.711347  1        4th & King        24th & Valencia       1.0
3  2019-04-22 01:00:29.711347  2019-04-22 01:19:29.711347  1        24th & Valencia   Embarcadero & Market  1.0
1  2019-04-21 22:43:29.711347  2019-04-21 22:51:29.711347  22       21st & Folsom     4th & King            1.0
4  2019-04-22 02:14:29.711347  2019-04-22 02:20:29.711347  22       4th & King        16th and Mission      22.0
5  2019-04-22 03:45:29.711347  2019-04-22 03:55:29.711347  22       16th and Mission  4th & King            22.0

In SQL, this operation could be written roughly as LAG(bike_id, 1) OVER (ORDER BY bike_id, ride_start_time) AS previous_bike_id.

For the ride_end_time column, I shift down values by 1 to create a new column called previous_ride_end_time. I do this so I can later find the idle bike time between rides.
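A sketch of this second shift, with the station columns omitted. Note that a plain shift() ignores bike ids, so the first ride of bike 22 picks up the last end time of bike 1; that is why the bike id comparison is needed afterwards.

```python
import pandas as pd

df_bike_sharing = pd.DataFrame({
    "ride_end_time": pd.to_datetime([
        "2019-04-21 21:41:29.711347", "2019-04-21 22:51:29.711347",
        "2019-04-21 23:23:29.711347", "2019-04-22 01:19:29.711347",
        "2019-04-22 02:20:29.711347", "2019-04-22 03:55:29.711347",
    ]),
    "bike_id": [1, 22, 1, 1, 22, 22],
}).sort_values(by=["bike_id", "ride_end_time"])

# Shift operates on row position in the sorted frame, not on bike_id.
df_bike_sharing["previous_ride_end_time"] = df_bike_sharing["ride_end_time"].shift(1)
print(df_bike_sharing)
```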

   ride_start_time             ride_end_time               bike_id  start_station     end_station           previous_bike_id  previous_ride_end_time
0  2019-04-21 21:23:29.711347  2019-04-21 21:41:29.711347  1        21st & Folsom     4th & King            NaN               NaT
2  2019-04-21 23:07:29.711347  2019-04-21 23:23:29.711347  1        4th & King        24th & Valencia       1.0               2019-04-21 21:41:29.711347
3  2019-04-22 01:00:29.711347  2019-04-22 01:19:29.711347  1        24th & Valencia   Embarcadero & Market  1.0               2019-04-21 23:23:29.711347
1  2019-04-21 22:43:29.711347  2019-04-21 22:51:29.711347  22       21st & Folsom     4th & King            1.0               2019-04-22 01:19:29.711347
4  2019-04-22 02:14:29.711347  2019-04-22 02:20:29.711347  22       4th & King        16th and Mission      22.0              2019-04-21 22:51:29.711347
5  2019-04-22 03:45:29.711347  2019-04-22 03:55:29.711347  22       16th and Mission  4th & King            22.0              2019-04-22 02:20:29.711347

We only want to calculate the duration a bike was idle in a row if that row compares two rides of the same bike id, that is, if bike_id equals previous_bike_id. Where that condition is false, I replace all values in the row with NaN or NaT.
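One way to do this is DataFrame.where(), which keeps values where a condition is true and blanks out the rest. This is a sketch with the station columns omitted; the name df_same_bike is hypothetical and not taken from the original code.

```python
import pandas as pd

df_bike_sharing = pd.DataFrame({
    "ride_start_time": pd.to_datetime([
        "2019-04-21 21:23:29.711347", "2019-04-21 22:43:29.711347",
        "2019-04-21 23:07:29.711347", "2019-04-22 01:00:29.711347",
        "2019-04-22 02:14:29.711347", "2019-04-22 03:45:29.711347",
    ]),
    "ride_end_time": pd.to_datetime([
        "2019-04-21 21:41:29.711347", "2019-04-21 22:51:29.711347",
        "2019-04-21 23:23:29.711347", "2019-04-22 01:19:29.711347",
        "2019-04-22 02:20:29.711347", "2019-04-22 03:55:29.711347",
    ]),
    "bike_id": [1, 22, 1, 1, 22, 22],
}).sort_values(by=["bike_id", "ride_start_time"])
df_bike_sharing["previous_bike_id"] = df_bike_sharing["bike_id"].shift(1)
df_bike_sharing["previous_ride_end_time"] = df_bike_sharing["ride_end_time"].shift(1)

# True only when the previous row belongs to the same bike.
# Comparing against NaN is False, so each bike's first ride fails the test.
same_bike = df_bike_sharing["bike_id"] == df_bike_sharing["previous_bike_id"]

# Rows where the condition is False are replaced with NaN / NaT.
df_same_bike = df_bike_sharing.where(same_bike)
print(df_same_bike)
```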

   ride_start_time             ride_end_time               bike_id  start_station     end_station           previous_bike_id  previous_ride_end_time
0  NaT                         NaT                         NaN      NaN               NaN                   NaN               NaT
2  2019-04-21 23:07:29.711347  2019-04-21 23:23:29.711347  1.0      4th & King        24th & Valencia       1.0               2019-04-21 21:41:29.711347
3  2019-04-22 01:00:29.711347  2019-04-22 01:19:29.711347  1.0      24th & Valencia   Embarcadero & Market  1.0               2019-04-21 23:23:29.711347
1  NaT                         NaT                         NaN      NaN               NaN                   NaN               NaT
4  2019-04-22 02:14:29.711347  2019-04-22 02:20:29.711347  22.0     4th & King        16th and Mission      22.0              2019-04-21 22:51:29.711347
5  2019-04-22 03:45:29.711347  2019-04-22 03:55:29.711347  22.0     16th and Mission  4th & King            22.0              2019-04-22 02:20:29.711347

With the above change, I calculate ride_start_time minus previous_ride_end_time where the above condition holds True. With that calculation, I create a new column called duration_bike_idle.
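A sketch of this calculation, with the station columns omitted. Here the condition is applied to the subtraction result with Series.where(); this is one possible way to express the step and may differ from the original code.

```python
import pandas as pd

df_bike_sharing = pd.DataFrame({
    "ride_start_time": pd.to_datetime([
        "2019-04-21 21:23:29.711347", "2019-04-21 22:43:29.711347",
        "2019-04-21 23:07:29.711347", "2019-04-22 01:00:29.711347",
        "2019-04-22 02:14:29.711347", "2019-04-22 03:45:29.711347",
    ]),
    "ride_end_time": pd.to_datetime([
        "2019-04-21 21:41:29.711347", "2019-04-21 22:51:29.711347",
        "2019-04-21 23:23:29.711347", "2019-04-22 01:19:29.711347",
        "2019-04-22 02:20:29.711347", "2019-04-22 03:55:29.711347",
    ]),
    "bike_id": [1, 22, 1, 1, 22, 22],
}).sort_values(by=["bike_id", "ride_start_time"])
df_bike_sharing["previous_bike_id"] = df_bike_sharing["bike_id"].shift(1)
df_bike_sharing["previous_ride_end_time"] = df_bike_sharing["ride_end_time"].shift(1)

same_bike = df_bike_sharing["bike_id"] == df_bike_sharing["previous_bike_id"]

# Subtract for every row, then keep the result only where both rides
# belong to the same bike; other rows become NaT.
idle = df_bike_sharing["ride_start_time"] - df_bike_sharing["previous_ride_end_time"]
df_bike_sharing["duration_bike_idle"] = idle.where(same_bike)
print(df_bike_sharing[["bike_id", "duration_bike_idle"]])
```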

   ride_start_time             ride_end_time               bike_id  start_station     end_station           duration_bike_idle
0  2019-04-21 21:23:29.711347  2019-04-21 21:41:29.711347  1        21st & Folsom     4th & King            NaT
2  2019-04-21 23:07:29.711347  2019-04-21 23:23:29.711347  1        4th & King        24th & Valencia       01:26:00
3  2019-04-22 01:00:29.711347  2019-04-22 01:19:29.711347  1        24th & Valencia   Embarcadero & Market  01:37:00
1  2019-04-21 22:43:29.711347  2019-04-21 22:51:29.711347  22       21st & Folsom     4th & King            NaT
4  2019-04-22 02:14:29.711347  2019-04-22 02:20:29.711347  22       4th & King        16th and Mission      03:23:00
5  2019-04-22 03:45:29.711347  2019-04-22 03:55:29.711347  22       16th and Mission  4th & King            01:25:00

I drop previous_bike_id and previous_ride_end_time since they were intermediary outputs used for calculations. They're not necessary for a final presentation of the critical details.

The column duration_bike_idle shows the duration of idle bike time between rides in the format HH:MM:SS. The value 01:26:00 means 1 hour and 26 minutes. Below, I convert that timedelta into a single numerical value in seconds, stored in a new column called duration_bike_idle_seconds. I use the dt accessor and the total_seconds() method to calculate the total seconds a bike is idle between rides of the same bike id.
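A sketch of the cleanup and the conversion. The intermediate columns are filled with placeholder values here, since only the idle durations from the table above matter for this step.

```python
import pandas as pd

# Sorted frame reduced to the columns relevant for this step.
df_bike_sharing = pd.DataFrame(
    {
        "bike_id": [1, 1, 1, 22, 22, 22],
        "previous_bike_id": [None, 1.0, 1.0, 1.0, 22.0, 22.0],
        "duration_bike_idle": pd.to_timedelta(
            [None, "01:26:00", "01:37:00", None, "03:23:00", "01:25:00"]
        ),
    },
    index=[0, 2, 3, 1, 4, 5],
)

# Drop the helper column that was only needed for the comparison.
df_bike_sharing = df_bike_sharing.drop(columns=["previous_bike_id"])

# Convert each timedelta to a float number of seconds; NaT becomes NaN.
df_bike_sharing["duration_bike_idle_seconds"] = (
    df_bike_sharing["duration_bike_idle"].dt.total_seconds()
)
print(df_bike_sharing)
```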

   ride_start_time             ride_end_time               bike_id  start_station     end_station           duration_bike_idle  duration_bike_idle_seconds
0  2019-04-21 21:23:29.711347  2019-04-21 21:41:29.711347  1        21st & Folsom     4th & King            NaT                 NaN
2  2019-04-21 23:07:29.711347  2019-04-21 23:23:29.711347  1        4th & King        24th & Valencia       01:26:00            5160.0
3  2019-04-22 01:00:29.711347  2019-04-22 01:19:29.711347  1        24th & Valencia   Embarcadero & Market  01:37:00            5820.0
1  2019-04-21 22:43:29.711347  2019-04-21 22:51:29.711347  22       21st & Folsom     4th & King            NaT                 NaN
4  2019-04-22 02:14:29.711347  2019-04-22 02:20:29.711347  22       4th & King        16th and Mission      03:23:00            12180.0
5  2019-04-22 03:45:29.711347  2019-04-22 03:55:29.711347  22       16th and Mission  4th & King            01:25:00            5100.0

Here is the average time, in seconds, that each bike_id sits idle between rides during the day. I group by the bike_id column and calculate the mean of the duration_bike_idle_seconds column. I reset the index and rename the columns so this final output is easier to understand.
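A sketch of the aggregation, using the idle seconds from the table above. The name df_avg_idle is hypothetical.

```python
import pandas as pd

df_bike_sharing = pd.DataFrame({
    "bike_id": [1, 1, 1, 22, 22, 22],
    "duration_bike_idle_seconds": [
        float("nan"), 5160.0, 5820.0, float("nan"), 12180.0, 5100.0,
    ],
})

# mean() ignores NaN, so each bike's first ride does not affect its average.
df_avg_idle = (
    df_bike_sharing.groupby("bike_id")["duration_bike_idle_seconds"]
    .mean()
    .reset_index()
    .rename(columns={"duration_bike_idle_seconds": "avg_seconds_idle_between_rides"})
)
print(df_avg_idle)
```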

   bike_id  avg_seconds_idle_between_rides
0  1        5490.0
1  22       8640.0

On average, bike id 22 was left idle longer between rides (8640 seconds) than bike id 1 (5490 seconds).

Some functions (like Sine and Cosine) repeat forever
and are called Periodic Functions.

The Period goes from one peak to the next (or from any point to the next matching point).

The Amplitude is the height from the center line to the peak (or to the trough). Or we can measure the height from highest to lowest points and divide that by 2.

The Phase Shift is how far the function is shifted horizontally from the usual position.

The Vertical Shift is how far the function is shifted vertically from the usual position.

All Together Now!

We can have all of them in one equation:

y = A sin(B(x + C)) + D

  • amplitude is A
  • period is 2π/B
  • phase shift is C (positive is to the left)
  • vertical shift is D
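The four quantities can be read straight off the coefficients. The sketch below assumes A > 0 and B > 0; the function names describe_sinusoid and sinusoid are made up for illustration.

```python
import math

def sinusoid(x, A, B, C, D):
    """Evaluate y = A sin(B(x + C)) + D at x (x in radians)."""
    return A * math.sin(B * (x + C)) + D

def describe_sinusoid(A, B, C, D):
    """Return the properties of y = A sin(B(x + C)) + D.

    Assumes A > 0 and B > 0, as in the examples in this text.
    """
    return {
        "amplitude": A,
        "period": 2 * math.pi / B,
        "phase_shift": C,       # positive C moves the curve to the left
        "vertical_shift": D,
    }

# The plain sine function: A = 1, B = 1, C = 0, D = 0.
props = describe_sinusoid(A=1, B=1, C=0, D=0)
print(props)

# The function repeats after one period: y(x + period) equals y(x).
x = 0.7
print(math.isclose(sinusoid(x, 1, 1, 0, 0),
                   sinusoid(x + props["period"], 1, 1, 0, 0)))
```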


Note that we are using radians here, not degrees, and there are 2π radians in a full rotation.

Example: sin(x)

This is the basic unchanged sine formula. A = 1, B = 1, C = 0 and D = 0

So amplitude is 1, period is 2π, and there is no phase shift or vertical shift.

Example: 2 sin(4(x − 0.5)) + 3

  • amplitude A = 2
  • period 2π/B = 2π/4 = π/2
  • phase shift = −0.5 (or 0.5 to the right)
  • vertical shift D = 3
In words:

  • the 2 tells us it will be 2 times taller than usual, so Amplitude = 2
  • the usual period is 2π, but in our case that is 'sped up' (made shorter) by the 4 in 4x, so Period = π/2
  • and the −0.5 means it will be shifted to the right by 0.5
  • lastly the +3 tells us the center line is y = +3, so Vertical Shift = 3
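These readings can be checked numerically. The sketch below samples 2 sin(4(x − 0.5)) + 3 over one period and estimates the amplitude as half the distance from the highest to the lowest point; the sample count is an arbitrary choice.

```python
import math

def y(x):
    """The example function 2 sin(4(x - 0.5)) + 3."""
    return 2 * math.sin(4 * (x - 0.5)) + 3

period = 2 * math.pi / 4          # pi / 2
n = 10_000                        # number of sample intervals (arbitrary)
samples = [y(i * period / n) for i in range(n + 1)]

highest, lowest = max(samples), min(samples)
amplitude = (highest - lowest) / 2      # close to 2
center_line = (highest + lowest) / 2    # close to 3

print(round(amplitude, 4), round(center_line, 4))

# A plain sine crosses its center line going upward at x = 0.
# Here that crossing sits at x = 0.5, a shift of 0.5 to the right.
print(y(0.5), y(0.5 + 0.01) > y(0.5))
```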
Instead of x we can have t (for time) or maybe other variables:

Example: 3 sin(100t + 1)

First we need to rewrite 100t + 1 in the form B(t + C). We factor out the 100, which means dividing the 1 by 100:

3 sin(100t + 1) = 3 sin(100(t + 0.01))

Now we can see:

  • amplitude is A = 3
  • period is 2π/100 = 0.02 π
  • phase shift is C = 0.01 (to the left)
  • vertical shift is D = 0
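A small numerical check that the factored form is the same function and that it repeats every 0.02π. The test points are arbitrary.

```python
import math

def original(t):
    return 3 * math.sin(100 * t + 1)

def factored(t):
    return 3 * math.sin(100 * (t + 0.01))

period = 2 * math.pi / 100        # 0.02 * pi
test_points = [0.0, 0.003, 0.25, 1.7]

# Both forms agree at every test point (up to floating-point rounding).
forms_agree = all(
    math.isclose(original(t), factored(t), abs_tol=1e-9) for t in test_points
)

# Moving forward by one period gives the same value again.
repeats = all(
    math.isclose(original(t), original(t + period), abs_tol=1e-9)
    for t in test_points
)

print(forms_agree, repeats)
```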

Frequency

Frequency is how often something happens per unit of time (per '1').

Example: Suppose the sine function repeats 4 times between 0 and 1.

So the Frequency is 4

And the Period is 1/4

In fact the Period and Frequency are related:

Frequency = 1/Period

Period = 1/Frequency

Example from before: 3 sin(100(t + 0.01))

The period is 0.02π

So the Frequency is 1/(0.02π) = 50/π
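The reciprocal relationship between period and frequency can be sketched in a few lines. The helper names are hypothetical.

```python
import math

def frequency_from_period(period):
    """Frequency = 1 / Period."""
    return 1 / period

def period_from_frequency(frequency):
    """Period = 1 / Frequency."""
    return 1 / frequency

# 3 sin(100(t + 0.01)) has period 0.02 * pi, so its frequency is 50 / pi.
freq = frequency_from_period(0.02 * math.pi)
print(freq, 50 / math.pi)

# A function that repeats 4 times between 0 and 1 has period 1/4.
print(period_from_frequency(4))
```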

When frequency is per second it is called 'Hertz'.

Example: 50 Hertz means 50 times per second


The faster it bounces the more it 'Hertz'!
