Financial Modeling Prep (FMP) is a new service that offers a range of financial metrics through an API. Use fmpcloudr to access the data and analyze it in R.
Financial Modeling Prep and the sister site FMP Cloud provides access to a range of financial data points. This article will use the fmpcloudr package to pull data from the API into R. The analysis will look at historical returns, trade volume, and the composition of the S&P 500. One thing to note, because FMP is still relatively new, some of the historical data points are unavailable. This will be highlighted within the analysis.
First, set up the R environment:
# Load the necessary libraries
library(fmpcloudr)
library(dplyr)
library(lubridate)
library(plotly)
# Set options and create if.na function
options(scipen=999)
options(dplyr.width = Inf)
if.na <- function(x,y)(ifelse(is.na(x),y,x))
# Set the FMP token - stored in a local location on server
fmpKey = readRDS('/Users/tonytrevisan/Documents/Server/Secure/fmp.rds')
fmpcloudr::fmpc_set_token(fmpKey, noBulkWarn = TRUE)
FMP provides access to index data which is not the case for many of the free and inexpensive pricing APIs available. For many of the Total Return indexes data only goes back to April of 2020, but the Price Return indexes have a much deeper history. Note: Total Return includes dividend reinvestment.
First we will pull index data for the three major indexes: S&P
500, Nasdaq, and Dow Jones combined with Gold. We can search for these
symbols using fmpc_symbols_by_market
.
# Search for available symbols for indexes and commodities
AvailIndx = fmpc_symbols_by_market(c('index','commodity'))
# Filter for Gold, S&P, Dow and Nasdaq
AvailIndx %>%
mutate(lowerName = tolower(name)) %>%
filter(grepl('gold',lowerName) | grepl('s&p 500',lowerName) |
grepl('dow jones ind',lowerName) | grepl('nasdaq',lowerName)) %>%
select(1,2) -> symbName
knitr::kable(symbName)
symbol | name |
---|---|
ZGUSD | Gold |
GCUSD | Gold |
YMUSD | Mini Dow Jones Indus.-$5 |
ESUSD | E-Mini S&P 500 |
NQUSD | Nasdaq 100 |
MGCUSD | Micro Gold Futures |
^SP500TR | S&P 500 (TR) |
^XNDX | NASDAQ 100 Total Return Index |
^NDX | NASDAQ 100 |
^VXN | CBOE NASDAQ 100 Volatility |
^IXIC | NASDAQ Composite |
^DJITR | Dow Jones Industrial Average TR |
^GSPC | S&P 500 |
^GVZ | CBOE Gold Volatitity Index |
^SP500-30 | S&P 500 Consumer Staples (Sector) |
^SP500-40 | S&P 500 Financials (Sector) |
^SP500-45 | S&P 500 Information Technology (Sector) |
^SP500-25 | S&P 500 Consumer Discretionary (Sector) |
^DJI | Dow Jones Industrial Average |
^XSP | S&P 500 MINI SPX OPTIONS INDEX |
Once we have the symbols we can use fmpc_price_history
to pull the price history for each index. We will pull the Price Return
index due to the longer history. Luckily the S&P total return is
also available.
# Pull data for price return indexes, Gold, and S&P TR back to 2000
IndxHist = fmpc_price_history(c('^NDX','^DJT','^GSPC','^SP500TR','GCUSD'),
startDate = '2000-01-01')
# Calculate the cumulative return of each symbol
IndxHist %>%
# Bring in Naming data
left_join(select(AvailIndx,symbol,name), by = 'symbol') %>%
group_by(symbol) %>%
# Calculate the cumulative return for each symbol
mutate(return = if.na(log(adjClose/lag(adjClose)),0), # Calculate log returns for each index
name = ifelse(symbol == 'GCUSD','Gold',
ifelse(symbol == '^DJT','Dow Jones (PR)',
ifelse(symbol == '^NDX','Nasdaq 100 (PR)',name))),
cumReturn = exp(cumsum(return))-1) -> IndxReturn
Finally, we can plot the data to see how the cumulative return looks overtime.
# use plotly to create interactive plot
plot_ly(IndxReturn) %>%
add_trace(x = ~date, y = ~cumReturn, color = ~name, type = 'scatter', mode = 'lines') %>%
layout(title = 'Major Index Performance since Jan 2000',
hovermode = 'compare',
legend = list(orientation = "h", xanchor = "center", x = 0.5),
xaxis = list(title = ''),
yaxis = list(title = 'Return', tickformat = '.0%'))
As can be seen above, gold has performed very well since 2000 despite a massive bear market between 2011-2015. The other averages have all moved in line with each other. The reason the Nasdaq looks to be underperforming over this time period is because of the huge draw down during the tech crash in the early 2000s.
As mentioned above and in a previous article, the price return and total return differ due to the reinvestment of dividends. While dividends typically range between 1% and 2%, the compounding effect can be quite dramatic.
In this analysis we will compare the price return index to the total return performance of the ETFs. As a point of comparison, I have also included the S&P total return index to see how SPY performs against the index. The SPY return will be slightly below the index due to ETF fees.
# Pull the ETF data for S&P and Nasdaq - QQQ and SPY
ETFHist = fmpc_price_history(c('QQQ','SPY'), startDate = '2000-01-01')
# Calculate the returns
ETFHist %>%
group_by(symbol) %>%
mutate(return = if.na(log(adjClose/lag(adjClose)),0),
cumReturn = exp(cumsum(return))-1) %>%
filter(date == max(date)) -> ETFReturn
# Pull the Index data from the analysis above
NasSP = IndxReturn %>%
filter(symbol %in% c('^GSPC','^NDX','^SP500TR')) %>%
filter(date == max(date))
# Create a name map
map = tibble(symbol = c('^GSPC','^NDX','^SP500TR','QQQ','SPY'),
plotname = c('SP 500 PR','Nasdaq PR','SP 500 TR','Nasdaq ETF TR','SP 500 ETF TR'))
# Stack the data and take the max date which has the cumulative return for the period
rbind(NasSP,ETFReturn) %>%
filter(date == max(date)) %>%
left_join(map, by = 'symbol') -> PlotTRPR
Once the returns have been calculated, we can plot the data using a bar chart to show the cumulative return over the time period. The tech heavy QQQ is not known for high dividends and this can be seen in the chart below. The S&P TR vs PR difference is much greater than the same difference for Nasdaq (108% differnce versus only 34% for the Nasdaq).
# Plot the ETF Data in a bar chart
plot_ly(PlotTRPR) %>%
add_trace(x = ~plotname, y = ~cumReturn, color = ~plotname, type = 'bar') %>%
layout(title = 'Price Return vs Total Return',
hovermode = 'compare',
showlegend = FALSE,
xaxis = list(title = ''),
yaxis = list(title = 'Return', tickformat = '.0%'))
ETFs have become extremely popular over the past twenty years. We can use trading on the S&P 500 as a proxy to see the increase in activity. Daily trade volume is a great indication along with total assets. There are three main ETFs that track the S&P 500: SPY, IVV, and VOO. We can analyze the trade volume on all three to see how trade volume has trended over time. Below we will pull the data and plot it into a stacked bar chart.
# Pull the three tickers back to SPY inception
SPETF = fmpc_price_history(c('IVV','VOO','SPY'), startDate = '1993-01-01')
# Calculate notional volume, and take daily average for each quarter
SPETF %>%
mutate(notionVol = adjClose * unadjustedVolume) %>% # Notional amount is price * share volume
mutate(qtrEnd = ceiling_date(date, unit = 'quarter')-1) %>% # Calculate qtr end date
group_by(qtrEnd, symbol) %>% # Group qtr/symbol to calcualte average
summarise(avgDlyvol = mean(notionVol)) %>%
arrange(qtrEnd) -> ETFVol
# Plot the data in a stacked bar chart
plot_ly(ETFVol) %>%
add_trace(x = ~qtrEnd, y = ~avgDlyvol, color = ~symbol, type = 'bar') %>%
layout(title = 'Average Daily Volume',
hovermode = 'compare',
barmode = 'stack',
legend = list(orientation = "h", xanchor = "center", x = 0.5),
xaxis = list(title = ''),
yaxis = list(title = 'Notional Daily Volume'))
As can be seen above, trade volume has increased dramatically in S&P ETFs since the inception of the first ETF (SPY) back in 1993. One glaring takeaway is how much SPY dominates the trade volume. Ironically, this is not due entirely to size. In the pull below, we can see that even though SPY has the largest market cap (almost 50% of the total), it’s trade volume dominates in comparison garnering over 90% of the trade volume. SPY has always been the most popular ETF and is used by many intuitions due to its liquidity.
# Proiles show the current market cap
knitr::kable(fmpc_security_profile(c('SPY','IVV','VOO')) %>%
mutate(mktCap = format(mktCap, big.mark = ','),
volAvg = format(volAvg, big.mark = ',')) %>%
select(symbol, volAvg, mktCap, companyName), "simple")
symbol | volAvg | mktCap | companyName |
---|---|---|---|
SPY | 79,869,183 | 407,302,477,774 | SPDR S&P 500 ETF Trust |
IVV | 3,757,808 | 277,428,862,586 | iShares Core S&P 500 ETF |
VOO | 3,576,846 | 0 | Vanguard 500 Index Fund |
The S&P 500 index was formed in 1957 and has changed dramatically over that time. Many companies have been added and removed based on certain selection criteria. Recent articles have highlighted how the concentration in the top 5 companies has never been greater. This analysis will attempt to replicate those findings using FMP.
Reconstructing the S&P 500 historically is a bit tricky, but FMP offers API endpoints that allow you to reconstruct the index historically and chart the concentration.
As with any data exercise, many times the data is not perfect. Unfortunately that is the case here, so the data must be cleaned and modified. The FMP support stuff is extremely responsive and very helpful. They have been great at correcting any data issues I flag to their attention. This is a bit low on the priority list, so I have not raised the issues herein, but I also think the data is good enough for exploration. Additionally, it could be challenging to pull historical market cap data for delisted companies.
The goal of this section will be to analyze the top 5 companies in the S&P by doing the following:
plotly
to create an overlay of the S&P 500
overtime# Pull the current list of the S&P 500
spcur = fmpc_symbols_index(period = 'current')
# Pull the historical list of companies that have been added and removed
sphist = fmpc_symbols_index(period = 'historical', index = 'sp500') %>%
mutate(date = as.Date(date))
The data goes back to 1994, but as mentioned above, the data is not perfectly clean and consistent. For example, Morgan Stanley appears in the current S&P 500 list, but also shows as having been removed in the historical S&P 500 list back in 1997 without ever being added again.
symbol | name | sector | subSector | headQuarter | dateFirstAdded | cik | founded |
---|---|---|---|---|---|---|---|
MS | Morgan Stanley | Financials | Investment Banking & Brokerage | New York City, New York | 1993-07-29 | 0000895421 | 1935 |
dateAdded | addedSecurity | removedTicker | removedSecurity | date | symbol | reason |
---|
I don’t want to get too caught up with all the data issues, so instead I will make notes about them in the comments.
To simplify the output, I only want to show a snapshot of the S&P at the end of each year. The data set starts in 1994. Thus, I will take the S&P 500 as of 1994 and loop through each year up to the current, adding and removing the tickers as indicated by the data set. Because of the issue highlighted above with Morgan Stanley, I will also force add any ticker currently in the S&P 500 that has an add date on or before the year in the loop.
Note: I recognize that loops are frowned upon in R, better to use a function within map or apply, but I thought this would be easier to read.
# Filter for the first year of 1993
sp500yr = sphist %>%
filter(year(date)==1994) %>%
mutate(date = as.Date('1994-12-31')) %>%
select(symbol,date)
# Make this the "previous year" to start the loop
prevYear = sp500yr
# Create a vector of years to Loop through
yrloop = 1995:year(Sys.Date()-20)
# Loop through each year from 1995 - 2020
for (i in yrloop) {
# Identify securities that have been removed
rmvd = sphist %>%
filter(year(date)==i, removedSecurity != '') %>%
pull(symbol)
# Identify securities that have been added
added = sphist %>%
filter(year(date)==i, addedSecurity != '') %>%
select(symbol) %>%