Project Background, Summary & Scope of Analysis:
This project explores transaction data from the monetary policy known as "Quantitative Easing" (QE). The data spans mid-2010 to near year-end 2014, a period of significant monetary expansion by the Federal Reserve (Fed) through direct purchases of U.S. Treasuries and other instruments. The aim of the policy was to lower interest rates at different maturity points on the yield curve. It was implemented day to day through open market transactions conducted by the Federal Reserve Bank of New York (NYFed).
The data for exploration and analysis is the transaction data of U.S. Treasuries purchased and sold by the NYFed with Primary Dealer counterparties. Primary Dealers are the commercial banks authorized and obligated to participate in these exchanges with the NYFed. The analysis also used Treasury Yield Curve data, observed at the same points in time as the transactions. The Yield Curve data was retrieved from the Quandl API.
These two datasets were joined together for analysis and modeling. Exploration was conducted to identify patterns in the transactions of Treasuries by specific Primary Dealer counterparties. For modeling, three neural nets with different architectures were used, as well as a Vector AutoRegressive (VAR) model. The targets selected were the 3-month and the 30-year interest rates. These targets are continuous, so the models were fit as regression models. Predictions for each model were evaluated and compared visually on a time series plot, as well as by Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE). The models were also compared in two groups by their performance on the two different targets.
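The two error metrics named above can be written out directly. The following is a minimal sketch using NumPy, not the project's evaluation code; the function names `mse` and `mape` and the sample values are illustrative assumptions.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, expressed in percent.

    Each residual is divided by the true value, so the metric is
    undefined when a true value is exactly zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


# Illustrative values only, not taken from the project data.
actual = [2.0, 4.0]
predicted = [1.0, 5.0]
print(mse(actual, predicted), mape(actual, predicted))
```

Because MAPE divides by the true value, it can become very large when the target is close to zero, which is worth keeping in mind when comparing scores across targets with different magnitudes.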
Directory & File Structure:
README.md - Description of Project procedure and analysis
code: Three sequential notebooks for processing, EDA, and modeling
Please run notebooks in the numerical order of their labels
data: Two directories:
csv_exports - Results of data pre-processing for purposes of EDA and modeling
Treasuries - Native Excel files that constitute original data that was analyzed. Each file groups transaction records by quarter. These files were retrieved from The New York Federal Reserve website page for historical transaction data
documentation: Presentation slides and Python environment requirements
images: Collection of third-party and project-generated png files for visualization
The following describes all features used for analysis after preprocessing and before the encoding of the feature "Counterparty."
Other features native to the original data that were dropped are not included.
These features are clearly presented in the edaFINAL.csv found in the data directory.
The native Excel files were manually downloaded from the NYFed web page. Once these files were organized in a directory, a new pandas DataFrame was created to chronologically concatenate their quarterly data. Standard cleaning was applied to features. Interest rate Yield Curve data was retrieved from a Quandl API call and concatenated with the transaction DataFrame. New features were engineered from the Security description column. Columns with date values were converted to pandas datetime objects. The index was changed to a DatetimeIndex built from the "Trade date" column. The resulting dataset had 89,908 rows and 28 columns (before categorical encoding).
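The concatenation and indexing steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the notebook code: the helper name `combine_quarters` and the "Par amount" column are hypothetical, and small in-memory frames stand in for the quarterly files, which in practice would each be loaded with `pd.read_excel`.

```python
import pandas as pd


def combine_quarters(frames, date_col="Trade date"):
    """Stack quarterly frames, parse the date column, and index by it."""
    combined = pd.concat(frames, ignore_index=True)
    combined[date_col] = pd.to_datetime(combined[date_col])
    # A stable sort keeps same-day rows in their original relative order.
    combined = combined.sort_values(date_col, kind="stable")
    return combined.set_index(date_col)


# Hypothetical stand-ins for two quarterly files.
q3_2010 = pd.DataFrame(
    {"Trade date": ["2010-11-12", "2010-08-17"], "Par amount": [200, 100]}
)
q1_2011 = pd.DataFrame({"Trade date": ["2011-01-04"], "Par amount": [300]})

combined = combine_quarters([q1_2011, q3_2010])
print(combined)
```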
An initial challenge was that many rows of transactions were indexed on the same day, with no further time information to distinguish them. This created many duplicate values in the datetime columns and the index. The problem was resolved by assigning each row a timestamp interpolated at equal intervals. Trial and error identified an interval of around 27 minutes that kept every original row in the correct chronological order. Interpolation was performed using traces, a third-party Python library.
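To illustrate the idea of spacing same-day rows at equal intervals, here is a minimal sketch in plain pandas. It is not the project's method, which used the traces library; the helper name `spread_intraday`, the "Par amount" column, and the choice to restart the offsets at midnight on each date are assumptions made for the example.

```python
import pandas as pd


def spread_intraday(df, step_minutes=27):
    """Give rows that share a date distinct, evenly spaced timestamps."""
    out = df.copy()
    # Position of each row within its own date (0, 1, 2, ...), in row order.
    position = out.groupby(out.index.normalize()).cumcount().to_numpy()
    offsets = pd.to_timedelta(position * step_minutes, unit="min")
    out.index = (out.index + offsets).rename(df.index.name)
    return out


# Three trades on one date and one on the next (illustrative values).
idx = pd.to_datetime(
    ["2010-08-17", "2010-08-17", "2010-08-17", "2010-08-18"]
).rename("Trade date")
trades = pd.DataFrame({"Par amount": [1, 2, 3, 4]}, index=idx)

spaced = spread_intraday(trades)
print(spaced)
```

With a 27-minute step, at most 53 rows fit within a single calendar day before the offsets spill into the next date, so the step has to be chosen with the busiest day in mind.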
One DataFrame, exclusively for Exploratory Data Analysis, was exported as a CSV file. Processing after this was in preparation for modeling and involved One Hot Encoding of categorical features. The features that were encoded were "Transaction category," "Type," and "Counterparty." The DataFrame for modeling was separately exported as a CSV file as well.
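One-hot encoding of the three categorical features can be sketched with `pd.get_dummies`. The column names come from the description above; the helper name `encode_categoricals`, the sample category values, and the "Par amount" column are hypothetical, and the notebooks may use a different encoder.

```python
import pandas as pd

CATEGORICAL = ["Transaction category", "Type", "Counterparty"]


def encode_categoricals(df, columns=CATEGORICAL):
    """Replace each categorical column with one indicator column per value."""
    return pd.get_dummies(df, columns=columns)


# Hypothetical sample rows; real category values will differ.
sample = pd.DataFrame(
    {
        "Transaction category": ["Purchase", "Sale"],
        "Type": ["Note", "Bond"],
        "Counterparty": ["Dealer A", "Dealer B"],
        "Par amount": [100, 200],
    }
)

encoded = encode_categoricals(sample)
print(encoded.columns.tolist())
```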
Exploratory Data Analysis:
The different Primary Dealers (counterparties) were the focus of the data exploration. Specific attention was paid to the counterparties that participated in the most transactions with the NYFed during the time period of the dataset.
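Ranking counterparties by number of transactions is a one-line operation in pandas. The sketch below assumes a "Counterparty" column as described above; the helper name `top_counterparties` and the dealer names are placeholders, not values from the dataset.

```python
import pandas as pd


def top_counterparties(df, n=5, col="Counterparty"):
    """Return the n counterparties with the most transaction rows."""
    return df[col].value_counts().head(n)


# Placeholder dealer names for illustration.
transactions = pd.DataFrame(
    {
        "Counterparty": [
            "Dealer A", "Dealer B", "Dealer A",
            "Dealer C", "Dealer A", "Dealer B",
        ]
    }
)

ranking = top_counterparties(transactions, n=2)
print(ranking)
```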