Saturday, January 19, 2013
Book Recommendation: When Genius Failed
When Genius Failed: The Rise and Fall of Long-Term Capital Management
by Roger Lowenstein
This is an excellent read. It is striking that such brilliant people, geniuses even, were at the center of the financial crisis of 1998. Greed drove them to corruption and failure. It's a great book to learn from. If you participate in the financial markets, you need the right mentality and strong risk management skills.
A Simple Analysis of WTI Crude Oil Spot Price
I am going to use some statistical tools to analyze the spot price data of West Texas Intermediate (WTI) Crude Oil. The sample covers the period from 1986 to 2012. The trace plots of the daily price series and of the daily returns (continuously compounded returns) are given below.
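For readers who want to reproduce the return series, here is a minimal sketch of how continuously compounded (log) returns are computed from a price series, r_t = ln(P_t / P_{t-1}). The function name `log_returns` and the sample prices are hypothetical, for illustration only; they are not the EIA data.

```python
import math

def log_returns(prices):
    """Continuously compounded returns r_t = ln(P_t / P_{t-1}).

    A series of n prices yields n - 1 returns, which is consistent with
    6812 price observations giving 6811 returns.
    """
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive to take logarithms")
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

# Hypothetical prices in USD per barrel (not real WTI quotes)
prices = [25.56, 26.00, 26.53, 25.85]
returns = log_returns(prices)
```

One convenient property of log returns: they add up over time, so the sum of the daily returns equals ln(P_last / P_first).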
1. A Trace Plot:
Oil prices can be affected by many factors; the key ones are demand, supply, the US dollar, and geopolitics. For example, a weak US dollar makes oil comparatively cheaper in importing countries, which increases demand for WTI crude oil and bids up its price. This was clearly the case when the U.S. Federal Reserve pursued loose monetary policy. As an example of geopolitics, wars and conflicts (e.g. the Gulf War in 1990-1991 and the Libyan Revolution in 2011) made oil supplies unstable and raised oil prices. These are the reasons why the WTI spot price exhibited sudden jumps during the period.
From the trace plot above, the daily returns fluctuate around a constant mean of about zero. However, they exhibit volatility clustering: periods of high volatility tend to be followed by further high volatility. In time-series terms, the return series appears stationary in mean but not in variance, since its (conditional) variance changes over time.
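One simple way to see a variance that changes over time is a rolling standard deviation of the returns. The sketch below assumes a trailing window (for daily data, something like 21 trading days, roughly one month, is a common but arbitrary choice); `rolling_std` is a hypothetical helper, and the toy series just mimics a calm period followed by a turbulent one.

```python
import statistics

def rolling_std(returns, window):
    """Sample standard deviation of each trailing window of `window` returns."""
    if window < 2:
        raise ValueError("window must be at least 2")
    return [statistics.stdev(returns[i - window:i])
            for i in range(window, len(returns) + 1)]

# Toy series: a calm stretch followed by a volatile stretch
calm = [0.01, -0.01] * 10
turbulent = [0.05, -0.05] * 10
vals = rolling_std(calm + turbulent, window=4)
```

If the variance were constant, the rolling values would wander around a single level; clustering shows up as long stretches of low values followed by stretches of high ones.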
2. Descriptive Statistics and Distribution Plots:
Next, I present some graphs and descriptive statistics to analyze the distributions of the two series.
First, I plot the histograms, normal QQ-plots and boxplots for the two series. From the diagram below, the price series is clearly non-Gaussian and positively skewed, with many outliers on the right (larger) side. The return series is more Gaussian-like (much closer to a bell-shaped distribution) but has fatter tails, with many outliers on both the left and right sides.
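The "outliers" in a boxplot are usually the points beyond Tukey's fences, Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. Here is a minimal sketch of that rule; `tukey_outliers` is a hypothetical helper, and quartile conventions differ between packages (R's boxplot uses hinges), so counts may differ slightly from the plots.

```python
import statistics

def tukey_outliers(data, k=1.5):
    """Return (low, high): points below Q1 - k*IQR and above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    low = [x for x in data if x < lower]
    high = [x for x in data if x > upper]
    return low, high

low, high = tukey_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

A right-skewed series such as the price produces outliers mainly in `high`, while a fat-tailed, roughly symmetric series such as the returns produces them on both sides.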
According to the descriptive statistics, the mean of the price series is about 38.7, yet in recent years the price has stayed far from this value, meaning that the series exhibits non-stationarity in mean, i.e. a trend. There is also positive skewness and slight excess kurtosis.
For the return series, I would say it is much more like white noise: the series crosses its mean (0.0002) far more frequently. However, it is negatively skewed (skewness of -0.76) with very fat tails (excess kurtosis of about 14.6).
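The "crosses its mean more frequently" observation can be made concrete by counting how often consecutive observations lie on opposite sides of the sample mean. This is a hypothetical helper, not something from the original analysis: a trending series gives a rate near 0, while white noise from a symmetric distribution gives a rate near 0.5.

```python
def mean_crossing_rate(series):
    """Fraction of consecutive pairs lying on opposite sides of the sample mean.

    Pairs where a point sits exactly on the mean (product == 0) are not counted.
    """
    mean = sum(series) / len(series)
    dev = [x - mean for x in series]
    crossings = sum(1 for a, b in zip(dev, dev[1:]) if a * b < 0)
    return crossings / (len(series) - 1)

trend_rate = mean_crossing_rate([1, 2, 3, 4, 5, 6])        # crosses once
zigzag_rate = mean_crossing_rate([1, -1, 1, -1])           # crosses every step
```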
Descriptive Statistics:
Statistic              Price     Return
Observations            6812       6811
Minimum              10.2500    -0.4064
Quartile 1           18.8175    -0.0121
Median               24.1500     0.0007
Arithmetic Mean      38.7353     0.0002
Geometric Mean       31.0842    -0.0001
Quartile 3           58.2725     0.0133
Maximum             145.3100     0.1915
SE Mean               0.3447     0.0003
LCL Mean (0.95)      38.0596    -0.0004
UCL Mean (0.95)      39.4110     0.0008
Variance            809.4131     0.0007
Stdev                28.4502     0.0257
Skewness              1.2610    -0.7566
Excess Kurtosis       0.4732    14.5647
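Several rows of the table follow from simple formulas, which can be sketched as below. This assumes moment-based skewness and excess kurtosis and a normal 95% quantile for the confidence limits; statistical packages use slightly different conventions (e.g. a t quantile or bias corrections), so the last digits may not match. `describe` is a hypothetical helper.

```python
import math
import statistics

def describe(x, level=0.95):
    """Mean, SE of the mean, confidence limits, skewness and excess kurtosis."""
    n = len(x)
    mean = statistics.fmean(x)
    sd = statistics.stdev(x)                      # sample stdev, n - 1 denominator
    se = sd / math.sqrt(n)                        # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 0.95
    m2 = sum((v - mean) ** 2 for v in x) / n      # central moments with 1/n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "n": n,
        "mean": mean,
        "stdev": sd,
        "se_mean": se,
        "lcl_mean": mean - z * se,
        "ucl_mean": mean + z * se,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,    # 0 for a normal distribution
    }

d = describe([1, 2, 3, 4, 5])
```

As a check against the table: 28.4502 / √6812 ≈ 0.3447, and 38.7353 ± 1.96 × 0.3447 gives roughly (38.06, 39.41), matching the SE Mean and the LCL/UCL rows for the price series.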
3. Autocorrelation Plots:
At the third step, I investigate the serial dependence of the two series. For the price series, the sample autocorrelation coefficients are large and significant, showing that the price series exhibits strong serial dependence, so it is clearly not white noise. For the return series, the autocorrelation coefficients are quite small, so I would conclude that the returns are serially uncorrelated (which does not by itself mean they are independent). The return series therefore resembles white noise, but not Gaussian white noise, because of its fat tails; a Student's t-distribution may be more appropriate.
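What the ACF plot shows can be sketched as follows: the sample autocorrelation at lag k (autocovariances with divisor n, the usual convention) together with the approximate 95% white-noise band ±1.96/√n drawn as dashed lines on most ACF plots. The helper names are hypothetical.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_1 .. rho_max_lag (autocovariances divided by n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n              # lag-0 autocovariance
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]

def white_noise_band(n):
    """Approximate 95% band for the sample ACF of white noise."""
    return 1.96 / math.sqrt(n)

trend_acf = sample_acf(list(range(1, 21)), 3)     # trending, like the price
zigzag_acf = sample_acf([1, -1] * 10, 1)          # strongly negatively correlated
```

With n = 6811 returns the band is about ±0.024, so return autocorrelations inside that range are consistent with an uncorrelated series, whereas a trending series like the price has lag-1 autocorrelation close to 1.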
4. Autocorrelation Plots concerning the Volatility:
Since the return series appears to exhibit volatility clustering, I use ACF plots to check whether this is the case. I construct the ACF plots from the absolute returns and the squared returns. The plots show significant autocorrelation coefficients, meaning that a large move is usually followed by another large move and a small move by another small move. Therefore, there is volatility clustering in the return series, and the volatility needs to be modeled.
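A formal complement to eyeballing these plots, not used in the original analysis, is the Ljung-Box statistic Q = n(n+2) Σ_{k=1..h} ρ_k² / (n - k) computed on the absolute or squared returns (on squared values this is often called the McLeod-Li test). Under the null of no autocorrelation, Q is approximately chi-square with h degrees of freedom; for h = 10 the 5% critical value is about 18.31. The function names are hypothetical and the toy series is synthetic.

```python
def ljung_box(x, h):
    """Ljung-Box Q statistic for lags 1..h."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    q = 0.0
    for k in range(1, h + 1):
        rho_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

def volatility_clustering_q(returns, h=10):
    """Q statistics for absolute and squared returns."""
    absolute = [abs(r) for r in returns]
    squared = [r * r for r in returns]
    return ljung_box(absolute, h), ljung_box(squared, h)

# Synthetic returns alternating between calm and volatile blocks
toy = ([0.001, -0.001] * 25 + [0.05, -0.05] * 25) * 4
q_abs, q_sq = volatility_clustering_q(toy, h=10)
```

Values of Q far above the critical value indicate that the size of today's move is predictable from recent move sizes, which is exactly the volatility clustering described above.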
(Data source: U.S. Energy Information Administration)
***In a future post, I will use statistical tools to model, forecast, and explain WTI crude oil spot prices.
Thursday, January 17, 2013
Data Scientist: The Sexiest Job of the 21st Century
by Thomas H. Davenport and D.J. Patil
When Jonathan Goldman arrived for work in June 2006 at LinkedIn, the business networking site, the place still felt like a start-up. The company had just under 8 million accounts, and the number was growing quickly as existing members invited their friends and colleagues to join. But users weren’t seeking out connections with the people who were already on the site at the rate executives had expected. Something was apparently missing in the social experience. As one LinkedIn manager put it, “It was like arriving at a conference reception and realizing you don’t know anyone. So you just stand in the corner sipping your drink—and you probably leave early.”
Goldman, a PhD in physics from Stanford, was intrigued by the linking he did see going on and by the richness of the user profiles. It all made for messy data and unwieldy analysis, but as he began exploring people’s connections, he started to see possibilities. He began forming theories, testing hunches, and finding patterns that allowed him to predict whose networks a given profile would land in. He could imagine that new features capitalizing on the heuristics he was developing might provide value to users. But LinkedIn’s engineering team, caught up in the challenges of scaling up the site, seemed uninterested. Some colleagues were openly dismissive of Goldman’s ideas. Why would users need LinkedIn to figure out their networks for them? The site already had an address book importer that could pull in all a member’s connections.
Luckily, Reid Hoffman, LinkedIn’s cofounder and CEO at the time (now its executive chairman), had faith in the power of analytics because of his experiences at PayPal, and he had granted Goldman a high degree of autonomy. For one thing, he had given Goldman a way to circumvent the traditional product release cycle by publishing small modules in the form of ads on the site’s most popular pages.
Through one such module, Goldman started to test what would happen if you presented users with names of people they hadn’t yet connected with but seemed likely to know—for example, people who had shared their tenures at schools and workplaces. He did this by ginning up a custom ad that displayed the three best new matches for each user based on the background entered in his or her LinkedIn profile. Within days it was obvious that something remarkable was taking place. The click-through rate on those ads was the highest ever seen. Goldman continued to refine how the suggestions were generated, incorporating networking ideas such as “triangle closing”—the notion that if you know Larry and Sue, there’s a good chance that Larry and Sue know each other. Goldman and his team also got the action required to respond to a suggestion down to one click.
It didn’t take long for LinkedIn’s top managers to recognize a good idea and make it a standard feature. That’s when things really took off. “People You May Know” ads achieved a click-through rate 30% higher than the rate obtained by other prompts to visit more pages on the site. They generated millions of new page views. Thanks to this one feature, LinkedIn’s growth trajectory shifted significantly upward.
A New Breed
Goldman is a good example of a new key player in organizations: the “data scientist.” It’s a high-ranking professional with the training and curiosity to make discoveries in the world of big data. The title has been around for only a few years. (It was coined in 2008 by one of us, D.J. Patil, and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook.) But thousands of data scientists are already working at both start-ups and well-established companies. Their sudden appearance on the business scene reflects the fact that companies are now wrestling with information that comes in varieties and volumes never encountered before. If your organization stores multiple petabytes of data, if the information most critical to your business resides in forms other than rows and columns of numbers, or if answering your biggest question would involve a “mashup” of several analytical efforts, you’ve got a big data opportunity.
Much of the current enthusiasm for big data focuses on technologies that make taming it possible, including Hadoop (the most widely used framework for distributed file system processing) and related open-source tools, cloud computing, and data visualization. While those are important breakthroughs, at least as important are the people with the skill set (and the mind-set) to put them to good use. On this front, demand has raced ahead of supply. Indeed, the shortage of data scientists is becoming a serious constraint in some sectors. Greylock Partners, an early-stage venture firm that has backed companies such as Facebook, LinkedIn, Palo Alto Networks, and Workday, is worried enough about the tight labor pool that it has built its own specialized recruiting team to channel talent to businesses in its portfolio. “Once they have data,” says Dan Portillo, who leads that team, “they really need people who can manage it and find insights in it.”
Who Are These People?
If capitalizing on big data depends on hiring scarce data scientists, then the challenge for managers is to learn how to identify that talent, attract it to an enterprise, and make it productive. None of those tasks is as straightforward as it is with other, established organizational roles. Start with the fact that there are no university programs offering degrees in data science. There is also little consensus on where the role fits in an organization, how data scientists can add the most value, and how their performance should be measured.
Source: Data Scientist: The Sexiest Job of the 21st Century by Thomas H. Davenport and D.J. Patil
I agree with the authors. In the age of big data, we need specialists who can capitalize on the data. But a data scientist needs skills from a wide range of areas, such as statistical analysis, machine learning, text mining, information visualization, and advanced computing. For me, some of this knowledge is difficult to acquire. However, these skills really add value to the world.
Tuesday, January 15, 2013
A Different Way to Tell the Story of the Fiscal Cliff in the U.S.
This is a funny demonstration of the situation. I am optimistic that the two parties will come up with a solution. But there is still a lot of uncertainty in the market, e.g. the euro credit crisis and the China-Japan conflict.

