When Forecasts Fail: Unpredictability in Israeli-Palestinian Interaction

This article explores the paradox that forecasts may be most likely to fail during dramatic moments of historic change that social scientists are most eager to predict. It distinguishes among four types of shocks that can undermine the predictive power of time series analyses: effect shocks that change the size of the causal effect; input shocks that change the causal variables; duration shocks that change how long a causal effect lasts; and actor shocks that change the number of agents in the system. The significance of these shocks is illustrated in Israeli–Palestinian interactions, one of the contemporary world’s most intensely scrutinized episodes, using vector autoregression analyses of more than 15,000 Reuters news stories over the past three decades. The intervention of these shocks raises the prospect that some historic episodes may be unpredictable, even retrospectively.

For millennia, experts have promised special insight into the future. This ambition was one of the original visions for the social sciences in the nineteenth century, "to enquire into the present, in order to foresee the future, and to discover the means of improving it" (Comte [1851] 1875:15), and prediction remained one of the "principal tasks" of many social scientists in the mid-twentieth century (Schuessler 1968:418-19). In recent decades, this ambition has retreated somewhat, as "prediction has become almost a taboo word, connoting an embarrassing affiliation to vulgar positivism, scientism and technocracy" (Aldridge 1999). Still, prediction continues to flourish in a variety of social sciences, especially the applied wings of criminology, econometrics, and international relations, where government and business planning is heavily invested in forecasting (Land and Schneider 1987; Cooper and Layard 2002; Schneider, Gleditsch, and Carey 2010). Over the past generation, forecasters have established institutional venues for professional development, such as the International Institute of Forecasters, as well as specialized academic journals and distinctive statistical methods.[1]
[1] Some scholars distinguish between prediction and forecasting, though there is no uniform usage of either
One of the greatest challenges in forecasting is to develop models that anticipate historical discontinuities: sudden turns of fortune such as economic crashes, civil conflict, and revolution (Moore 1964; Sornette 2002; Bueno de Mesquita 2009; Goldstone et al. 2010). The more dramatic and counterintuitive the outcome, the more rewarding it is to "endogenize" the factors and dynamics that led to it. This is part of the appeal of "chaos theory," whose models generate irregular trajectories based on complex processes of closed systems with fixed parameters (Smith 1998; Kellert 2008). Yet history is rarely a closed system: agents may enter and leave, preferences and priorities may be overturned, or the dynamics of interaction may change. These shifts have been given a variety of labels, including black swans (Taleb 2007), contingency (Shapiro and Bedi 2007; York and Clark 2007), critical junctures (Collier and Collier 1991), punctuations (Baumgartner and Jones 2009), structural breaks (Chow 1960; Hansen 2001), and turning points (Abbott 1997). We refer to them here by the generic term "shock," meaning parameter shifts that impact a system but do not appear to be generated by the system.[2]
Shocks are often invoked when forecasts fail. In the 1930s, for example, economist Alfred Cowles calculated that professional stock market forecasters performed no better than random (Cowles 1933; see also Friedman 2014). At the end of the twentieth century, despite vast improvements in forecasting techniques, economists were still unable to predict many recessions and crashes (Loungani 2001). Studies of civil conflict often report high levels of statistical significance for their models, but when examined as out-of-sample forecasts, these models are not nearly as successful (Tikuisis, Carment, and Samy 2012; Ward, Greenhill, and Bakke 2010). Indeed, some of the most important phenomena of recent history have proven inordinately resistant to prediction (Cerulo 2006; Harcourt 2007; Kurzman 2004; Tetlock 2005; for journalistic treatments of this subject, see Gardner 2011; Sherden 1997; Silver 2012).
Forecasters are aware of these challenges. They typically qualify their predictions with the caveat that the patterns they observe hold only so long as the parameters of their models remain stable, and they have cataloged numerous ways in which these parameters may shift (Clements and Hendry 2006:609-14). Forecasters have developed sophisticated methods to control for shocks through dummy variables, splines, and other techniques, in an attempt to identify the unvarying, underlying properties of their models.
This article proposes, by contrast, that shocks are not so readily tamed. Applying standard forecasting methods to Israeli-Palestinian interactions, one of the world's most closely watched conflicts, this study finds that forecasts are most likely to fail during historic moments that social scientists are most eager to predict. The article then distinguishes four ways in which shocks may undermine prediction and identifies major episodes in Israeli-Palestinian interaction that illustrate these shocks. Each of these forms of shock corresponds to a distinct element in forecasters' models: not just coefficients, which are the usual suspects in time series analysis of structural breaks, but also time-varying lag structures, shifts in the values of endogenous variables, and changes to the set of endogenous variables. The article concludes by extending the argument from forecasting to retrocasting, that is, to time series analyses and historical explanations more generally.

Israel-Palestine
Jews and Palestinian Arabs have lived in the Levant for ages, but only in the past century have they engaged in persistent conflict, capturing global attention. This conflict grew out of duplicate nationalisms that emerged in the last decades of the Ottoman Empire: political Zionism, beginning in the 1890s (Vital 1975), and Palestinian nationalism, beginning in the 1910s (Khalidi 1997; Nafi 1998). Both of these movements envisioned a nation-state between the Jordan River and the Mediterranean Sea (Biger 2004; Shelef 2010). For the past two decades, many Israelis and Palestinians[3] have circumscribed their nationalist aspirations in support of a "two-state" solution that would divide the land into separate enclaves (Shamir and Shikaki 2010). Still, some Israelis and Palestinians reject such a compromise and aspire to control the entire territory (Gunning 2008; Milton-Edwards and Farrell 2010; Taub 2010). Violence has repeatedly undermined attempts to negotiate a lasting political accommodation.
This high-profile case has generated an interdisciplinary debate over the extent to which Israeli-Palestinian interactions may be forecast. Some observers emphasize the history of "unpredictable risks and limited rationality" (Dowty 2012:114). Others view this history as "an all-too-familiar pattern of confrontation and violence" (Tessler 2009:754). Recent quantitative analyses by economists, political scientists, psychologists, and sociologists are also split on this issue. Most emphasize predictability, finding statistically significant patterns through which Israeli and Palestinian actions follow from previous actions (Beasley 2008; Braithwaite, Foster, and Sobek 2010; Brym and Andersen 2011; Brym and Araj 2006; Haushofer, Biletzki, and Kanwisher 2010, 2011; Kaplan et al. 2005; Maoz 2007). Other researchers, however, distinguish the predictability of Israeli actions and the limited predictability of Palestinian actions (Hafez and Hatfield 2006; Jaeger and Paserman 2006, 2008, 2009). Two recent analyses find periods of greater predictability alternating with periods of lesser predictability (Dugan and Chenoweth 2012; Golan and Rosenblatt 2011). Golan and Rosenblatt (2011) suggest that it is "implausible" to assume a single impulse-response function for a phenomenon that bridges multiple "epochs," and they call for further data gathering on additional "aspects of this multifaceted conflict."
Our study addresses both of these concerns. First, we analyze a greater palette of Israeli-Palestinian interactions than the narrow spectrum of events acknowledged in previous studies, which have focused primarily on fatal violence, sometimes combined with data on nonfatal attacks and imprisonment (see Table A1 in the Supplemental Materials). We incorporate many more forms of interaction, including negotiations, protests, and a host of other events that occupy much attention in the region and around the world. Second, we investigate the possibility that multiple forms of shocks may occur at irregular intervals, generating uneven, syncopated "epochs." Previous studies involved dummy variables or time series end points corresponding with distinct periods of Israeli-Palestinian interaction, implying that transitions from one period to the next are exogenous to the interactions themselves. Yet, like many other time series analyses, these studies have not explored the ways in which patterns shift at break points of history. This article broaches the implications of multiple forms of shocks for the substantive understanding of the case and for the enterprise of forecasting and retrocasting more generally.

Data and Methods
To document Israeli-Palestinian interaction, we examine all 15,884 news reports of Israeli-Palestinian events from the Reuters news agency over the course of 11,219 days (April 15, 1979, to December 31, 2009). An algorithm developed by the Penn State Event Data Project (formerly known as the Kansas Event Data System) machine-coded the first sentence in each article to identify the actor, the action, and the target of action (Schrodt and Gerner 1994, 1997, 2000; Rasler 2000; Goldstein et al. 2001; Brandt and Freeman 2006; Brandt, Colaresi, and Freeman 2008; Brandt, Freeman, and Schrodt 2011; Dugan and Chenoweth 2012). Of the 139,025 articles in the Penn State "Levant" data set during this period (http://eventdata.psu.edu/data.dir/levant.html), we select all events with Israeli actors and Palestinian targets and all events with Palestinian actors and Israeli targets. Articles that report joint events, in which Israelis and Palestinians both engage in the same action toward one another on the same day, are recorded twice as separate directed dyads. For the analysis of actor shock, discussed later, we also included all 5,643 stories reporting Israeli and Palestinian interaction with Lebanese actors during this period.
The Levant data set sorts events into 20 categories of action, called CAMEO (Conflict and Mediation Event Observations) codes, with more than a hundred subcategories (Schrodt 2012). We sort the 20 root categories into an ordinal scale of seven categories, ranging from -3 (violent conflict) to +3 (cooperation). We sum the events into a single score for each directed dyad for each day.[4] We also include a dummy variable for Saturdays, the Israeli weekend, because there are 35 percent fewer reported events on this day than on other days. Table A2 in the Supplemental Materials presents descriptive statistics.
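The scoring step described above can be sketched in a few lines of Python. This is an illustration only: the mapping from CAMEO root codes to the seven-point scale (`ROOT_TO_SCALE`), the actor labels, and the sample events are hypothetical placeholders, not the assignments used in the analysis.

```python
from collections import defaultdict
from datetime import date

# HYPOTHETICAL mapping from a few CAMEO root codes to the -3..+3 scale.
# The article's actual assignment of the 20 root codes is not reproduced here.
ROOT_TO_SCALE = {4: 1, 5: 2, 6: 3, 14: -1, 17: -2, 19: -3}


def daily_scores(events):
    """Sum scaled event scores for each (actor, target, day) directed dyad."""
    totals = defaultdict(int)
    for actor, target, day, root_code in events:
        totals[(actor, target, day)] += ROOT_TO_SCALE[root_code]
    return dict(totals)


def saturday_dummy(day):
    """Return 1 for Saturdays (weekday() == 5), else 0."""
    return 1 if day.weekday() == 5 else 0


# Invented sample events: (actor, target, date, CAMEO root code).
events = [
    ("ISR", "PAL", date(2000, 9, 29), 19),
    ("ISR", "PAL", date(2000, 9, 29), 17),
    ("PAL", "ISR", date(2000, 9, 29), 14),
    ("PAL", "ISR", date(2000, 9, 30), 5),
]
scores = daily_scores(events)
```

Each directed dyad receives one summed score per day, so two conflictual Israeli actions on the same day combine into a single, more negative value.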
In addition, we performed parallel analyses with the articles coded according to the Goldstein (1992) scale of conflict cooperation, as applied to the Levant data set by Schrodt (2007), and with net daily totals of material cooperation (CAMEO codes 6-9) and material conflict (CAMEO codes 15-20) used by Brandt et al. (2011). We also examined articles on fatal violence alone (Goldstein code -10), both as a daily count and as a daily binary. And we analyzed a separate source of violent events, the B'Tselem list of fatalities in the Israeli-Palestinian conflict, both as a daily count and as a daily binary. The results of all these analyses are directly analogous to the findings reported here.
[4] Eighty-six percent of the articles in the Levant database have date phrases in the lead sentence such as "today," "Monday," "last night," "next week," or "April." Of these, 92 percent refer to events that occurred the same day as the report. Two percent refer to the previous day, and another 3 percent refer to other days within the same week, some of them in the future ("negotiators will meet tomorrow"). Articles that have no identifiable date phrases in the lead sentence normally refer to developments on the same day as the report ("meetings have begun," "leaders are trying"). Events clearly identified with a previous date are coded as occurring on that date; all other events are coded as occurring on the date of the report.
Our scale of daily events is an imperfect representation of relations between Israelis and Palestinians in a number of ways, among them the bias and filter of the news agency, its editors, and reporters in selecting what events to cover; the reduction of each news report to a single event category; the asymmetry of the actors' decisionmaking processes, because the most frequently covered Israeli actions (such as military and police operations) are more likely to be dictated by a handful of government officials, whereas a large number of people may independently engage in the most frequently covered Palestinian actions (such as riots); and the use of this ordinal scale as a continuous variable.Notwithstanding these caveats, this data set is a much more nuanced proxy for Israeli-Palestinian interaction than the data in other recent time series analyses.Figure 1 displays the daily event totals for Israeli and Palestinian actions toward one another, smoothed with polynomial regression.
We adopt the methods of much recent quantitative analysis of Israeli-Palestinian interactions, and the advice of methods texts in forecasting and time series analysis (Box, Jenkins, and Reinsel 2008; Lütkepohl 2005; Montgomery, Jennings, and Kulahci 2008), by examining these data with vector autoregression (VAR) models, which compute simultaneous equations for each directed dyad. The general form of this analysis, using notation conventions from Lütkepohl (2005), is

$$y_t = \nu + A_1 y_{t-1} + \cdots + A_p y_{t-p} + b x_t + u_t$$

where $y_t = (y_{1,t}, \ldots, y_{K,t})$ is a vector of $K$ time series variables; $A_1$ through $A_p$ are matrices of coefficients; $t$ designates the time unit from 0 through $p$ lags; $\nu = (\nu_1, \ldots, \nu_K)$ is a vector of $K$ intercept terms; $b$ is a vector of coefficients for exogenous variable(s) $x$; and $u_t = (u_{1,t}, \ldots, u_{K,t})$ is a $K$-dimensional white noise or innovation process. The specific implementation in the present analysis involves simultaneous equations with $K = 2$ time series variables (Israeli actions toward Palestinians and Palestinian actions toward Israelis; the section incorporating Lebanese actors involves $K = 6$ time series variables) and $p = 8$ days, incorporating 1-day through 8-day lags of both Israeli and Palestinian actions, based on the maximum optimal value of Schwarz's Bayesian Information Criterion (SBIC) (see the section on duration shock for a discussion of time-varying lag structures):

$$y_{1,t} = v_1 + \sum_{i=1}^{8} a_{i}\, y_{1,t-i} + \sum_{i=1}^{8} a_{8+i}\, y_{2,t-i} + b_1 x_t + u_{1,t}$$

$$y_{2,t} = v_2 + \sum_{i=1}^{8} a_{16+i}\, y_{1,t-i} + \sum_{i=1}^{8} a_{24+i}\, y_{2,t-i} + b_2 x_t + u_{2,t}$$

where $y_1$ represents Israeli actions toward Palestinians on a given day; $y_2$ represents Palestinian actions toward Israelis on that day; $a_1$ through $a_{32}$ and $b_1$ through $b_2$ are coefficients; $v_1$ and $v_2$ are intercepts; $x_t$ is an exogenous dummy variable for Saturdays; and $u_1$ and $u_2$ are white noise terms.
Granger causality tests confirm the significance of mutual effects of Israeli and Palestinian actions at all lag structures from 1 to 81 days, the maximum that the Stata 12.0 software package allows. The results of Granger causality tests for selected lag structures are presented in Table 1; p-values less than 0.01 are evidence of the effect of past actions by one side on a given day's actions by the other side. Nonstationarity of both time series (Israeli actions toward Palestinians and Palestinian actions toward Israelis) is ruled out with three versions of the augmented Dickey-Fuller unit root test (see also Haushofer et al. 2010, Table S2): a basic approach, a trend specification, and a drift specification (see Table 1). Negative Z-scores with p-values less than 0.01 are evidence for stationarity.
In-sample vector autoregression results with 1-day through 8-day lags are presented in Table 2. A shift of 1 standard deviation in the Israeli or Palestinian daily scores generates small but statistically significant changes in the other's daily scores. The impulse-response function (IRF) in Figure 2 shows a long-lasting effect peaking at eight days after the shock, using in-sample analyses. The effect on Palestinian scores is half as large as for Israeli scores, just as the absolute scores of Palestinian daily actions are half as large as the absolute scores of Israeli daily actions. (The same pattern emerges with greater effects in models with orthogonalized IRFs.)
This model explains only a small portion of Israeli-Palestinian interaction over the past generation: 17 percent of the variation in the in-sample Israeli daily scores and 12 percent of the Palestinian scores. However, the model performs as well as other recent time series analyses of Israeli-Palestinian interaction, whose median reported $r^2$ statistic is 12 percent (see Table A1). The model also performs well on the Theil inequality coefficient (TIC), which compares the root mean square errors (RMSE) of the in-sample predictions with the RMSE of a naive forecast of "no change" in the event scores from one day to the next (Loungani 2001). According to this indicator, the model's prediction error is 23 percent lower than the naive forecast for Israeli scores and 25 percent lower for Palestinian scores (see Table 3).
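The comparison with a naive forecast can be made concrete with a short Python sketch. It assumes the TIC is computed as the simple ratio of the model's RMSE to the RMSE of a "no change" forecast (yesterday's score as today's prediction); the function names and the toy series are illustrative assumptions, not the analysis code.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def theil_vs_no_change(actual, predicted):
    """Ratio of model RMSE to the RMSE of a naive no-change forecast.

    predicted[i] is the model's forecast of actual[i]. The first observation
    only seeds the naive forecast, so it is excluded from both error terms.
    Values below 1 mean the model beats the naive forecast.
    """
    target = actual[1:]
    naive = actual[:-1]  # yesterday's value used as today's forecast
    return rmse(target, predicted[1:]) / rmse(target, naive)


# Toy series, invented for illustration.
actual = [0, 2, 0, 2, 0]
predicted = [0, 1, 1, 1, 1]
tic = theil_vs_no_change(actual, predicted)  # 0.5: error is half the naive error
```

On this reading, a prediction error 23 percent lower than the naive forecast corresponds to a ratio of about 0.77.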
Table 3 also presents additional robustness checks, including alternative data. If we abandon the ordinal scale of daily events and measure only a binary of fatal violence (Goldstein code of -10) and no fatal violence for each directed dyad for each day, the TIC performs just as well, beating the naive no-change forecast by 23 percent for Israeli actions and 25 percent for Palestinian actions. Parallel findings result from the net number of events of material cooperation and conflict for each directed dyad for each day, following the procedures of Brandt et al. (2011).
Similar findings emerge with a separate data source, a list of Israeli-Palestinian fatalities from the period September 29, 2000, through December 31, 2009, compiled by the peace group B'Tselem (http://www.btselem.org/statistics). Measured either as daily counts of fatalities for each directed dyad or as a binary of fatalities or no fatalities, the TIC for the B'Tselem data beats the naive no-change forecast by 25 to 28 percent (see Table 3). (Augmented Dickey-Fuller tests indicate that all of these time series exhibit stationarity. The optimal SBIC for the binary measure of violence and the B'Tselem number of fatalities is 1-day through 10-day lags; the optimal SBIC for the binary measure of B'Tselem fatalities on a given day is 1-day through 6-day lags.)
All of the findings so far refer to in-sample analyses. However, the model is also robust to out-of-sample forecasting. Starting with a window of 90 days at the beginning of the time series, we use the coefficients from that period to forecast event scores on the ninety-first day; then we extend the window to 91 days and forecast event scores for the ninety-second day; and so on day by day to the final day in the data set. The root mean square of prediction errors for out-of-sample forecasts, using either the full event scale or the binary measure of violence, is virtually the same as with in-sample prediction for the Reuters events (see Table 3). The TIC statistics are also almost identical for out-of-sample and in-sample predictions.
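The extending-window procedure can be sketched as follows. To keep the example self-contained, a univariate first-order autoregression fitted by ordinary least squares stands in for the eight-lag VAR; that substitution, the function names, and the synthetic series are assumptions made for illustration only.

```python
def fit_ar1(series):
    """OLS fit of y_t = c + phi * y_{t-1}; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:  # constant history: fall back to forecasting the mean
        return mean_y, 0.0
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def expanding_window_forecasts(series, start=90):
    """One-step-ahead forecasts using only data observed before each day.

    The first forecast uses the first `start` observations to predict
    observation start + 1; the window then grows by one day at a time.
    """
    forecasts = []
    for end in range(start, len(series)):
        c, phi = fit_ar1(series[:end])           # estimate on days 1..end
        forecasts.append(c + phi * series[end - 1])  # predict day end + 1
    return forecasts


# Synthetic, noise-free AR(1) series, so the forecasts should be near exact.
series = [10.0]
for _ in range(99):
    series.append(1.0 + 0.5 * series[-1])
forecasts = expanding_window_forecasts(series, start=90)
errors = [f - a for f, a in zip(forecasts, series[90:])]
```

Because coefficients are re-estimated before every forecast, no forecast uses information from the day being predicted or from later days.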
Similar findings result from Bayesian VAR models like the ones used in several recent papers (Brandt and Freeman 2006; Brandt et al. 2008; Brandt et al. 2011). As described in Appendix B, these models generate thousands of forecasts for each day for each directed dyad. Using the same extending-window approach as for non-Bayesian out-of-sample forecasts, the mean Bayesian out-of-sample forecasts generate root mean squared errors that are nearly identical (within 1 percent) to the root mean squared errors for non-Bayesian out-of-sample forecasts. Just more than three-quarters of the observed values for each day's directed dyad (76
This exercise confirms the reciprocal action-response dynamic of Israeli-Palestinian interactions over the past generation, using a more subtle daily indicator than in previous studies. Israelis and Palestinians have engaged in tit-for-tat interactions over the past 30 years, not just with regard to violence but also with regard to other forms of conflictual and cooperative actions.

Fluctuation in Prediction Error
We could stop here, as many time series analyses do, with the best models we are able to produce for the entire period under study. Hidden within our models, however, are dramatic swings in accuracy, which published time series analyses rarely mention. These models generate their worst prediction errors during the most important episodes in Israeli-Palestinian history, the interactions that we are most interested to explain.
As a threshold of historical importance, we take events during the period 1979-2006 that are mentioned in the chapter headings and subheadings of at least three out of seven recent English-language history books on Israeli-Palestinian interaction (Caplan 2010; Dowty 2012; Gelvin 2007; Harms and Ferry 2008; Milton-Edwards 2009; Smith 2010; Tessler 2009). (Table A3 lists all events mentioned in the headings of at least two of the seven texts.) For eight of these 13 important events (two additional episodes cannot be dated precisely), prediction error as measured by RMSE is greater in the 90 days after the start date of the event than in the 90 days before the start date (shaded cells in Table 4). Prediction error increases during all of the six most important events, events that were headlined in five or more of the history books.
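The before-and-after comparison can be sketched in Python. The sketch assumes that the "after" window begins on the event's start date and that daily prediction errors are already available as a list; both the function names and the toy errors are illustrative.

```python
import math


def window_rmse(errors):
    """RMSE of a list of daily prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def error_shift(errors, event_index, window=90):
    """RMSE in the `window` days before and after an event.

    `event_index` is the position of the event's start date in `errors`.
    Assumption: the start date itself is counted in the 'after' window.
    """
    if event_index < 1 or event_index >= len(errors):
        raise ValueError("event must fall inside the series, after day 1")
    before = errors[max(0, event_index - window):event_index]
    after = errors[event_index:event_index + window]
    return window_rmse(before), window_rmse(after)


# Invented errors: small before the event, three times larger afterward.
errors = [1.0] * 90 + [-3.0] * 90
before, after = error_shift(errors, event_index=90)
```

An event counts as poorly predicted in this sense when the second value exceeds the first.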
The fluctuation of prediction error is represented graphically in Figure 3, where the squared errors for out-of-sample predictions of each day's directed dyads have been smoothed with polynomial regression between each of the major historical events.[5] Error is associated both with conflict and with cooperation: prediction error jumped after some of the major milestones in the peace process, including the Madrid Conference (especially for Israeli actions) and the first Oslo Accord (especially for Palestinian actions). However, the most dramatic increases in prediction error followed the outbreak of conflict, especially the First and Second Intifadas. These events were particularly poorly predicted by the interactions that preceded them, even if the forecast is based only on interactions directly preceding these events and ignores older interactions beyond a window of 30, 90, or 365 days. The pattern in Figure 3 is not simply an artifact of the scale of daily interaction scores. It emerges just as strongly with a binary indicator for violence, with similar leaps in squared prediction error after the First and Second Intifadas, the Madrid Conference (especially for Israeli actions), and the first Oslo Accord (especially for Palestinian actions). These leaps in prediction error provide prima facie evidence of shocks in Israeli-Palestinian interactions. Our time series model passes the usual tests but is unable to forecast or retrocast historic episodes accurately on the basis of prior interactions.
These findings contradict the claims of previous time series analyses of Israeli-Palestinian interaction, which have not investigated time-varying errors in prediction. Like the bulk of time series analyses in general, these studies present models that best describe the time periods covered by their data. They imply, without presenting confirmation, that these models apply equally well to watershed moments and routine moments within each period. Studies that find tit-for-tat interactions draw conclusions about watershed episodes, not just routine ones. Studies that find no tit-for-tat interactions draw conclusions about routine moments, not just watersheds. We conclude that Israelis and Palestinians have been inconsistent in their responses to one another: their pattern of interaction appears stable during some parts of the past three decades but less stable during moments of historic change.
This pattern inverts the findings of Golan and Rosenblatt (2011), who concluded that predictable cycles of violence were visible only occasionally in Israeli-Palestinian interactions, "whereas in most periods, retaliation explains a minuscule portion of events, suggesting that the parties display no statistical regularity in their actions."Using a longer time span (30 years instead of 8 years) and a richer set of interactions (not just killings and rocket attacks), we find that a consistent pattern of retaliation is less visible during periods of the greatest violence, while most periods exhibit greater statistical regularity.Next, we go further to identify four distinct aspects of interaction that may shift during moments of historic change.

Shock Absorbers
When faced with exogenous shocks, forecasters often attempt to identify and control for structural breaks, either by splitting the time series and calculating separate models for each period, or by introducing dummy variables corresponding to each period.These approaches face at least three challenges.First, on theoretical grounds, these approaches imply that historical trends ought to be understood net of their most important episodes, a point we will take up again in the concluding section of the article.Second, on empirical grounds, these approaches may not solve the problem of time-varying predictive accuracy.Third, on methodological grounds, these approaches fixate on structural breaks in coefficients while ignoring breaks in other parts of their models.Time series analysis of Israeli-Palestinian interaction illustrates all three of these challenges.
To examine whether fluctuation in prediction error is lower between structural breaks than across them, we chopped the time series of Israeli-Palestinian interaction into more than a dozen separate periods, based on the historic episodes identified by historians, and calculated separate vector autoregression models on each period.The resulting prediction error (Figure A4) rises and falls in a pattern almost identical to Figure 3, which forecasted interactions based on the entire 30-year sample.A similar pattern emerges with dummy variables for each period, using the first period as the reference category.
Instead of relying on historians' accounts to identify moments of epochal change, we might follow the inductive procedure proposed by time series texts such as Lütkepohl (2005) and search for the presence of structural breaks in the coefficients of the lagged dependent variables and the intercepts. We estimate the optimal number of breaks, as indicated by the Bayesian Information Criterion, and the dates of these breaks, using econometric methods developed by Bai and Perron (2003) and implemented with the "breakpoints" function in the "strucchange" package in R (Zeileis et al. 2012a, 2012b). We set the minimum segment size at 90 days and limit the maximum number of breaks to 20 (results shown in Table A5, Model 1).
According to this inductive procedure, there are at least 20 structural breaks in the 30-year time series, but the optimal number of structural breaks in the time series is only 1.This break is estimated to occur on September 7, 2000, the same month that the Second Intifada began.Including a dummy variable for the period after this date scarcely improves the model, reducing prediction error by less than 2 percent (Table A5), and fails to smooth out the fluctuation in prediction error (Figure A6).
In empirical terms, then, forecasting methods designed to control for structural breaks may not control the problem we have identified: prediction errors remain highest at the moments of greatest historical importance. At the same time, these methods confirm the presence and substantive importance of breaks in Israeli-Palestinian interaction: coefficients for action-reaction models are significantly different before and after the break in September 2000. Before this time, impulse-response graphs display lesser and less consistent effects (Figure A7) than graphs calculated over the entire 30-year time series (Figure 2). (An alternative approach to structural breaks, using Chow tests on nonoverlapping segments of the time series, is described in Appendix C. The alternative approach identifies different break points but also fails to tame fluctuation in prediction error.)

Four Types of Shock
Structural breaks of the sort identified thus far, involving shifts in coefficients and intercepts, are the most common form of shock incorporated in forecasting analyses. Indeed, much of time series analysis equates the concept of exogenous shock exclusively with this form of structural break. However, this is only one aspect of forecasting models that is vulnerable to shock (Bai 2010; Bai and Perron 2003; Brandt et al. 2011; Castle, Fawcett, and Hendry 2011; Clements and Hendry 2006; Qu and Perron 2007). We contribute to this literature by documenting three additional types of shock, each type corresponding to a separate aspect of forecasters' models:
• Effect shock: shifts in coefficients, or how much each action affects later actions
Input shock. We give the label "input shock" to shifts in actors' behavior. In time series terms, the values of one or more endogenous variables break from their earlier pattern, as distinct from shifts in the coefficients for these variables. Forecasters rarely explore this form of shock.
We test for input shock over the full time series with the "strucchange" package in R, applied separately to the two univariate time series of Israeli actions toward Palestinians and Palestinian actions toward Israelis (Models 2 and 3 in Table A5). As with the joint model of interactions, this procedure finds at least 20 structural breaks in the 30-year time series both for Israeli actions and for Palestinian actions, but only 2 important breaks, according to the Bayesian Information Criterion. The first breaks are estimated to occur within days of each other in September 2000 (September 28 for Israeli actions and September 26 for Palestinian actions), at almost the same time as the primary structural break in the joint model. The second breaks, however, differ for Israeli and Palestinian actions. Palestinian actions exhibit an input shock on August 3, 2002, around the start of construction of the Israeli separation barrier/wall, whereas Israeli actions exhibit an input shock more than two years later, on November 7, 2004, within days of Arafat's death.
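The logic of searching a univariate series for a break can be illustrated with a deliberately simplified Python sketch. It looks for a single shift in the mean, subject to a minimum segment size, and compares models with a BIC of the form n ln(SSE/n) + k ln(n). This is a stand-in for illustration, not the Bai and Perron (2003) multiple-break procedure implemented in "strucchange"; the parameter counts (one mean without a break; two means plus a break date with one) and the toy series are assumptions.

```python
import math


def sse(values):
    """Sum of squared deviations from the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_single_break(series, min_segment=90):
    """Find the split that minimizes total SSE of two constant-mean segments.

    Returns (k, total_sse), where k is the index of the first observation
    in the second segment, or None if the series is too short to split.
    """
    best = None
    for k in range(min_segment, len(series) - min_segment + 1):
        total = sse(series[:k]) + sse(series[k:])
        if best is None or total < best[1]:
            best = (k, total)
    return best


def bic(sse_value, n_obs, n_params):
    """BIC for a Gaussian model, up to an additive constant."""
    return n_obs * math.log(sse_value / n_obs) + n_params * math.log(n_obs)


# Invented series whose level jumps at observation 200.
series = [0, 1] * 100 + [5, 6] * 100
k, sse_break = best_single_break(series, min_segment=90)
bic_none = bic(sse(series), len(series), 1)   # one mean, no break
bic_one = bic(sse_break, len(series), 3)      # two means plus a break date
```

A lower BIC for the one-break model than for the no-break model is the kind of evidence that the information criterion uses to separate important breaks from incidental ones.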
These findings tell a complicated story that is not easily reconciled with the usual analyses of Israeli-Palestinian interactions or with the standard methods of time series analysis.The one consistent and unsurprising result is that Israeli and Palestinian actions shifted around the time of the Second Intifada in 2000, both in univariate and joint VAR models.However, we also find evidence of input shock without effect shock.Unexpectedly, input shock often appears at moments that are more commonly associated with actions on the other "side"-we see input shock in Israeli actions but not Palestinian actions after the death of Arafat, and input shock in Palestinian action but not Israeli actions after the construction of the separation barrier/wall.So far as we know, there is no readily available time series method to compare the importance of input shocks and effect shocks or to test for both forms of shock simultaneously.(The same problem arises with the alternative approach to breaks that is described in Appendix C.) The challenge for forecasting is that both types of shock may exist in the same time series data, sometimes coinciding at the same historic moment and sometimes not.
Duration shock. We give the label "duration shock" to shifts in how long an effect lasts. In time series models, duration shock is visible through changes in the optimal lag structure. To our knowledge, time series analysis has not developed criteria to identify multiple structural breaks in lags, but we will try to demonstrate the presence of duration shock in Israeli-Palestinian interactions using the three information criteria that are frequently used to determine optimal lag structures: the Akaike Information Criterion (AIC), the Hannan-Quinn Information Criterion (HQIC), and SBIC.
Beginning with a 365-day period at the beginning of the time series, we lengthened the window 90 days at a time and recalculated the optimal lag for the three information criteria. The values of the optimal lags for each window are plotted in Figure 4. Each of the three indicators suggests that the optimal lag has risen as the time series lengthened: the longer the time series, the more long-lasting the effects of Israeli and Palestinian actions appear. Yet this rise was uneven, both across information criteria and across time. All three indicators suggest that lag structures lengthened during the Second Intifada in the 2000s. According to the AIC and SBIC, but not the HQIC, optimal lags lengthened during the First Intifada as well. In the mid-2000s, the AIC dropped, the HQIC rose, and the SBIC remained almost constant. Further inconsistencies ensue if we limit observations to rolling windows of one year or some other period. The sensitivity of these indicators to shifts in time period dampens hopes that lag structures will remain consistent for time series analysis of important historical episodes.
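The expanding-window procedure can be sketched in Python as follows. This is a minimal illustration, not the code behind Figure 4. It assumes a VAR with a constant fitted by ordinary least squares and the textbook forms of the three criteria (log determinant of the residual covariance plus a penalty per estimated parameter). The function names and the `max_lag` cap are hypothetical.

```python
import numpy as np

def var_information_criteria(y, p, max_lag):
    """Fit a VAR(p) with a constant by OLS and return (AIC, HQIC, SBIC).

    The first max_lag rows are held out for every p, so that all lag
    orders are compared on the same estimation sample.
    """
    n_total, k_series = y.shape
    target = y[max_lag:]
    n_obs = target.shape[0]
    lags = [y[max_lag - j: n_total - j] for j in range(1, p + 1)]
    X = np.hstack([np.ones((n_obs, 1))] + lags)
    coefs, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coefs
    sigma = resid.T @ resid / n_obs
    logdet = np.linalg.slogdet(sigma)[1]
    n_params = k_series * (k_series * p + 1)
    aic = logdet + 2.0 * n_params / n_obs
    hqic = logdet + 2.0 * np.log(np.log(n_obs)) * n_params / n_obs
    sbic = logdet + np.log(n_obs) * n_params / n_obs
    return aic, hqic, sbic

def optimal_lags(y, max_lag=10):
    """Lag order (1..max_lag) that minimizes each criterion: (AIC, HQIC, SBIC)."""
    crit = np.array([var_information_criteria(y, p, max_lag)
                     for p in range(1, max_lag + 1)])
    return tuple(int(i) + 1 for i in crit.argmin(axis=0))

def expanding_window_lags(y, start=365, step=90, max_lag=10):
    """Recompute the optimal lags as the window grows by `step` days at a time."""
    results = []
    for end in range(start, len(y) + 1, step):
        results.append((end, *optimal_lags(y[:end], max_lag)))
    return results
```

Here `y` would be a two-column array of daily Israeli and Palestinian event scores. Each returned tuple holds the window end followed by the AIC, HQIC, and SBIC lag choices, which can then be plotted against time.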
Shifts in optimal lag structure coincide with several of the historic events in Israeli-Palestinian relations identified by historians of the region: within 90 days after the outbreak of the Israeli-Lebanese War in 1982, the optimal lag indicated by the SBIC rose from 1 to 4; within 90 days after the outbreak of the Second Intifada in 2000, the optimal lag rose from 5 to 8. Other indicators of optimal lag structure also rose at this time or soon after. We may gauge the significance of these shifts with a Chow test that analyzes the 90 days before and 90 days after the outbreak of these two historic episodes. This test interacts the additional lagged dependent variables (the second, third, and fourth lags for the Israeli-Lebanese War; the sixth, seventh, and eighth lags for the Second Intifada) with a dummy variable equaling zero prior to the event and 1 afterward. The test is significant in both instances (chi-squared = 33.8 for the Israeli-Lebanese War and 49.5 for the Second Intifada, p<0.01 for each), confirming that the lags experienced a structural break during these episodes. These tests are robust to different lengths of the periods before and after each outbreak.
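A single-equation version of this kind of test can be sketched in Python. The sketch rests on several assumptions that the text does not settle: it uses one univariate series rather than the full VAR, enters the additional lags only through their interaction with the post-event dummy, and computes a Wald statistic under homoskedastic errors. The function name `lag_break_wald` and its parameters are hypothetical.

```python
import numpy as np

def lag_break_wald(y, event_idx, base_lags=(1,), extra_lags=(2, 3, 4), half_window=90):
    """Wald test that extra lags, interacted with a post-event dummy, are jointly zero.

    Returns (statistic, degrees of freedom). Under the null hypothesis the
    statistic is approximately chi-squared with len(extra_lags) degrees of freedom.
    """
    y = np.asarray(y, dtype=float)
    base_lags, extra_lags = list(base_lags), list(extra_lags)
    max_lag = max(base_lags + extra_lags)
    t = np.arange(event_idx - half_window, event_idx + half_window)
    if t[0] < max_lag or t[-1] >= len(y):
        raise ValueError("window does not fit inside the series")

    post = (t >= event_idx).astype(float)  # 0 before the event, 1 afterward
    X = np.column_stack(
        [np.ones(len(t))]
        + [y[t - j] for j in base_lags]
        + [post * y[t - j] for j in extra_lags]
    )
    target = y[t]

    xtx = X.T @ X
    beta = np.linalg.solve(xtx, X.T @ target)
    resid = target - X @ beta
    s2 = resid @ resid / (len(t) - X.shape[1])
    cov = s2 * np.linalg.inv(xtx)

    # The interaction terms are the last q columns of the design matrix.
    q = len(extra_lags)
    b_r, v_r = beta[-q:], cov[-q:, -q:]
    stat = float(b_r @ np.linalg.solve(v_r, b_r))
    return stat, q
```

The statistic would be compared with a chi-squared distribution with `q` degrees of freedom. Varying `half_window` gives a rough analogue of the robustness check on the length of the before and after periods.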
Duration shock may have substantive implications. If we examine only the period prior to the Second Intifada, for example, a five-day lag structure generates an IRF with smaller, shorter, and less statistically significant effects (shown in Figure A7) than for the full period of 1979 to 2009 (shown in Figure 2). Prior to the Second Intifada, in other words, Israeli responses appear less predictable than in the full time series; Palestinian responses, though lower in magnitude, appear to be more consistently predictable. The tit-for-tat pattern described in earlier time series studies of Israeli-Palestinian interactions only holds, it appears, for certain time periods and particular lag structures. Forecasting that relies on a single, unvarying lag structure may not be robust.
Actor shock. We give the label "actor shock" to shifts in the number of agents in the action-reaction system, for example, when an exogenous variable suddenly becomes endogenous, or vice versa. While time series analysis has long been concerned with omitted-variable bias, the concept of actor shock involves shifts in this bias over time.
With Israeli-Palestinian interactions, an example can be found in Lebanese actors (primarily Hizbullah but also the Lebanese government and other groups) as actors and targets of action in Reuters news stories. Directed dyads of Israeli-Lebanese and Palestinian-Lebanese interaction are incorporated endogenously along with Israeli-Palestinian interactions in these analyses. For the full 30-year time series, the two primary tests for omitted variables contradict one another. The likelihood ratio (LR) test suggests that the inclusion of Lebanese actors improves the fit of the VAR model (chi-squared = 3,130.4, p<0.001).7 Conversely, including Lebanese actors does not improve prediction errors: it slightly worsens RMSE, with an increase of less than 0.1 percent, for both Israeli actions toward Palestinians and Palestinian actions toward Israelis.
Hidden in these long-term findings are periods when the inclusion of Lebanese actors makes a considerable difference in modeling Israeli-Palestinian interaction. Calculating RMSEs for the full and constrained models for every 90-day period in the time series, advancing day by day through all 30 years, we find moments in which the inclusion of Lebanese actors reduces prediction errors by more than 50 percent, and other periods when it increases errors by more than 20 percent (see Figure A8). The inclusion of Lebanese actors actually increases prediction errors in 62 percent of the windows. In other words, Lebanese actors are not uniformly exogenous or endogenous to Israeli-Palestinian interactions, and their inclusion or exclusion in models of Israeli-Palestinian interaction may matter a great deal, in different directions, at different points in time. Omitted-variable bias seems to be highly sensitive to the timing of the period of study.
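The rolling comparison can be sketched in Python as follows. This is a minimal sketch that assumes the daily prediction errors from the full model (with Lebanese actors) and the constrained model (without them) have already been computed. The function names are hypothetical.

```python
import numpy as np

def rolling_rmse(errors, window=90):
    """RMSE over every run of `window` consecutive days, advancing one day at a time."""
    sq = np.asarray(errors, dtype=float) ** 2
    csum = np.concatenate([[0.0], np.cumsum(sq)])
    return np.sqrt((csum[window:] - csum[:-window]) / window)

def rmse_pct_change(err_full, err_constrained, window=90):
    """Percentage change in RMSE when the extra actor is included.

    Negative values mean inclusion reduces prediction error in that window.
    Assumes the constrained-model RMSE is never exactly zero.
    """
    full = rolling_rmse(err_full, window)
    constrained = rolling_rmse(err_constrained, window)
    return 100.0 * (full - constrained) / constrained

def share_worsened(pct_change):
    """Fraction of windows in which inclusion increases prediction error."""
    return float(np.mean(np.asarray(pct_change) > 0))
```

Plotting the output of `rmse_pct_change` against time shows where a third actor moves in and out of relevance, and `share_worsened` corresponds to the kind of summary reported above for the share of windows with higher error.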
As an illustration of actor shock, we examine prediction error during the Israeli-Hizbullah War of 2006, which lasted 59 days from July 12 through the end of the Israeli blockade on September 8, as well as the 59 days prior to the war and the 59 days following the war. Figure 5 reports the percentage change in RMSE when Lebanese actors are included in the model: during the war, their inclusion reduces prediction error by 22 percent for Israeli actions and 40 percent for Palestinian actions, whereas just before the war it increases prediction error for both. (The RMSE analysis could not be confirmed with LR tests because nested VAR models would not run with these short time periods.) Nothing in the data just prior to the outbreak of the war would lead us to believe that Lebanese actors would soon become important for the forecasting of Israeli-Palestinian interaction. That is actor shock.
Several previous time series analyses of Israeli-Palestinian interaction incorporated a third party, the United States (Rasler 2000; Goldstein et al. 2001; Brandt and Freeman 2006; Brandt et al. 2008). The first two of these studies found that interactions with America shifted in the early 1990s just as the Israeli-Palestinian peace process progressed. However, these studies do not explore the implications of discontinuity, and subsequent time series analyses have not followed suit. Our findings confirm that actor shock may be a crucial aspect of Israeli-Palestinian interaction at certain important junctures, illustrating how forecasting may have to contend with the sudden entry and exit of new endogenous factors.

Implications
Late in life, Auguste Comte wrote to an Ottoman statesman with a prediction (Comte [1853] 1876:xliii). Large states like the Ottoman Empire were doomed, he suggested, through the "normal application of the sociological law which everywhere restricts the territory of temporal dominions to its natural size." Only "homogeneous" states, Comte concluded, would avoid "spontaneous dismemberment." Comte's prediction was in general accurate: The Ottoman Empire was dismembered into successor states with more "homogeneous" populations. But this generally accurate forecast overlooked an important variety of shocks that Comte did not anticipate. Some dismemberment took the form of nationalist uprisings similar to the movements in Greece and Serbia on which Comte based his prediction. Elsewhere, however, nationalism emerged only after the dismemberment of empire, as in much of the Levant and the Arabian peninsula. Some of the new states, such as Greece and Turkey, achieved homogeneous populations only through ethnic cleansing. In the British Mandate of Palestine (the territory that is the subject of this article), dueling Israeli and Palestinian nationalisms cast doubt on Comte's confidence in the existence of a "natural size" for each state. Comte's forecast may seem prescient from a bird's-eye view, but it looks more dubious when we examine the particulars.
The forecasting profession has developed far more sophisticated methods since Comte's time. However, these methods continue to struggle with the problem of unanticipated shocks. This article offers a statistical model that is adequately successful at predicting Israeli and Palestinian interactions over the past generation, according to the usual standards of the forecasting profession. On closer inspection, however, this model falters during the most important episodes of historic change. This article identifies four ways in which shocks have disrupted patterns of Israeli-Palestinian interaction at watershed moments, contributing to recent methodological work on the mechanisms of structural breaks: shifts in the duration of effects (duration shock), shifts in the scale of effects (effect shock), shifts in actions (input shock), and shifts in the number of agents in the system (actor shock). This typology is not intended to be exhaustive but to encourage research into additional forms of shock and more sophisticated tests for the presence of shocks of different sorts.
The identification of different types of shocks allows us to specify several unexpected aspects of Israeli-Palestinian interactions, such as short-term input shock affecting Israeli actions, but not Palestinian actions, during the First Intifada. This suggests that the statistically significant change in behavior during that historic period occurred not on the Palestinian side but on the Israeli side. The Second Intifada, by contrast, involved input shock on both sides as well as effect shock scrambling the lag coefficients. These coefficients underwent a short-term effect shock several years later, with the death of Yassir Arafat. Israeli actions also shifted at that time, whereas Palestinian actions did not. Palestinian actions shifted with the construction of the separation barrier/wall in 2002, whereas Israeli actions did not. The duration of interaction effects increased abruptly at important moments, and at least one third party entered and exited Israeli-Palestinian interactions (the Lebanese in 2006).
Israeli and Palestinian leaders have frequently debated the subject of shocks as they struggled to understand whether old patterns of interaction still held or new ones were forming. In the early days of the First Intifada in 1987, for example, the Israeli defense minister reassured a colleague that "the army will assert control very quickly," while a foreign ministry official drew the opposite conclusion that "this was the beginning of something big." Palestinian leaders were also disconcerted by the uprising. The head of intelligence for the Palestine Liberation Organization recalled that "when the intifada broke out, we were at first afraid" that it would amount to nothing, because "nobody had been calculating on such an intifada, with its force and power. The one who was most in touch with the occupied territories was Abu Jihad [Khalil al-Wazir], but even he didn't expect it to be like that" (Gowers and Walker 2003:358).
This article encourages the acknowledgment of surprise in the forecasting profession, as a check on the field's eternal optimism. As a matter of professional "best practices," forecasters should test for and report the presence of shocks of various sorts. Indeed, this practice should be adopted not just by forecasters but by retrocasters as well. The four types of shock identified in this article may apply not just to vector autoregression models but to other approaches to time series analysis as well, including models that focus on larger sets of variables with a smaller number of time points.
Perhaps an extremely complicated model might account for syncopated shocks of different forms. That is the goal of time series methodology, "dealing with structural breaks" (Perron 2006) so that they will no longer obscure the underlying dynamics of the time series. Such a model would describe the process of Israeli-Palestinian interaction, net of the surprising and important events that have shaped this history over the past generation. This article makes the case, by contrast, that "dealing with" shock is not just methodologically difficult but substantively unsound. Shocks are not just disturbances to be controlled for statistically, in search of a stable underlying set of parameters. They are significant in their own right and a crucial feature of Israeli-Palestinian interaction over the past generation. Accounting for one form of shock may not help in anticipating the next one. Shocks can undermine forecasting accuracy at historic moments, without warning.
Shocks get to the heart of social-scientific explanation: the extent to which the past may be said to anticipate the future. Whether the analysis is conducted retrospectively with in-sample forecasts or prospectively with out-of-sample forecasts, whether it is grounded in quantitative or qualitative evidence, the intervention of shock may detach future from past. Shocks draw attention to discontinuities in history, while time series and other historical analyses are normally attuned to continuity. Explanations work best when the dynamics of cause and effect remain stable, but moments of historic change, when these dynamics shift, are less readily attributed to the dynamics that have gone before. When forecasts and retrocasts fail, they set bounds on social-scientific explanation.

Figure 1: Israeli-Palestinian Interactions: Daily Events, 1979-2009. Daily event scores, smoothed with polynomial regression, reflect the deepening conflict during the First and Second Intifadas and the lessening of conflict during the period of the Oslo Accords.
Of particular interest are the cells in the dark outlines, which indicate the effect of past Israeli actions on a given day's Palestinian actions (upper right-hand cells) and vice versa (lower left-hand cells).The low p-values in these cells confirm the mutual impact of Israeli and Palestinian actions

Figure 2: Impulse Response Functions for Israeli-Palestinian Interactions, 1979-2009. Israeli and Palestinian response functions reflect a statistically significant pattern of tit-for-tat interaction over the past generation.
percent for Israeli actions and 77 percent for Palestinian actions) fall within 1 standard deviation of the median Bayesian forecast, a 68 percent posterior probability interval used by Brandt et al. as an indicator of forecasting success.

Figure 3: Israeli-Palestinian Interactions: Squared Prediction Error, 1979-2009. Out-of-sample prediction error leapt at many historic episodes in Israeli-Palestinian interaction. Note: Events within three months of each other are displayed in this chart as a single line: Camp David II and the Second Intifada in 2000; Israeli-Hamas War and Israeli-Hizbullah War in 2006.

Figure 4: Duration Shock: Changes in Optimal Lag Structure, 1979-2009. Optimal lag structures fluctuate as the window of observations is extended over time.
In Figure 5, the black bars indicate the percentage change in RMSE for Israeli actions when Lebanese actors are included in the model; the gray bars indicate the percentage change in RMSE for Palestinian actions. During the war, including Lebanese actions reduces prediction error by 22 percent for Israeli actions and 40 percent for Palestinian actions. Just before the war, however, including Lebanese actions increases prediction error by 15 percent for Israeli actions and 39 percent for Palestinian actions. Just after the war, including Lebanese actions increases prediction error by 10 percent for Israeli actions and reduces prediction error slightly, by 2 percent, for Palestinian actions. Over this six-month period, according to this standard measure of omitted-variable bias, Lebanese actors pop in and out of the model.

Figure 5: Actor Shock: Prediction Errors With a New Actor. Including Lebanese events reduces prediction error considerably during the 2006 Israeli-Hizbullah war, but not just before and after the war.

Table 1: Mutual Effects of Israeli-Palestinian Actions
Note: Based on 11,217 observations. 1% critical values for the Augmented Dickey-Fuller test are in parentheses. * Denotes (Prob > χ2) < .01 for the Granger Causality Test and p < .01 for the Augmented Dickey-Fuller test.
on one another: six of eight lagged Israeli values are significant at the p < 0.01 level, and six of eight lagged Palestinian values are significant at this level.

Table 3: Robustness Checks
These figures are inflated by five days with particularly high numbers of Palestinian fatalities during the Israeli-Hamas War of 2008-2009. If those days' totals are capped at the next-highest number of daily fatalities (71 Palestinian fatalities on March 1, 2008), the RMSE drops by half.

Table 4: Change in Prediction Error after Major Historical Events
Shaded cells indicate increases in the root mean squared error (RMSE) in the 90 days following the onset of a major historic event, compared with the prediction error in the preceding 90 days.