Delta and other air carriers show how not to do disaster recovery

The August 8 systemwide outage suffered by Delta Air Lines – attributed to a power failure at the company’s primary data center – is merely the most recent in a string of technology-related problems significantly affecting operations of major air carriers. In just the past 13 months, United (July 2015) and Southwest (July 2016) lost the use of their computer systems due to problems blamed on faulty network routers, and Delta last week joined JetBlue (January 2016) in experiencing data center power failures. The predominant response from the IT industry is both surprise and disappointment that mission-critical airline operations systems do not seem to have reliable or effective continuity of operations or failover capabilities in place, whether in the form of backup power generation in data centers or redundant hardware or software systems. All of the recent outages highlight single points of failure for the airlines, which not only show poor design but also seem completely unnecessary given modern computing resources.

Conventional disaster recovery and business continuity planning begins by assessing the criticality of the business processes that information technology systems, networks, and infrastructure support. Alternate processing facilities (like secondary data centers) are categorized as “cold”, “warm”, or “hot” sites according to how rapidly the alternate facility can take over for or establish at least some level of business operations when the primary facility has an outage; hot failover is the most immediate, often entailing fully redundant or mirrored systems in two or more data centers that can work together or individually to keep systems available. In addition to alternate processing capabilities, most modern data centers – whether owned and operated by companies themselves (like Delta) or by outsourcing providers like Dell, HP, or Verizon (the last of which is used by JetBlue) – have redundant network and power connections as well as battery and generator-based backup power to try to avoid precisely the type of failure it seems affected Delta. Various news reports of the Delta outage have noted that many of Delta’s key operational systems, including the ones that failed on August 8, run out of a single Atlanta data center, dubbed the Technology Command Center, and have speculated that Delta chose not to implement an alternate processing site. Based on what happened on August 8, it seems fair to say that either Delta does not in fact have a secondary facility or, if it does, any automated failover procedures that are designed to shift operations to a secondary facility did not work as intended. Whether due to poor planning, misplaced financial priorities, or lack of disaster recovery testing, the events of the day provide clear evidence that Delta’s systems are neither reliable nor resilient in the face of unanticipated problems in the Atlanta facility. 
There seems to be some disagreement as to whether a power outage or an equipment malfunction was actually the cause of the outage, but neither of those issues should have brought Delta’s systems down if the company had implemented the sort of IT redundancy that is common among major commercial enterprises. Even when redundancy has been built in, the importance of testing cannot be overstated; without regular disaster recovery testing companies may operate under a false sense of security, until they actually encounter a problem and find that their failover mechanisms don’t work. This is apparently the case for the Southwest Airlines outage, which was blamed on a network router that began functioning improperly but did not actually go offline, with the result that existing backup systems were not activated to take over for the malfunctioning router.
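The Southwest failure mode, a device that misbehaves without going offline, can be illustrated with a minimal sketch. Everything here is hypothetical: the `RouterStatus` fields, the function names, and the 5 percent loss threshold are assumptions chosen for illustration and do not describe any airline's actual monitoring.

```python
from dataclasses import dataclass


@dataclass
class RouterStatus:
    """Hypothetical snapshot of a router's state; all fields are illustrative."""
    reachable: bool          # does the device answer a basic liveness probe?
    packet_loss_pct: float   # share of test traffic dropped, 0-100


def liveness_only_failover(status: RouterStatus) -> bool:
    """Trigger failover only when the device is completely offline."""
    return not status.reachable


def health_based_failover(status: RouterStatus, max_loss_pct: float = 5.0) -> bool:
    """Trigger failover when the device is offline or forwarding traffic badly."""
    return (not status.reachable) or status.packet_loss_pct > max_loss_pct


# A router that still answers probes but drops most of its traffic.
degraded = RouterStatus(reachable=True, packet_loss_pct=80.0)

print(liveness_only_failover(degraded))  # False: the backup never takes over
print(health_based_failover(degraded))   # True: degraded service triggers failover
```

A check that only asks whether the device is up never fires for a degraded device, which is the kind of gap that regular disaster recovery testing is meant to expose.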

The apparent fragility of air carrier IT systems has raised concerns within the federal government, as seen this week in a letter from Senators Edward Markey and Richard Blumenthal, both members of the Senate Commerce, Science and Transportation Committee, to Delta CEO Ed Bastian (the letter was also sent to executives at a dozen other airlines) asking for information about the state and general resilience of the airlines’ IT systems, their potential susceptibility to failure due to power or technology issues or to cyber-attack, and the effect on traveling members of the public when outages occur. Commercial air carriers’ IT systems are not explicitly considered part of the nation’s critical infrastructure (although aviation is part of the Transportation Systems Sector defined as critical infrastructure by the Department of Homeland Security) but Sens. Markey and Blumenthal emphasize the responsibility that Delta and other carriers have to ensure the reliability and resilience of their IT systems, especially in light of the large-scale consolidation of U.S. airlines. Many industry observers point to airline mergers as a contributing factor, and in particular to the need for merged carriers to integrate disparate IT systems, many of which rely on “legacy” technologies that may not have been designed for, or be easily adapted to, high-availability deployments. It seems quite likely that the diversity of systems and technology characterizing many carriers post-merger makes their systems more vulnerable and makes business continuity planning more complicated than it would be with a more homogeneous IT environment, but there is nothing in the recent airline outages to suggest that merger-related IT integration had anything to do with the problems that brought flights to a standstill.

It’s hardly treason, but Trump’s call for Russian hacking still encourages illegal actions

Although Republican presidential nominee Donald Trump spoke for more than an hour at a rally in Florida on July 27, media attention following the speech centered on just two sentences. Leveraging widespread speculation that Russian hackers were responsible for the recently publicized attack on Democratic National Committee computer systems, Trump made an appeal directly to that nation: “Russia, if you’re listening, I hope you’re able to find the 30,000 emails that are missing,” an apparent reference to the email messages that Hillary Clinton determined to be personal and deleted from her private server before handing the server over to the State Department and the FBI. Trump continued, “I think you will probably be rewarded mightily by our press.” Reaction to Trump’s statements was swift and predominantly critical, from both Democrats and Republicans, with a general consensus that Trump was actively encouraging espionage by a foreign power against American information technology assets. Michael Inboden, a member of the National Security Council under former president George W. Bush, called Trump’s comments “tantamount to treason,” although most politicians and analysts chose not to go that far, instead emphasizing how inappropriate it would be for a foreign government to try to intervene in or influence a U.S. presidential election.

A brief examination of relevant laws codified in the United States Code suggests that Trump, assuming for the sake of argument that his statements were serious and not a joke, is at the very least encouraging action that violates U.S. law. Computer hacking generally (whether perpetrated by domestic or foreign actors) is illegal under 18 USC §1030, which says in relevant part that anyone who “intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains … information from any department or agency of the United States or information from any protected computer” may be subject to a fine or imprisonment of up to 10 years. It’s somewhat unfortunate that so many incensed observers chose to use the word espionage to characterize what Trump purportedly asked Russia to do, since under U.S. law (18 USC Chapter 37) that term only applies to defense information or to classified information, and at least according to Clinton, the deleted emails were personal in nature and did not contain any classified information (or, presumably, any information about matters of national defense). Michael Inboden also seems to have been overzealous in painting Trump’s words as treasonous, because acts of treason involve levying war against the United States (18 USC §2381), and it would be hard to characterize hacking emails as waging war. Still, it seems irresponsible to imply that any type of computer hacking or intrusion by Russia would be welcome, especially given the long history of alleged cyberattacks carried out by Russian nationals, whether or not they were working on behalf of the Russian government.

FedRAMP not delivering on promise of standard authorization

Nearly five years after the federal government launched the Federal Risk and Authorization Management Program (FedRAMP) and more than three years after the first FedRAMP authorization, many federal sector observers are questioning the effectiveness and even the relevance of the program. Although the number of FedRAMP-authorized service providers continues to grow – including as of June 21 the first set of providers authorized at a high FIPS 199 security categorization level – the more rapid adoption of cloud computing services FedRAMP was intended to facilitate has been hampered by many federal agencies’ unwillingness to accept FedRAMP authorization as sufficient for granting their own authorization to operate (ATO) or to accept ATOs granted by other agencies. This lack of reciprocity was highlighted by former Navy CIO and Department of Defense Deputy CIO Dave Wennergren in a commentary written for Federal Computer Week, in which he suggests that the Office of Management and Budget (OMB) needs to require agencies to accept ATOs previously granted by other agencies or by the FedRAMP Joint Authorization Board (JAB) to avoid the delays and unnecessary spending associated with the currently prevalent practice of each agency repeating the authorization process even when selecting the same cloud service provider. As an example, the Department of Defense in January issued Defense Acquisition Services instructions to all DoD components that require cloud service providers to obtain both a DoD provisional authorization from the Defense Information Systems Agency (DISA) and an ATO from the component’s authorizing official.
This means that even DoD programs that choose to use one of the services DISA has already authorized under FedRAMP (such as Amazon Web Services Government Community Cloud, Microsoft’s Office365 Multi-Tenant & Supporting Services, or Verizon’s Enterprise Cloud Federal Edition) still need to complete an agency-specific ATO, and they cannot choose FedRAMP-authorized providers whose authorization came from another agency or the JAB. This behavior is common among civilian agencies as well, which appear no more likely than the DoD to accept the judgment of the JAB or of agencies with ostensibly stricter security requirements than their own.

FDIC data breaches indicate systemic failures in security management and monitoring

Initial reports last month of a data breach at the Federal Deposit Insurance Corporation (FDIC), the quasi-federal government agency that oversees the soundness of the nation’s banks and insures millions of Americans’ deposits held by those banks, triggered an embarrassing and troubling series of disclosures about what seems to be a pattern of data exfiltration by FDIC employees who are leaving the agency. The February 2016 incident, which apparently included personal data on 44,000 individuals, was attributed by an FDIC executive to an inadvertent action by an employee who copied the data to a personal storage device on the employee’s last day with the agency. It turns out that the February incident was far from an isolated event; after being asked by the Chairman of the House Science, Space and Technology Committee to provide information about all major breaches at FDIC, the agency disclosed a similar incident that occurred in October 2015 and five other incidents since October, all of which involved outgoing employees copying FDIC data on customers to personal devices. Following a Congressional Subcommittee hearing on May 12, during which FDIC executives tried to explain why they had not previously notified Congress of seven breaches over a five-month period that potentially affected 160,000 individuals, members of Congress were so unimpressed with the agency’s response that they suggested FDIC may have lied to or misled the Committee at the hearing and requested revised testimony about the breaches and FDIC’s response to the incidents. Committee members seemed especially skeptical of FDIC’s claims that the employees who took the data acted inadvertently and without malicious intent, particularly in the case of an employee with a background in IT management who copied large amounts of customer records to a portable hard drive before leaving government employment to work in the private sector.

Media reports (and even some of the statements by members of Congress at hearings convened for the purpose of making FDIC executives explain the agency’s actions) focused to a large extent on the FDIC’s decision not to promptly (within seven days) report the incidents to Congress, as required under the Federal Information Security Modernization Act of 2014 (FISMA) and as directed by the Office of Management and Budget (OMB) in its Memorandum M-16-03, issued in October 2015 right around the time that one of the breaches occurred. It’s worth noting that under long-standing federal regulations and OMB guidance FDIC was already obligated to immediately (within one hour of discovery) report the PII breach to the Department of Homeland Security. It should also be pointed out that, despite the relative newness of the OMB guidance, the requirement to report major incidents to Congress within seven days is in the text of the FISMA law (codified at 44 USC §3554) so there is no reason to think that security officials didn’t know that the requirement existed. In testimony to Congress on May 12, FDIC CIO Lawrence Gross noted that FDIC does report all incidents (presumably including the ones that were not reported to Congress) to US-CERT, but Congressional committee members were upset that they were not notified. What should be even more troubling than the poor incident response (including reporting) is the apparently complete inability of the agency to prevent large-scale data exfiltration. The public description of several of the breach events by FDIC officials illustrates very well the difference between detection and prevention. Even though multiple FDIC statements and memos refer to DLP technology (typically taken to mean “data loss prevention” in the industry, although it could be construed to mean protection), the FDIC cites its DLP only as the mechanism that alerted it to the actions of its employees (copying PII to thumb drives or other removable media).
Unfortunately, alerting seems to be all the DLP system did, as the employees were not prevented from copying data to removable media and, in the case of the February breach that received so much attention, FDIC was alerted to the act of copying sensitive data three days after it occurred; in an October incident, the lag was eight days.

The excuse posed by FDIC for not reporting the PII breaches to Congress hinges on the definition of “major,” which, to be fair, was not included in FISMA and was not formally published by OMB until M-16-03. In that Memorandum, OMB laid out a framework rather than an explicit definition, maintaining a level of subjectivity that should have been expected to result in differences of opinion about whether a specific incident is or is not a major incident. The framework includes four factors: information classification; recoverability; mission impact; and exfiltration, modification, deletion, unauthorized access, or lack of availability of 10,000 or more records or affected individuals. OMB’s framework strongly implies that the first three factors must all be present in combination, but that the fourth factor can qualify an incident as major on its own. Each of the breaches experienced by the FDIC involved personally identifiable information (considered a type of controlled unclassified information) of more than 10,000 individuals that was recoverable in a time period greater than eight hours. The only factor not in evidence was impact to a critical service at FDIC. It’s somewhat difficult to understand how anyone at FDIC could arrive at the conclusion that the incidents were not major – the FDIC’s Office of Inspector General reached the opposite (and correct) conclusion. Stranger still is the justification Gross gave for not categorizing these incidents as major: determinations that the data exfiltration events were inadvertent, non-adversarial, and did not lead to subsequent dissemination to unauthorized parties. These are certainly mitigating factors in determining the risk of harm to individuals affected by the breaches (who were not notified), but they are irrelevant for the purposes of federal incident reporting requirements.
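The reading of the framework given above can be written as a short decision rule. This is only a sketch of this post's interpretation, not OMB's own wording: the `Incident` fields and the `is_major` function are hypothetical names, and the rule treats the first three factors as a required combination while letting the record-count factor stand alone.

```python
from dataclasses import dataclass

RECORD_THRESHOLD = 10_000  # record count named in the framework as described above


@dataclass
class Incident:
    """Hypothetical incident summary; field names are illustrative only."""
    involves_sensitive_info: bool   # information classification factor
    not_readily_recoverable: bool   # recoverability factor
    critical_service_impact: bool   # mission impact factor
    records_affected: int           # records or individuals affected


def is_major(incident: Incident) -> bool:
    """Apply the post's reading: factors one to three together, or factor four alone."""
    combined = (
        incident.involves_sensitive_info
        and incident.not_readily_recoverable
        and incident.critical_service_impact
    )
    return combined or incident.records_affected >= RECORD_THRESHOLD


# The February 2016 incident as described in this post: PII on 44,000 people,
# slow to recover, no impact to a critical service.
february = Incident(
    involves_sensitive_info=True,
    not_readily_recoverable=True,
    critical_service_impact=False,
    records_affected=44_000,
)

print(is_major(february))  # True
```

Note that the rule takes no input for intent: whether the exfiltration was inadvertent or non-adversarial does not enter the determination under this reading.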

Epic Mossack Fonseca breach tied to basic patch management failures

Coverage of the widely reported disclosure of thousands of documents from law firm Mossack Fonseca has emphasized details of the legal and financial structures that can be, and apparently have been, used by many high-profile individuals to conceal assets or avoid taxes, along with the public identification of some of the firm’s more famous or noteworthy clients. Initial media reports characterized the disclosure as a “leak,” at least in part because the firm’s activities came to light when an anonymous individual approached a reporter at German newspaper Süddeutsche Zeitung and offered to provide a large volume of documentation (so large that the work of examining it required the involvement of hundreds of journalists coordinated through the International Consortium of Investigative Journalists). Although the identity of the source who provided this trove of documents has still not been made public, subsequent technical analysis of the situation and of Mossack Fonseca’s website and client portal strongly suggests the data was exfiltrated by an external hack, not by an insider acting as a whistleblower. More astonishing, at least from an information security perspective, is that the hack apparently exploited well-known vulnerabilities in open-source software tools that Mossack Fonseca used. If the results published by security investigators are accurate, then the Mossack Fonseca breach might have been avoided had the firm simply performed routine patching and updates on its website and portal technology.

Mossack Fonseca uses the popular WordPress platform for its website and the open-source Drupal content management system (CMS) for its client portal. Unpatched security vulnerabilities in both toolsets appear to have contributed to the hack: an out-of-date WordPress plugin may have enabled the compromise of the Mossack Fonseca website and its email system, while the unpatched Drupal server behind the MF client portal appears to have been exposed to a flaw so critical that Drupal users were warned back in October 2014 to assume that any unpatched servers had been compromised. According to multiple security researchers, including some cited in reporting on the Drupal problem by Forbes, server information stored on (and publicly accessible from) the MF portal indicates that the site continues to run a version of Drupal rife with exploitable vulnerabilities.
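The kind of routine check that would flag such a server is simple to sketch. The example below is a minimal illustration under stated assumptions, not a description of any real scanner: the function names and the inventory are hypothetical, the installed version is made up, and 7.32 is used as the threshold because it is the Drupal 7 release that shipped the October 2014 fix.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '7.32' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def is_outdated(installed: str, minimum_patched: str) -> bool:
    """Return True when the installed version predates the patched release."""
    return parse_version(installed) < parse_version(minimum_patched)


# Minimum versions that contain the relevant fix (7.32 for the October 2014 Drupal flaw).
MINIMUM_PATCHED = {"drupal": "7.32"}

# Hypothetical inventory; the installed version here is invented for illustration.
inventory = {"drupal": "7.30"}

for product, installed in inventory.items():
    if is_outdated(installed, MINIMUM_PATCHED[product]):
        print(f"{product} {installed} is older than {MINIMUM_PATCHED[product]}: patch required")
```

Versions are compared as tuples of integers rather than as strings, because a plain string comparison would rank "7.9" above "7.32".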

The irony in all this is that law firms globally take client privacy very seriously, holding fast to attorney-client privilege protections in many countries and generally working to keep private transactions and business dealings, well, private. In the face of all the unfavorable press, Mossack Fonseca was quick to claim that many critics fail to understand the nature of its activities on behalf of clients, and even the journalists who worked long and hard to identify MF clients and their offshore holdings have not established that anything Mossack Fonseca did for those clients is actually illegal. More investigations on that front seem to be ongoing, as numerous media outlets reported just today that Mossack Fonseca’s Panama offices had been raided by local authorities, presumably seeking evidence of illegal activities. Cynical observers (and security analysts) might counter that Mossack Fonseca failed to understand even basic information security and privacy principles and lacked the IT management skills or oversight necessary to ensure that they were adequately protecting their own and their clients’ information.