Sharing and storing data
As part of our Data Management series, Acopia Networks explains File Area Networking (FAN) to the BAPCO Journal, and why it is described as the next generation of information storage and distribution…
With recent studies suggesting that file data growth and file management have become top IT priorities, organisations are starting to look closely at their approach to storing and accessing data, especially in light of incessant storage demand that shows no sign of abating. What they are finding is a disturbing correlation: increasing storage demand is driving organisations to increasingly complex and costly storage infrastructure. Today’s seemingly proven solutions, more and bigger SANs, only serve to increase the cost and complexity without effectively satisfying the demand.
Not all storage demand, however, requires the same storage infrastructure. A distinct shift is underway in the way we use, store, interact with, and process information. These changes have important implications for the way enterprises will manage information assets in the future, and should be investigated and understood by IT strategists, planners, and implementers alike.
Files vs. Blocks – opportunity or challenge?
A decade ago, IT focused on improving the performance and reducing the cost of functionally isolated transaction-processing applications built on structured databases. These large databases were stored and protected in centralised data centres. Interaction with the data was typically through static application interfaces such as ATMs and other fixed-menu GUIs. Data growth was rapid.
As more data poured into these databases, they became unwieldy. Backups became challenging and management costs rose exponentially. Capacity purchased for one application or database might go unused while another database outgrew its bounds. Billions of dollars were invested in storage area networks in an attempt to virtualise connectivity between application servers and storage, and billions more in software and hardware to make copies of the data for protection and to reduce management costs. These investments were not wasted; they helped to keep the IT ship afloat in turbulent waters.
Unfortunately, while we were addressing the problem of managing structured, raw data, the world of information was changing.
Over the last five years, the complexity of web-based applications has increased dramatically. With the advent of the next-generation applications implied by Web 2.0 and SOA, we now demand information to be dynamically configured and presented to us in real time to meet our specific interests. Multimedia information flow is now expected rather than exceptional. Even ATM menus have become dynamic and personalised. Applications are no longer isolated-function systems. Interaction with information is now a broad set of services available to us on demand.
These changes have led to a dramatic increase in file-based data: data storage capacity is growing at better than 50% per year according to IDC, with 85% of that growth in unstructured or semi-structured files. The shift in storage demand from block-based data to file-based data, a shift leading industry analysts see as only accelerating and becoming more pronounced going forward, opens up new possibilities that extend far beyond storage as usual.
These new possibilities give organisations for the first time a realistic hope that they can accommodate growth in storage demand without a corresponding increase in storage complexity and cost.
The new possibilities hinge on the fact that file-based data, accessed at the file level through file systems, can be managed by intelligent systems. Files have inherent advantages in managing information because in effect they are canisters of data with specific characteristics. It has been said that files are the smallest element of data to which business context can be applied.
File characteristics like age, ownership, size, and name can be externally observed, and we can apply policies to those characteristics: for example, move all files older than 365 days to a low-cost SATA array. Files enable the bridging of the worlds of applications and information by making information resources transparent to the application using services.
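To make the age-based policy concrete, here is a minimal sketch in Python. It is an illustration only, not any vendor's policy engine: the `FileInfo` record, the tier names `primary` and `sata-archive`, and the functions `choose_tier` and `plan_moves` are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Externally observable characteristics of a file (hypothetical record)."""
    name: str
    owner: str
    size_bytes: int
    age_days: int


def choose_tier(info: FileInfo, max_age_days: int = 365) -> str:
    """Return the tier a file belongs on under a simple age policy.

    Tier names are assumptions made for this sketch.
    """
    return "sata-archive" if info.age_days > max_age_days else "primary"


def plan_moves(files, max_age_days: int = 365):
    """List the names of files the policy would move to the low-cost tier."""
    return [f.name for f in files if choose_tier(f, max_age_days) == "sata-archive"]


# Example inventory (made-up data for illustration).
inventory = [
    FileInfo("budget-2004.xls", "finance", 120_000, 800),
    FileInfo("rota.doc", "ops", 45_000, 12),
    FileInfo("incident-log.txt", "ops", 9_000, 366),
]
to_archive = plan_moves(inventory)
```

The point of the sketch is that the policy reads only metadata (age here, but ownership, size, or name would work the same way), so it can be evaluated without opening the file or involving the application that created it.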
A dizzying array of alternatives
These new styles of computing and uses of information also imply that data can no longer be centrally located. Limited remote access through static GUIs is outdated – in fact, quite the opposite is occurring. Interfaces are now rich, with dynamic, local, memory-based data manipulation. Data now must reside where it makes the most sense at any given moment and under any given set of conditions and policies. Data must reside on the network, and we as end-users will neither know nor care where it is physically stored or how it is logically accessed.
In response to this rapid shift toward file-based data, a bewildering set of new technologies has emerged – virtualisation of all sorts and descriptions, WAFS, WAN optimisation, distributed file systems, indexing and classification, lifecycle automation, and more. Each of these technologies has value, but together the lack of common interfaces and integration creates a virtual tower of Babel.
Most of these new technologies arise out of the broad industry consensus that the solution to the problem is to centralise management of information without centralising its location – approaches often referred to as centralised control/distributed access. In order to achieve a coherent file management strategy, a new unified architectural and methodological approach is required.
The past as a guide
In the last decade, we addressed storage management challenges such as difficulty sharing resources, increasing complexity and cost, and disruptive management procedures by virtualising the connectivity layer between application servers and storage with Storage Area Networks. However, the SAN itself became complex, and managing data again became problematic. Intelligence in the form of volume managers made it easier to provision storage capacity from one server to another, and helped control complexity.
Utilisation rates increased and costs stabilised, but they never actually declined. The reasons in hindsight are now clearer. Intelligence in the form of automation and tools in the SAN can only be applied to the context of the SAN. SANs are by nature local and block-based, so the context for intelligence in a SAN is limited to blocks. Because blocks do not contain business context, our ability to apply interesting and useful policy automation is limited.
Today, in our new context of file data management, the management challenges are familiar and similar to those facing IT in the early days of SAN.
Users are often tightly bound to homogeneous islands of storage, creating capacity utilisation issues, management complexity, disruptive procedures, and high costs.
However, there are distinct and important differences now. Files have business context, so they offer us a broad and rich set of options for policy automation. Blocks do not.
Rather than the fairly unruly and rudimentary SCSI protocol used for block data, files are accessed over networks today primarily through two widely used, standardised file system protocols, NFS and CIFS. Finally, the transport is Internet Protocol (IP) rather than Fibre Channel (FC). IP is inherently distributed and global, whereas FC is a LAN technology with limited range and scope.
The evolution of File Area Networking (FAN)
Fortunately, these important technological advances offer us the basis for organising services, networks, information, and access – the File Area Network or FAN.
Organisations are adopting the concept of FAN as a non-disruptive complement to the existing storage infrastructure. FANs allow them to massively scale and centrally manage their file-based storage without the corresponding increase in cost and complexity associated with the SAN.
The FAN coexists easily with SANs, handling data types (files) and metadata that are not part of the SAN design. Thus, the FAN becomes an essential enabler for information lifecycle management (ILM), enterprise content management (ECM), and content-addressable storage (CAS). Through the FAN, organisations can implement policy-based services representing a wide variety of functionality and control, e.g. migration, replication, load balancing, tiering, classification, data placement, access control, de-duplication, retention, and more.
To understand FAN’s impact, a basic understanding of key aspects of this new file-based information service paradigm is required:
Metadata – consists of information about the file-based data and its usage. Files make it possible to convey information about the data as well as the data itself. With metadata attached to a file, intelligent systems can identify and manage the file, based on business values, such as age of the data, frequency of use, and ownership of the data.
File Virtualisation – by masking the underlying complexity of connecting a user to the specific location of the file, virtualisation makes it possible to move, access, and manage files without regard to physical storage.
Specific to file virtualisation is the concept of a global unified namespace, which provides the ability to organise, present, and store file-based data.
Global Unified Namespace – is a logical abstraction (virtualisation) of the underlying physical file infrastructure. A Global Unified Namespace provides a single access point into the global file storage infrastructure; it is heterogeneous (supporting disparate physical file systems and platforms) in nature and global in scope.
It is important to point out that the Global Unified Namespace preserves the existing physical file systems it virtualises, but enables them to be accessed as though they were a single shared entity.
Real-time Policy Enforcement – while virtualisation is an essential component of any file network, virtualisation is not an end in itself. It is important that the fabric provide an enforcement point for a range of advanced file-level controls.
File System Routing – by its very nature the file network is distributed, thus the fabric acts as a router of enterprise-wide file information, directing file requests to the appropriate resources, irrespective of location.
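The namespace and routing ideas above can be sketched as a small lookup table. This is a simplified illustration under stated assumptions, not a description of how any FAN product is implemented: the `GlobalNamespace` class, its `mount` and `resolve` methods, and the server names (`filer1`, `nas2`, `filer3`) are all hypothetical.

```python
class GlobalNamespace:
    """Toy global namespace: maps logical path prefixes to physical locations."""

    def __init__(self):
        # logical prefix -> physical root (both stored without trailing slash)
        self._mounts = {}

    def mount(self, logical_prefix: str, physical_root: str) -> None:
        """Attach (or re-attach) a physical file system under a logical prefix."""
        self._mounts[logical_prefix.rstrip("/")] = physical_root.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        """Route a logical path to its physical location (longest prefix wins)."""
        best = None
        for prefix in self._mounts:
            matches = logical_path == prefix or logical_path.startswith(prefix + "/")
            if matches and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no physical location for {logical_path}")
        return self._mounts[best] + logical_path[len(best):]


# Hypothetical deployment: one NFS filer and one CIFS NAS behind one namespace.
ns = GlobalNamespace()
ns.mount("/corp/eng", "nfs://filer1/vol/eng")
ns.mount("/corp/eng/archive", "cifs://nas2/archive")

before = ns.resolve("/corp/eng/design/spec.doc")
archived = ns.resolve("/corp/eng/archive/2005/old.doc")

# Migrating the data to a new filer changes only the mapping;
# the logical path that users and applications see stays the same.
ns.mount("/corp/eng", "nfs://filer3/vol/eng2")
after = ns.resolve("/corp/eng/design/spec.doc")
```

In this sketch the existing physical file systems are left as they are; only the mapping in front of them changes, which is what lets files move without users noticing.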
The future of FAN
In 2006, the focus of File Area Networking has been “cleaning up the mess” – simplifying management tasks and burgeoning infrastructure.
The FAN introduces simple-to-use solutions for basic file management tasks such as data movement and placement.
In 2007, the focus shifts to enhancing the services delivered by in-band, real-time policy enforcement engines – adding policies for specific markets and technologies (legal, DHS, medical, architectural, etc.). Scale, scope, and flexibility expand the FAN infrastructure support – adding more supported platforms, extending the interconnects and protocols, and allowing for a more distributed policy enforcement domain.
By 2008, FANs add the ability to enhance specific elements of the infrastructure by application – tuning elements by application. We also begin to see the network as the tool for data protection services. The infrastructure moves from distributed data centres to much more broadly deployed compute and data elements, and FANs optimise the connectivity and relationships.
Ultimately, the FAN’s intelligent services will bridge the worlds of applications and file resources, bringing about long-awaited possibilities such as:
Massive global scalability;
Transparent and secure global access to data;
Efficient, centralised global data and storage management; and
Seamless, fully meshed FAN/SAN infrastructure.
