Monday, November 2, 2009

New Hopes for Greater Research Collaboration through Progress in the Worldwide LHC Computing Grid

“Is the Internet looking towards the future through Grids with silver clouds of transformation?”


In the much-awaited moments ahead of November 2009, the Large Hadron Collider (LHC) at CERN (the European Organization for Nuclear Research) successfully channelled a proton beam through its accelerator after a one-year gap in operations. In parallel came excited discussion of the enormous progress made on the Worldwide LHC Computing Grid (WLCG) and its potential to transform the Internet into a faster, more accessible medium, enabling voluminous data transfer and data sharing for collaborative exchange of information, and leading towards greater capability for scientific cooperation across borders.


Grid computing revolutionizes the way scientists share and analyse data by enabling researchers to pool computer power and data storage over the Internet. Grid projects already help researchers search for new wheat genes, predict storms, or simulate the Sun’s interior. The 7000-odd physicists working on experiments at the Large Hadron Collider will rely entirely on grid computing, specifically on the Worldwide LHC Computing Grid, to connect them with LHC data. The computing centres providing resources for WLCG are embedded in different operational Grid organisations, in particular EGEE (Enabling Grids for E-sciencE) and OSG (the Open Science Grid), but also several national and regional Grid structures such as GridPP in the UK, INFN Grid in Italy and NorduGrid in the Nordic region.


The Enabling Grids for E-sciencE (EGEE) project is funded by the European Commission and aims to integrate current national, regional and thematic Grid efforts, in order to create a seamless Grid infrastructure available to scientists 24 hours a day in support of scientific research. LCG and EGEE are tightly coupled and provide complementary functions. OSG (Open Science Grid) is a U.S. distributed computing infrastructure for large-scale scientific research, built and operated by a consortium of universities, national laboratories, scientific collaborations and software developers. The OSG integrates computing and storage resources from more than 50 sites in the United States, Asia and South America. The OSG is supported by the U.S. National Science Foundation and the Department of Energy's Office of Science.


The Globus Alliance involves several universities and research laboratories conducting research and development to create fundamental Grid technologies and produce open-source software. The WLCG project is actively involved in the support of Globus and uses the Globus-based Virtual Data Toolkit (VDT) as part of the project middleware.


During the development of the LHC Computing Grid, many additional benefits of a distributed system became apparent (courtesy CERN):

- Multiple copies of data can be kept at different sites, ensuring access for all scientists involved, independent of geographical location.
- Spare capacity at multiple computer centres can be put to optimum use, making the system more efficient.
- Having computer centres in multiple time zones eases round-the-clock monitoring and the availability of expert support.
- There are no single points of failure.
- The cost of maintenance and upgrades is distributed: individual institutes fund local computing resources and retain responsibility for them, while still contributing to the global goal.
- Independently managed resources have encouraged novel approaches to computing and analysis.
- So-called “brain drain”, where researchers are forced to leave their country to access resources, is reduced when resources are available from their desktop.
- The system can easily be reconfigured to face new challenges, allowing it to evolve dynamically throughout the life of the LHC and grow in capacity to meet rising demand as more data is collected each year.
- It provides considerable flexibility in deciding how and where to provide future computing resources.
- It allows the community to take advantage of new technologies that may appear, offering improved usability, cost-effectiveness and energy efficiency.


The widely reported news that “the Grid will revolutionize the Internet” has been clarified by CERN itself. They say: “Grid computing, like the World Wide Web, is an application of the Internet. When the LHC turns on, data will be transferred from CERN to 11 large computing centres around the world at rates of up to 10 gigabits per second. Those large centres will then send and receive data from 200 smaller centres worldwide. All this data transfer will take place over the Internet. Dedicated fibre-optic links are used between CERN and the large centres; the smaller centres connect together through research networks and sometimes the standard public Internet.” (1)
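
To put the quoted transfer rate in perspective, here is a minimal back-of-envelope sketch in Python (our own illustration; the 40 TB dataset size is a hypothetical example, not a CERN figure) of what a dedicated 10-gigabit link implies for bulk replication between CERN and one of the large centres:

```python
# Illustrative only: time to copy a dataset over one dedicated 10 Gb/s
# link, as described in the CERN quote above. The 40 TB dataset size is
# a hypothetical example.
GBPS = 1e9  # bits per second in one gigabit per second

def transfer_hours(dataset_gigabytes: float, link_gbps: float = 10.0) -> float:
    """Hours to copy a dataset over a single dedicated link at full rate."""
    bits = dataset_gigabytes * 8e9  # gigabytes -> bits
    return bits / (link_gbps * GBPS) / 3600

print(f"{transfer_hours(40_000):.1f} hours")  # ~8.9 hours for 40 TB at 10 Gb/s
```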


Commenting further on speculation about an unprecedented increase in the Internet’s capacity for sharing and downloading, CERN explains: “First, in order to get such data-transfer rates, individuals would have to do what the large particle physics computing centres have done, and set up (or lease) a dedicated fibre-optic link between their home and the source of their data. Second, today’s grid computing technologies and projects are geared toward research and businesses with highly specific needs, such as vast amounts of data to process and analyse within large, worldwide collaborations. While other computer users may benefit from grid computing through better weather prediction or more effective medications, they may not be logging onto a computing grid anytime soon. (Something called “cloud computing”, where your programs are run in a central location rather than on your own computer, may also be on the horizon.)” (ibid)


But scientists and engineers are looking to the Grid not only to enable sharing of documents and MP3 files, but also to connect PCs with sensors, telescopes and tidal-wave simulators. Though the task of standardizing everything from system templates to the definitions of various resources is a mammoth one, the Global Grid Forum (GGF) can look to the early days of the Web for guidance. The Grid that organizers are building is a new kind of Internet, only this time its creators have a better knowledge of where the bottlenecks will be. Computers on the grid can also transmit data at lightning speed, allowing researchers facing heavy processing tasks to call on the assistance of thousands of other computers around the world. The aim is to eliminate the problem experienced by Internet users who ask their machine to handle too much information. The real goal of the grid is, however, to work with the LHC in tracking down nature’s most elusive particle, the Higgs boson. Predicted in theory but never yet found, the Higgs is supposed to be what gives matter mass. The latest spin-off from CERN (the particle physics centre that created the web), the grid could also provide the kind of power needed to transmit holographic images, allow instant online gaming with hundreds of thousands of players, and offer high-definition video telephony for the price of a local call.


Research Collaboration
The WLCG project is also following developments in industry, in particular through CERN openlab, where leading IT companies are testing and validating cutting-edge Grid technologies using the LCG environment. CERN openlab is a collaboration between CERN and industrial partners to study and develop data-intensive solutions for the worldwide community of scientists working at the next-generation Large Hadron Collider. These experiments will generate enormous amounts of data – 15 million gigabytes a year – and will require a globally distributed Grid of over 150 computing centres to store and analyse it, with a combined capacity of more than 100,000 of today’s processor cores.
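
A quick sanity check of these figures (our own arithmetic, not CERN's) shows what 15 million gigabytes a year means as a sustained data rate, and what a naive even split across the centres would look like:

```python
# Back-of-envelope check of the figures quoted above (illustrative only).
DATA_PER_YEAR_GB = 15e6             # "15 million gigabytes a year"
SECONDS_PER_YEAR = 365 * 24 * 3600

avg_rate_gbps = DATA_PER_YEAR_GB * 8 / SECONDS_PER_YEAR
print(f"average sustained rate: {avg_rate_gbps:.1f} Gb/s")  # ~3.8 Gb/s

# A naive even split across the ~150 centres (real shares differ by tier):
print(f"per-centre share: {DATA_PER_YEAR_GB / 150 / 1000:.0f} TB/year")  # ~100 TB
```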




There is no question that scientific research over the past twenty years has undergone a transformation. This transformation has occurred as new technologies have led to new methods of working, which have accelerated the pace of discovery and knowledge accumulation not only in the natural sciences but also in the social sciences and the arts and humanities. Research today is often critically dependent on computation and data handling. The practice has become known under various terms such as e-Science, e-Research, and cyberscience. Irrespective of the name, many researchers acknowledge that the use of computational methods and data handling is central to their work.



Advances in scientific and other knowledge have generated vast amounts of data which need to be managed for analysis, storage and preservation for future re-use. Larger-scale science, enabled by the Internet and other information and communication technologies (ICTs), scientific instrumentation and the automation of research processes, has resulted in the emergence of new research paradigms often summarised as 'data-rich science'. A feature of this new kind of research is an unprecedented increase in complexity, in terms of the sophistication of the research methods used, the scale of the phenomena considered and the granularity of investigation. (2)





e-Research
e-Research involves the use of computer-enabled methods to achieve new, better, faster or more efficient research and innovation in any discipline. It draws on developments in computing science, computation, automation and digital communications. Such computer-enabled methods are invaluable within this context of rapid change, accumulation of knowledge and increased collaboration. They can be used by the researcher throughout the research cycle, from research design, data collection and analysis to the dissemination of results. This is unlike other technological "equipment", which often only proves useful at certain stages of research. Researchers from all disciplines can benefit from the use of e-Research approaches, from the physical sciences to the arts and humanities and the social sciences.




e-Research Technologies Supporting Collaboration
e-Research technologies support the research collaborations described above by introducing a model for resource sharing based on the notions of “resources” that are accessed through “services”. Resources can be computational resources such as high-performance computers, storage resources such as storage resource brokers or repositories, datasets held by data archives, or even remote instruments such as radio telescopes. In order to make resources available to collaborating researchers, their owners offer services that present a well-described interface specifying the operations that can be performed on or with a resource, e.g., submitting a compute job or accessing a set of data.
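
As a rough illustration of this resource/service pattern, the following Python sketch (hypothetical names and interfaces, not actual grid middleware such as Globus or gLite) shows how a compute resource and a storage resource might each be wrapped behind a small, well-described service interface:

```python
# A minimal sketch of the "resources accessed through services" model
# described above. All names here are hypothetical; real middleware
# (e.g. Globus, gLite) exposes far richer, standardised interfaces.
from abc import ABC, abstractmethod

class GridService(ABC):
    """A well-described interface through which a resource is accessed."""

    @abstractmethod
    def describe(self) -> dict:
        """Advertise the resource and the operations this service offers."""

class ComputeService(GridService):
    def describe(self) -> dict:
        return {"resource": "compute cluster", "operations": ["submit_job"]}

    def submit_job(self, executable: str, args: list) -> str:
        # Real middleware would queue the job and return a grid-wide job ID.
        return f"job-0001 ({executable} {' '.join(args)})"

class StorageService(GridService):
    def describe(self) -> dict:
        return {"resource": "storage broker", "operations": ["get", "put"]}

    def get(self, logical_name: str) -> bytes:
        # A real broker would resolve the logical name to physical replicas.
        return b""

# A collaborating researcher discovers what each owner has chosen to expose:
for service in (ComputeService(), StorageService()):
    print(service.describe())
```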




Computer-enabled methods of collaboration for research take many forms, including use of video conferencing, wikis, social networking websites and distributed computing itself. For example, researchers might use Access Grid for video conferencing to hold virtual meetings to discuss their projects. Access Grid and virtual research environments provide simultaneous viewing of participating groups as well as software to allow participants to interact with data on-screen. Wikis have also become a valuable collaborative tool. This is perhaps best demonstrated by the OpenWetWare website, which promotes the sharing of information between researchers working in biology, biomedical research and bioengineering using the concept of a virtual Lab Notebook. This allows researchers to publish research protocols and document experiments. It also provides information about laboratories and research groups around the world as well as courses and events of interest to the community.



Social networking sites have been used or created for research purposes. The myExperiment social website is becoming an indispensable collaboration tool for sharing scientific workflows and building communities. Such sharing cuts down on the repetition of research work, saving time and effort and leading to advances and innovation more rapidly than if researchers were on their own, without access to similar work for comparison with their own. Other social networking sites such as Facebook have been adopted by researchers, and extensions have been built to allow them to be used as portals to access research information. For example, content in the ICEAGE Digital Library can be accessed within Facebook. (ibid)




The Role of Databases for LHC Data Processing and Analysis
Database services are required by the experiments’ online systems and for most if not all aspects of offline processing, for simulation activities as well as analysis. Specific examples include the PVSS Supervisory Control and Data Acquisition (SCADA) system, detector conditions (e.g. COOL), alignment and geometry applications, Grid Data Management (LCG File Catalog, File Transfer Service) and Storage Management (e.g. CASTOR + SRM) services, as well as Grid infrastructure and operations tools (GridView, SAM, Dashboards, VOMS). (3)
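
To give a flavour of what a conditions database does, here is a highly simplified Python sketch of the interval-of-validity (IOV) idea behind systems such as COOL (the class and field names are our own invention, not the COOL API): each payload is valid for a time interval and is retrieved by the timestamp of the event being processed.

```python
# Highly simplified illustration of interval-of-validity (IOV) storage,
# the idea behind conditions databases such as COOL. Hypothetical names,
# not the real COOL API.
import bisect
from typing import Optional

class ConditionsFolder:
    def __init__(self) -> None:
        self._starts = []   # sorted IOV start times
        self._entries = []  # matching (until, payload) pairs

    def store(self, since: int, until: int, payload: dict) -> None:
        """Record a payload valid for timestamps in [since, until)."""
        i = bisect.bisect_left(self._starts, since)
        self._starts.insert(i, since)
        self._entries.insert(i, (until, payload))

    def lookup(self, timestamp: int) -> Optional[dict]:
        """Return the payload whose validity interval covers the timestamp."""
        i = bisect.bisect_right(self._starts, timestamp) - 1
        if i >= 0:
            until, payload = self._entries[i]
            if timestamp < until:
                return payload
        return None

folder = ConditionsFolder()
folder.store(0, 1000, {"hv_setting": 1530.0})     # detector high voltage
folder.store(1000, 2000, {"hv_setting": 1525.5})  # changed mid-run
print(folder.lookup(1500))  # -> {'hv_setting': 1525.5}
```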



In late spring 2007, four high-energy physics (HEP) laboratories, the European Organization for Nuclear Research (CERN), the Deutsches Elektronen-Synchrotron (DESY), the Fermi National Accelerator Laboratory (FNAL) and the Stanford Linear Accelerator Center (SLAC), ran a user poll to analyze the current state of HEP information systems. The goal was to achieve a better understanding of the perceptions, behaviours and wishes of the end users of these information systems. The poll received more than 2100 answers, representing about 10% of the active HEP community worldwide. It showed that community-based services dominate this field of research, with the metadata-only search engine SPIRES-HEP being the primary information gateway for most scholars. Users also gave their preferences regarding existing functionalities, such as access to full text and to citation information, and a list of features that they would like to have in the coming years. The results showed that scholars attach paramount importance to three axes of excellence: access to full text, depth of coverage and quality of content. (4)



The really challenging question facing CERN and the LHC is: “Can the concept of cloud computing replace that of grids?” (5) The Worldwide LHC Computing Grid (WLCG) has been established, building on two main production infrastructures: that of the Open Science Grid (OSG) in the Americas, and the Enabling Grids for E-sciencE (EGEE) Grid in Europe and elsewhere. The machine itself, the Large Hadron Collider (LHC), is situated some 100 m underground beneath the French-Swiss border near Geneva, Switzerland, and supports four major collaborations and their associated detectors: ATLAS, CMS, ALICE and LHCb.




Running a service where the user expectation is for 24x7 support, with extremely rapid problem-determination and resolution targets, is already a challenge. When this is extended to a large number of rather loosely coupled sites, the majority of which support multiple disciplines – often with conflicting requirements but always with local constraints – it becomes a major or even “grand” challenge. That this model works at the scale required by the LHC experiments – literally around the world and around the clock – is a valuable vindication of the Grid computing paradigm.




Currently, adapting an existing application to the Grid environment is a non-trivial exercise that requires an in-depth understanding not only of the Grid computing paradigm but also of the computing model of the application in question. The successful demonstration of a straightforward recipe for moving a wide range of applications – from simple to the most demanding – to Cloud environments would be a significant boost for this technology and could open the door to truly ubiquitous computing. (ibid)

-------------------------------------------------------------------------------------------------------------------------------

References:

1) http://public.web.cern.ch/Public/en/Spotlight/SpotlightGridFactsAndFiction-en.html

2) Voss, A., & Vander Meer, E. (2009, September 7). Research in a Connected World. Retrieved from the Connexions Web site: http://cnx.org/content/m20834/1.3/

3) Maria Girone, Distributed Database Services – a Fundamental Component of the WLCG Service for the LHC Experiments – Experience and Outlook, CERN Document Server, European Organization for Nuclear Research (E-mail: Maria.Girone@cern.ch)

4) R Ivanov and L Raae, INSPIRE: a new scientific information system for HEP, CERN Document Server, European Organization for Nuclear Research (E-mail: Radoslav.Ivanov@cern.ch, Lars.Christian.Raae@cern.ch)

5) J.D. Shiers, Can Clouds Replace Grids? A Real-Life Exabyte-Scale Test-Case, CERN Document Server, European Organization for Nuclear Research (E-mail: Jamie.Shiers@cern.ch)

------------------------------------------------------------------------------------------------------------------------------------