2019-2021 Data Mobility Workshop & Exhibition
How long does a one-terabyte (1 TB) research dataset download take on your campus? A lunch break? An entire workday? What if, back when we still went to offices, you clicked download, walked down to the water cooler, chatted with a co-worker, and returned to find the download had finished in around 20 minutes? This should be your reality with a 10G Data Transfer Node (DTN), a Science DMZ, and parallel transfer applications. Unfortunately, this is not the case on many campuses, where researchers download this dataset to their desktops, which can take most of the workday. Some try the download over Wi-Fi, which can take 24-48 hours. A campus goal of 1 hour for a 1 TB dataset download is a great place to start, but how do you verify that you can meet it?
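The times quoted above follow from simple arithmetic, sketched below in Python. This is only an idealized estimate under stated assumptions: 1 TB is taken as 10^12 bytes, the `efficiency` factor is a hypothetical stand-in for protocol, disk, and host overhead, and the function names are illustrative rather than part of any DME tooling.

```python
# Idealized transfer-time arithmetic for a 1 TB dataset.
# Assumption: 1 TB = 10^12 bytes (decimal terabyte).
TERABYTE_BITS = 1e12 * 8


def transfer_minutes(rate_gbps: float, efficiency: float = 1.0) -> float:
    """Minutes to move 1 TB at a link rate in Gbps.

    `efficiency` (0 < efficiency <= 1) is an assumed fraction of the
    link rate that the transfer actually sustains.
    """
    if rate_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("rate must be positive and efficiency in (0, 1]")
    seconds = TERABYTE_BITS / (rate_gbps * 1e9 * efficiency)
    return seconds / 60


def required_gbps(target_minutes: float) -> float:
    """Sustained rate in Gbps needed to move 1 TB within target_minutes."""
    if target_minutes <= 0:
        raise ValueError("target_minutes must be positive")
    return TERABYTE_BITS / (target_minutes * 60) / 1e9


if __name__ == "__main__":
    print(f"10G at line rate:        {transfer_minutes(10):.1f} min")
    print(f"10G at 70% (assumed):    {transfer_minutes(10, 0.7):.1f} min")
    print(f"Rate for a 1 hour goal:  {required_gbps(60):.2f} Gbps")
    print(f"Rate implied by 24 h:    {required_gbps(24 * 60) * 1000:.0f} Mbps")
```

Under these assumptions, a 10G path needs roughly 13 minutes at full line rate, so a 20 minute download corresponds to sustaining about two thirds of the link, and the 1 hour goal works out to a little over 2 Gbps sustained.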
The Data Mobility Exhibition (DME) was established to measure and understand data transfer baselines for performance expectations, increase knowledge of data transfer hardware and software, and share advanced design patterns of portal software as a benefit for scientific communities. Using reference data sets and existing campus CI components, DME participants download, measure, and potentially improve their scientific data movement capabilities. This is a multi-year experiment in which we hope to understand and improve the research community's collective ability to address a critical problem in scientific technology use: transferring data sets over networks at consistently high speeds. Over the past year, more than 600 transfer tests have been completed. Is your campus next? This is an excellent opportunity to test your DTNs, Science DMZ, and CC* infrastructure against known performant endpoints.
Efforts to fund basic scientific technology infrastructure (e.g. Science DMZ network components, data transfer hardware and software, etc.) have fallen short as the community enters a critical juncture: deep integration with user communities to bridge the gap between the technology implementation and scientific use cases. Hardware alone cannot overcome the usability gap, and research groups are struggling to adopt and use available technology without assistance.
EPOC partners ESnet and Indiana University, along with the Globus project at the University of Chicago, are convening a workshop & data exhibition to address this gap in implementation and understanding. This one-day event, split into two halves for different audiences, aims to:
- Facilitate an exhibition to establish data transfer baselines for performance expectations throughout the R&E community;
- Explain the base set of knowledge needed to implement and integrate data transfer hardware and software for scientific users;
- Offer a more advanced view of mechanisms to create portal software that can be implemented by technology operators as a benefit for scientific communities.
Designing, implementing, and supporting advanced Cyberinfrastructure (CI) technology must not be done in a vacuum. A critical component is interaction with end users to better understand the base needs and expectations of the science that underlies the CI technology use. Through this it is possible to better construct and operate advanced services including those that implement data movement.
This three-part community workshop event will culminate at the NSF CC* and CICI PI meeting in September of 2019 and includes:
- A data movement exhibition. Community members, and not just those participating in the workshop components, are encouraged to participate in an exciting and beneficial data movement exhibition. Using reference data sets, and existing campus CI components, data exhibition participants will download, measure, and potentially improve their scientific data movement capabilities. EPOC is available to assist in this activity, and will provide instructions, record results, and offer “Roadside Assistance” for any group that desires to learn and improve their data movement expectations and performance.
- An introductory workshop. The basics of CI support are still in the nascent stages of implementation and operation for a large population of funded CC* awardees. This workshop will offer the basic knowledge to get a Science DMZ, data transfer components, and network monitoring up and running. Attendees will work with community leaders to better understand, build, and operate CI that supports scientific use cases.
- An advanced workshop. Concepts will build upon those introduced in the introductory workshop material to fully integrate the concepts of the Science DMZ and data transfer into functional scientific infrastructure to benefit campus users. Attendees will learn, and create, the components of a modern research data portal (MRDP) – a set of software components that operate on top of capable CI hardware.
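For the exhibition's "download, measure" step, a minimal Python sketch of turning one timed transfer into a throughput figure might look like the following. Everything here is an assumption for illustration: `run_transfer` is a hypothetical callable wrapping whatever transfer tool a campus uses, and the goal constant simply encodes a 1 TB (10^12 bytes) in 1 hour target; it is not the official DME procedure.

```python
import time
from typing import Callable

# Assumed target: 1 TB (10^12 bytes) in one hour, roughly 2.22 Gbps.
GOAL_GBPS = 1e12 * 8 / 3600 / 1e9


def throughput_gbps(bytes_moved: int, elapsed_seconds: float) -> float:
    """Achieved throughput in Gbps for a completed transfer."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return bytes_moved * 8 / elapsed_seconds / 1e9


def timed_transfer(run_transfer: Callable[[], int]) -> float:
    """Time a transfer and return the achieved Gbps.

    `run_transfer` is a hypothetical callable that performs the
    transfer and returns the number of bytes moved.
    """
    start = time.perf_counter()
    bytes_moved = run_transfer()
    elapsed = time.perf_counter() - start
    return throughput_gbps(bytes_moved, elapsed)


def meets_goal(achieved_gbps: float, goal_gbps: float = GOAL_GBPS) -> bool:
    """True when the achieved rate is at least the goal rate."""
    return achieved_gbps >= goal_gbps
```

For example, a 1 TB transfer that finishes in 20 minutes gives `throughput_gbps(10**12, 1200)`, about 6.7 Gbps, which `meets_goal` reports as meeting the assumed 1 hour target.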
Who Should Participate?
- CC* Workshop
  - Intro Section
    - Facility Leadership
    - Technology Professionals
  - Intermediate Section
    - Technology Professionals that will implement some aspect of an MRDP
- 2019-2021 Data Exhibition
  - Facility Leadership
  - Technology Professionals
  - Scientific Users
At the end of this event we expect the following outcomes for the various stakeholders:
- Facility Leadership: Better understanding of the strategic value of advanced CI for a campus environment, and of the performance that the scientific users of the resource can expect.
- Technology Professionals: Core set of knowledge that will enable the design, construction, and long-term maintenance of CI for the campus environment. New partnerships with campus scientific users.
- Scientific Users: Strategic resources that can be used to accelerate scientific workflows. New partnerships with campus technology groups.
- Funding Agencies: Snapshot of the performance expectations and realities for a subset of the national CI environment (technology use/availability and user impacts).
- Network Operators: Firm understanding of the type of technology use cases and user groups that will take advantage of campus and regional CI investments.
- An introductory webinar was held on Friday August 30th 2019 at 2pm ET to discuss these events in more detail. It is archived and available for viewing here: https://www.youtube.com/watch?v=e_siF0zBgao
- A project update was held on Friday April 3rd 2020. It is archived and available for viewing here: https://www.youtube.com/watch?v=VEBP76YyDAY
- A project update was held on Friday August 14th 2020. It is archived and available for viewing here: https://youtu.be/CmHGu9cG0ww