Meeting Summary
Breakout Groups

Group A: Analytical Tools

Group B: Data Standards and Architecture

Session 4: Report Out and Discussion

Group A: Analytical tools necessary for proteomic analysis include tools for data collection and federation, data processing, data and information validation, data visualization, and data mining. Tools for data collection and federation include tools to exchange information and data between repositories. These require a minimum set of common information in interoperable repositories, complete with technical and biological annotation. Data processing requires the development and assessment of quality metrics. Data and information validation includes the ability to query experimental design, annotation, and experimental data for reference, as well as tools to allow comparison of results. Data visualization is crucial at each step of the process. Data mining tools include those to normalize between different techniques.

Group B: This group began by identifying the primary users of the repository and their needs. The ideal repository will serve multiple functions, providing raw data for statisticians, LIMS data for consortia members, analytical and searching tools for community users, links to relational databases for biologists, and data to relate plasma findings to tumor findings for cancer researchers. Data challenges include size, format, meaningful modes of presentation, and the relationship between data and the changes and updates in relational datasets. The repository will contain experimental data (e.g., MS, arrays), meta-data (e.g., search parameters, data to search against), and data on sample preparation and animal handling. Interface needs include the ability to download datasets, a browsable interface, tools to support query and analysis, and links to external databases.
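The repository contents described by Group B, together with Group A's call for a minimum set of common information, could be sketched as a simple record type with a completeness check. This is only an illustration under stated assumptions: the class names, field names, and the choice of which fields count as required are hypothetical and were not specified at the workshop.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SamplePreparation:
    """Sample preparation and animal handling details (hypothetical fields)."""
    protocol: str = ""
    handling_notes: str = ""


@dataclass
class RepositoryRecord:
    """One submission: experimental data, meta-data, and sample information."""
    record_id: str
    data_type: str                                   # e.g. "MS" or "array"
    data_files: List[str] = field(default_factory=list)
    search_parameters: Dict[str, str] = field(default_factory=dict)
    search_database: str = ""                        # data that was searched against
    sample: SamplePreparation = field(default_factory=SamplePreparation)


def missing_annotation(record: RepositoryRecord) -> List[str]:
    """Return the names of the (assumed) required fields that are empty."""
    required = {
        "data_files": record.data_files,
        "search_parameters": record.search_parameters,
        "search_database": record.search_database,
        "sample.protocol": record.sample.protocol,
    }
    return [name for name, value in required.items() if not value]
```

A repository front end could call `missing_annotation` on each submission and reject or flag records for which the returned list is not empty.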
Major issues identified by group members included the contrast between statistical methods and manual validation of MS spectra, error rates, sample collection issues (e.g., platelet activation, proteolysis), standardization to evolving reference databases, and differing experimental practices among collaborating investigators. Group members briefly discussed the relationship between the object model and the relational schema, noting that the object model is suitable for a global standard, whereas relational schemas are appropriate for internal implementation. Regarding components that can be standardized, group members listed HUPO PSI standards, such as mzData, mzIdent, and MIAPE.

Discussion: Participants began by discussing inputs that will contribute to the project design and identified missing pieces. One participant noted that quality measures are necessary for data submitted to a common repository, regardless of their origin. The field needs guidelines by which to calibrate machines and build standards so that data from the consortia and other systems can be compared. Another participant suggested that raw data should be archived, albeit not necessarily in the central repository. Instead, derived conclusions, supporting evidence, and analyzed data should be stored in the central repository, and mechanisms must be created to update data periodically.

Another attendee reflected on the sense of immediacy and urgency to create a public, shared repository, both as a reference tool and as a prototype for biomarker studies. caBIG can be closely involved in this opportunity, and it was observed that this specific group is a major driver of caBIG activity in this particular space. Consortia representatives noted that caBIG is a welcome collaborator in their efforts. caBIG will take the following specific action items to assist with this effort:
Participants also discussed practical considerations for storing large volumes of data, and two strategies to support the storage of terabytes of data were suggested. First, for groups interested in remaining in a federation, it is necessary to devise a way to make virtual, distributed repositories. Second, for archival repositories, the bottleneck occurs as large amounts of data move through the "pipes" that comprise the public infrastructure. Suggested approaches to solve this problem included the pre-positioning of reference datasets, alternative strategies for pre-packaging and shipping (e.g., overnight shipping of DVDs), and moving the tools to local data sites rather than moving the data to the location of the tools. One attendee noted that the Plasma Proteome Project found that shipping datasets via DVD worked effectively.

Other suggestions for making such a resource useful for human clinical studies included establishing provisions to protect patient health information while retaining the capability to link samples and data to a specific clinical trial and PI. It was recommended that these provisions be added at the front end of the design.

Dr. Downing noted that NCI is willing to meet with instrument makers to discuss common file format downloads for various mass spectrometers. He noted that the Institute would like to have one representative from this workshop participate in the dialog. He then asked attendees for suggestions on ways that NCI can help to leverage its resources for the consortia and other community-based proteomics efforts. Participants offered the following suggestions: