Metadata Model¶
ENA Metadata Model¶
The object schemas that are used for rendering the forms and validating the information submitted to the application are based on the ENA (European Nucleotide Archive) Metadata Model.
The source XML schemas are from ENA Sequence Github repository. The XML schemas are converted to JSON Schemas so that they can be both validate the submitted data as well as be rendered as forms in the User Interface. For this reason the translation from XML Schema to JSON schema is not a 1-1 mapping, but an interpretation.
The ENA model consists of the following objects:
Study
: A study groups together data submitted to the archive. A study accession is typically used when citing data submitted to ENA. Note that all associated data and other objects are made public when the study is released.Project
: A project groups together data submitted to the archive. A project accession is typically used when citing data submitted to ENA. Note that all associated data and other objects are made public when the project is released.Sample
: A sample contains information about the sequenced source material. Samples are typically associated with checklists, which define the fields used to annotate the samples.Experiment
: An experiment contain information about a sequencing experiment including library and instrument details.Run
: A run is part of an experiment and refers to data files containing sequence reads.Analysis
: An analysis contains secondary analysis results derived from sequence reads (e.g. a genome assembly).DAC
: An European Genome-phenome Archive (EGA) data access committee (DAC) is required for authorized access submissions.Policy
: An European Genome-phenome Archive (EGA) data access policy is required for authorized access submissions.Dataset
: An European Genome-phenome Archive (EGA) data set is required for authorized access submissions.
Relationships between objects¶
Each of the objects are connected between each other by references, usually in the form of an accessionId
.
Some of the relationships are illustrated in the Metadata ENA Model figure, however in more detail they are connected as follows:
Study
- usually other objects point to it, as it represents one of the main objects of aSubmission
;Analysis
- contains references to:- parent
Study
(not mandatory); - zero or more references to objects of type:
Sample
,Experiment
,Run
;
- parent
Experiment
- contains references to exactly one parentStudy
. It can also contain a reference toSample
as an individual or a Pool;Run
- contains reference to exactly one parentExperiment
;Policy
- contains reference to exactly one parentDAC
;Dataset
- contains references to:- exactly one
Policy
; - zero or more references to objects of type:
Analysis
andRun
.
- exactly one
EGA/ENA Metadata submission Guides¶
Related guides for metadata submission:
EGA Metadata guides:
ENA Data Submission general Guide