Validate & register scRNA-seq datasets#
Single-cell RNA-seq (scRNA-seq) measures gene expression in individual cells and generates datasets that are often used to define cell states associated with functional phenotypes. Data formats such as AnnData and SingleCellExperiment objects help store metadata and data as a single entity. However, the metadata stored in them are often not validated, which makes it hard to integrate them with other datasets.
In this notebook, we show how Lamin can help manage scRNA-seq data.
!lamin init --storage ./test-scrna --schema bionty
💡 creating schemas: core==0.46.1 bionty==0.30.0
✅ saved: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-08-28 13:50:39)
✅ saved: Storage(id='7gYw68gC', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna', type='local', updated_at=2023-08-28 13:50:39, created_by_id='DzTjkKse')
✅ loaded instance: testuser1/test-scrna
💡 did not register local instance on hub (if you want, call `lamin register`)
import lamindb as ln
import lnschema_bionty as lb
✅ loaded instance: testuser1/test-scrna (lamindb 0.51.0)
ln.track()
💡 notebook imports: lamindb==0.51.0 lnschema_bionty==0.30.0
✅ saved: Transform(id='Nv48yAceNSh8z8', name='Validate & register scRNA-seq datasets', short_name='scrna', version='0', type=notebook, updated_at=2023-08-28 13:50:41, created_by_id='DzTjkKse')
✅ saved: Run(id='ujzl8FtsURX7meXQWLrn', run_at=2023-08-28 13:50:41, transform_id='Nv48yAceNSh8z8', created_by_id='DzTjkKse')
Human immune cells: Conde22#
lb.settings.species = "human"
✅ set species: Species(id='uHJU', name='human', taxon_id=9606, scientific_name='homo_sapiens', updated_at=2023-08-28 13:50:42, bionty_source_id='0znn', created_by_id='DzTjkKse')
Transform #
(Here we skip the data transformation steps, which often include filtering, normalizing, or formatting data.)
Let’s look at an scRNA-seq count matrix in the form of an AnnData object:
adata = ln.dev.datasets.anndata_human_immune_cells(
    populate_registries=True  # pre-populate registries to simulate a used instance
)
adata
AnnData object with n_obs × n_vars = 1648 × 36503
    obs: 'donor', 'tissue', 'cell_type', 'assay'
    var: 'feature_is_filtered', 'feature_reference', 'feature_biotype'
    uns: 'cell_type_ontology_term_id_colors', 'default_embedding', 'schema_version', 'title'
    obsm: 'X_umap'
Validate #
Validate genes in .var#
lb.Gene.validate(adata.var.index, lb.Gene.ensembl_gene_id);
💡 using global setting species = human
✅ 36355 terms (99.60%) are validated for ensembl_gene_id
❗ 148 terms (0.40%) are not validated for ensembl_gene_id: ENSG00000269933, ENSG00000261737, ENSG00000259834, ENSG00000256374, ENSG00000263464, ENSG00000203812, ENSG00000272196, ENSG00000272880, ENSG00000270188, ENSG00000287116, ENSG00000237133, ENSG00000224739, ENSG00000227902, ENSG00000239467, ENSG00000272551, ENSG00000280374, ENSG00000236886, ENSG00000229352, ENSG00000286601, ENSG00000227021, ...
We see that 148 gene identifiers can’t be validated because they are not currently in the Gene registry. We’d like to validate all features in this dataset, so let’s inspect them to decide what to do:
inspect_result = lb.Gene.inspect(adata.var.index, lb.Gene.ensembl_gene_id)
💡 using global setting species = human
✅ 36355 terms (99.60%) are validated for ensembl_gene_id
❗ 148 terms (0.40%) are not validated for ensembl_gene_id: ENSG00000269933, ENSG00000261737, ENSG00000259834, ENSG00000256374, ENSG00000263464, ENSG00000203812, ENSG00000272196, ENSG00000272880, ENSG00000270188, ENSG00000287116, ENSG00000237133, ENSG00000224739, ENSG00000227902, ENSG00000239467, ENSG00000272551, ENSG00000280374, ENSG00000236886, ENSG00000229352, ENSG00000286601, ENSG00000227021, ...
💡 using global setting species = human
💡 detected 35 terms in Bionty for ensembl_gene_id: ENSG00000276256, ENSG00000277856, ENSG00000274847, ENSG00000198712, ENSG00000274175, ENSG00000278384, ENSG00000198899, ENSG00000271254, ENSG00000273554, ENSG00000277196, ENSG00000198840, ENSG00000276760, ENSG00000273748, ENSG00000276017, ENSG00000198727, ENSG00000276345, ENSG00000275249, ENSG00000198786, ENSG00000278817, ENSG00000277630, ...
💡 → add records from Bionty to your registry via .from_values()
💡 couldn't validate 113 terms: ENSG00000273370, ENSG00000227021, ENSG00000270394, ENSG00000287388, ENSG00000286996, ENSG00000259444, ENSG00000227902, ENSG00000233776, ENSG00000276814, ENSG00000272551, ENSG00000278782, ENSG00000273888, ENSG00000285106, ENSG00000263464, ENSG00000286228, ENSG00000256374, ENSG00000226403, ENSG00000272880, ENSG00000271870, ENSG00000285162, ...
💡 → if you are sure, add records to your registry via .from_values()
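The three groups reported by the inspect log can be pictured with a small plain-Python sketch. It only illustrates the idea and is not how lamindb implements `inspect`: the function `partition_ids` and the toy identifiers are hypothetical.

```python
def partition_ids(ids, registry, reference):
    """Split ids into (validated, in_reference_only, unknown), keeping input order."""
    validated, in_reference, unknown = [], [], []
    for identifier in ids:
        if identifier in registry:
            # already present in the local registry
            validated.append(identifier)
        elif identifier in reference:
            # missing locally, but known to the public reference
            in_reference.append(identifier)
        else:
            # unknown to both
            unknown.append(identifier)
    return validated, in_reference, unknown


# toy data with made-up identifiers
registry = {"ENSG_A", "ENSG_B"}
reference = {"ENSG_A", "ENSG_B", "ENSG_C"}
ids = ["ENSG_A", "ENSG_C", "ENSG_X", "ENSG_B"]

validated, in_reference, unknown = partition_ids(ids, registry, reference)
print(validated, in_reference, unknown)
```

In the run above, the three groups have sizes 36355, 35, and 113, which add up to the 36503 genes in `.var`.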
The inspect log says that 35 of the non-validated ensembl_gene_ids can be found in the Bionty reference. Let’s register them:
records_bionty = lb.Gene.from_values(
    inspect_result.non_validated, lb.Gene.ensembl_gene_id
)
ln.save(records_bionty)
💡 using global setting species = human
✅ created 35 Gene records from Bionty matching ensembl_gene_id: ENSG00000198804, ENSG00000198712, ENSG00000228253, ENSG00000198899, ENSG00000198938, ENSG00000198840, ENSG00000212907, ENSG00000198886, ENSG00000198786, ENSG00000198695, ENSG00000198727, ENSG00000278704, ENSG00000277400, ENSG00000274847, ENSG00000276256, ENSG00000277630, ENSG00000278384, ENSG00000273748, ENSG00000271254, ENSG00000277475, ...
❗ did not create Gene records for 113 non-validated ensembl_gene_ids: ENSG00000112096, ENSG00000182230, ENSG00000203812, ENSG00000204092, ENSG00000215271, ENSG00000221995, ENSG00000224739, ENSG00000224745, ENSG00000225932, ENSG00000226377, ENSG00000226380, ENSG00000226403, ENSG00000227021, ENSG00000227220, ENSG00000227902, ENSG00000228139, ENSG00000228906, ENSG00000229352, ENSG00000231575, ENSG00000232196, ...
The remaining 113 aren’t present in the current Ensembl assembly (e.g. ENSG00000112096).
We’d still like to register them, so let’s create Gene records with those ensembl_gene_ids:
validated = lb.Gene.validate(adata.var.index, lb.Gene.ensembl_gene_id, mute=True)
nonval_ensembl_ids = adata.var.index[~validated]
new_records = [
    lb.Gene(ensembl_gene_id=ens_id, species=lb.settings.species)
    for ens_id in nonval_ensembl_ids
]
ln.save(new_records)
💡 using global setting species = human
Now all genes pass validation:
lb.Gene.validate(adata.var.index, lb.Gene.ensembl_gene_id);
💡 using global setting species = human
✅ 36503 terms (100.00%) are validated for ensembl_gene_id
Validate metadata in .obs#
adata.obs.columns
Index(['donor', 'tissue', 'cell_type', 'assay'], dtype='object')
1 feature is not validated: donor
validated = ln.Feature.validate(adata.obs.columns)
✅ 3 terms (75.00%) are validated for name
❗ 1 term (25.00%) is not validated for name: donor
Let’s register it:
features = ln.Feature.from_df(adata.obs)
ln.save(features)
All metadata columns are now validated as features:
ln.Feature.validate(adata.obs.columns);
✅ 4 terms (100.00%) are validated for name
Next, let’s validate the corresponding labels of each feature:
Some of the metadata labels can be typed using dedicated registries (e.g. bionty offers ontology-based registries for biological entities):
validated = lb.CellType.validate(adata.obs.cell_type)
❗ received 32 unique terms, 1616 empty/duplicated terms are ignored
✅ 30 terms (93.80%) are validated for name
❗ 2 terms (6.20%) are not validated for name: germinal center B cell, megakaryocyte
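The counts in this log follow from simple arithmetic: `.obs` has 1648 rows, 32 distinct cell type names are checked, and the remaining 1616 entries are repeats. Below is a minimal sketch of such a summary in plain Python. The helper `validation_summary` and the toy labels are hypothetical, and this is not the lamindb implementation.

```python
def validation_summary(values, registry_names):
    """Count unique non-empty terms and how many of them are in the registry."""
    unique = []
    seen = set()
    for value in values:
        # skip empty entries and repeats of a term we have already seen
        if value is None or value == "" or value in seen:
            continue
        seen.add(value)
        unique.append(value)
    n_valid = sum(1 for value in unique if value in registry_names)
    return {
        "n_unique": len(unique),
        "n_ignored": len(values) - len(unique),
        "n_validated": n_valid,
        "pct_validated": round(100 * n_valid / len(unique), 1) if unique else 0.0,
    }


# toy labels: one empty entry and two repeats
labels = ["B cell", "T cell", "B cell", "", "megakaryocyte", "T cell"]
summary = validation_summary(labels, {"B cell", "T cell"})
print(summary)
```

With the numbers from the log, 30 validated terms out of 32 unique terms corresponds to the reported 93.80%.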
Register non-validated cell types from Bionty:
nonval_cell_type_records = lb.CellType.from_values(
    adata.obs.cell_type[~validated], "name"
)
ln.save(nonval_cell_type_records)
✅ created 2 CellType records from Bionty matching name: germinal center B cell, megakaryocyte
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='UrtDirMx', name='megakaryocyte', ontology_id='CL:0000556', synonyms='megalocaryocyte|megalokaryocyte|megacaryocyte', description='A Large Hematopoietic Cell (50 To 100 Micron) With A Lobated Nucleus. Once Mature, This Cell Undergoes Multiple Rounds Of Endomitosis And Cytoplasmic Restructuring To Allow Platelet Formation And Release.', updated_at=2023-08-28 13:51:08, bionty_source_id='glQH', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000763
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='g1zY6vUW', name='myeloid cell', ontology_id='CL:0000763', description='A Cell Of The Monocyte, Granulocyte, Mast Cell, Megakaryocyte, Or Erythroid Lineage.', updated_at=2023-08-28 13:51:08, bionty_source_id='glQH', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000988
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='Q0aQr5JB', name='hematopoietic cell', ontology_id='CL:0000988', synonyms='haematopoietic cell|hemopoietic cell|haemopoietic cell', description='A Cell Of A Hematopoietic Lineage.', updated_at=2023-08-28 13:51:09, bionty_source_id='glQH', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: CL:0000548
✅ created 1 CellType record from Bionty matching ontology_id: CL:0002371
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='QMAH6IlS', name='somatic cell', ontology_id='CL:0002371', description='A Cell Of An Organism That Does Not Pass On Its Genetic Material To The Organism'S Offspring (I.E. A Non-Germ Line Cell).', updated_at=2023-08-28 13:51:09, bionty_source_id='glQH', created_by_id='DzTjkKse')
✅ loaded 1 CellType record matching ontology_id: CL:0000548
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000003
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='VT73gpK2', name='native cell', ontology_id='CL:0000003', description='A Cell That Is Found In A Natural Setting, Which Includes Multicellular Organism Cells 'In Vivo' (I.E. Part Of An Organism), And Unicellular Organisms 'In Environment' (I.E. Part Of A Natural Environment).', updated_at=2023-08-28 13:51:10, bionty_source_id='glQH', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000000
💡 also saving parents of CellType(id='uMLhrmbZ', name='germinal center B cell', ontology_id='CL:0000844', synonyms='GC B-cell|GC B cell|GC B lymphocyte|germinal center B lymphocyte|GC B-lymphocyte|germinal center B-cell|germinal center B-lymphocyte', description='A Rapidly Cycling Mature B Cell That Has Distinct Phenotypic Characteristics And Is Involved In T-Dependent Immune Responses And Located Typically In The Germinal Centers Of Lymph Nodes. This Cell Type Expresses Ly77 After Activation.', updated_at=2023-08-28 13:51:08, bionty_source_id='glQH', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000785
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='0I51jgPp', name='mature B cell', ontology_id='CL:0000785', synonyms='mature B lymphocyte|mature B-cell|mature B-lymphocyte', description='A B Cell That Is Mature, Having Left The Bone Marrow. Initially, These Cells Are Igm-Positive And Igd-Positive, And They Can Be Activated By Antigen.', updated_at=2023-08-28 13:51:11, bionty_source_id='glQH', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0001201
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='CIS4VJI0', name='B cell, CD19-positive', ontology_id='CL:0001201', synonyms='CD19+ B cell|B lymphocyte, CD19-positive|B-lymphocyte, CD19-positive|CD19-positive B cell|B-cell, CD19-positive', description='A B Cell That Is Cd19-Positive.', updated_at=2023-08-28 13:51:12, bionty_source_id='glQH', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000236
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='cx8VcggA', name='B cell', ontology_id='CL:0000236', synonyms='B-cell|B lymphocyte|B-lymphocyte', description='A Lymphocyte Of B Lineage That Is Capable Of B Cell Mediated Immunity.', updated_at=2023-08-28 13:51:12, bionty_source_id='glQH', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0000945
❗ now recursing through parents: this only happens once, but is much slower than bulk saving
💡 you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='Z0yFV7vU', name='lymphocyte of B lineage', ontology_id='CL:0000945', description='A Lymphocyte Of B Lineage With The Commitment To Express An Immunoglobulin Complex.', updated_at=2023-08-28 13:51:13, bionty_source_id='glQH', created_by_id='DzTjkKse')
lb.ExperimentalFactor.validate(adata.obs.assay)
lb.Tissue.validate(adata.obs.tissue);
✅ 3 terms (100.00%) are validated for name
✅ 17 terms (100.00%) are validated for name
For metadata that can’t be typed with dedicated registries (in this example, we didn’t mount a custom schema that contains a Donor registry), we can use the Label registry to track donor ids.
ln.Label.validate(adata.obs["donor"]);
❗ received 12 unique terms, 1636 empty/duplicated terms are ignored
❗ 12 terms (100.00%) are not validated for name: D496, 621B, A29, A36, A35, 637C, A52, A37, D503, 640C, A31, 582C
Donor labels are not validated, so let’s register them:
donors = [ln.Label(name=name) for name in adata.obs["donor"].unique()]
ln.save(donors)
ln.Label.validate(adata.obs["donor"]);
✅ 12 terms (100.00%) are validated for name
Validate external metadata#
In addition to what’s already in the file, we’d like to link this file with external features including “species” and “assay”:
ln.Feature.validate("species")
ln.Feature.validate("assay");
✅ 1 term (100.00%) is validated for name
✅ 1 term (100.00%) is validated for name
Validate corresponding labels of these features:
Sometimes we don’t remember exactly what a term is called; search can help:
lb.ExperimentalFactor.search("scRNA-seq").head(2)
| name | id | synonyms | __ratio__ |
|---|---|---|---|
| single-cell RNA sequencing | 068T1Df6 | single-cell RNA-seq\|scRNA-seq\|single cell RNA ... | 100.000000 |
| 10x 3' v3 | Vep0itYq | 10X 3' v3 | 11.111111 |
scrna = lb.ExperimentalFactor.filter(id="068T1Df6").one()
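The `__ratio__` column is a string-similarity score. The sketch below imitates this kind of ranked search with `difflib` from the Python standard library. The scoring function is an assumption chosen for illustration, so it will not reproduce the exact numbers in the table above, and the `search` helper and `records` dictionary are hypothetical.

```python
from difflib import SequenceMatcher


def search(query, records):
    """Rank records by the best similarity (0-100) of query to name or synonyms."""

    def score(a, b):
        return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

    ranked = []
    for name, synonyms in records.items():
        # a record matches as well as its best-matching name or synonym
        best = max(score(query, term) for term in [name, *synonyms])
        ranked.append((name, best))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# names and synonyms taken from the search result above
records = {
    "single-cell RNA sequencing": ["single-cell RNA-seq", "scRNA-seq"],
    "10x 3' v3": ["10X 3' v3"],
}
results = search("scRNA-seq", records)
print(results[0][0])
```

Because the query equals one of the synonyms, the first record gets the maximum score, which mirrors why the top hit in the table has a ratio of 100.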
Register #
Register data#
When we create a File object from an AnnData, we’ll automatically link its feature sets and get information about unmapped categories:
file = ln.File.from_anndata(
    adata, description="Conde22", var_ref=lb.Gene.ensembl_gene_id
)
💡 file will be copied to default storage upon `save()` with key `None` ('.lamindb/L7srPtuIfV1AWTBQTWYo.h5ad')
💡 parsing feature names of X stored in slot 'var'
💡 using global setting species = human
✅ 36503 terms (100.00%) are validated for ensembl_gene_id
💡 using global setting species = human
✅ linked: FeatureSet(id='y9MJ7mJXSm2HTosdC2Be', n=36503, type='float', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', created_by_id='DzTjkKse')
💡 parsing feature names of slot 'obs'
✅ 4 terms (100.00%) are validated for name
✅ linked: FeatureSet(id='1nrhQPHvB4xJwzZvWss2', n=4, registry='core.Feature', hash='_MH_53cOfZiBWqtyFpYy', modality_id='Z6UUsZqD', created_by_id='DzTjkKse')
file.save()
✅ saved 2 feature sets for slots: 'var','obs'
✅ storing file 'L7srPtuIfV1AWTBQTWYo' at '.lamindb/L7srPtuIfV1AWTBQTWYo.h5ad'
The file has the following 2 linked feature sets:
file.features
'var': FeatureSet(id='y9MJ7mJXSm2HTosdC2Be', n=36503, type='float', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-08-28 13:51:16, created_by_id='DzTjkKse')
'obs': FeatureSet(id='1nrhQPHvB4xJwzZvWss2', n=4, registry='core.Feature', hash='_MH_53cOfZiBWqtyFpYy', updated_at=2023-08-28 13:51:20, modality_id='Z6UUsZqD', created_by_id='DzTjkKse')
You can further annotate your feature set with a modality:
var_feature_set = file.features.get_feature_set("var")
modalities = ln.Modality.lookup()
var_feature_set.modality = modalities.rna
var_feature_set.save()
Link metadata#
Let’s now link observational metadata by adding labels to corresponding features.
cell_types = lb.CellType.from_values(adata.obs.cell_type, field="name")
efs = lb.ExperimentalFactor.from_values(adata.obs.assay, field="name")
tissues = lb.Tissue.from_values(adata.obs.tissue, field="name")
donors = ln.Label.from_values(adata.obs["donor"])
file.add_labels(cell_types, "cell_type")
file.add_labels(efs, "assay")
file.add_labels(tissues, "tissue")
file.add_labels(donors, feature="donor")
✅ linked feature 'cell_type' to registry 'bionty.CellType'
✅ linked feature 'assay' to registry 'bionty.ExperimentalFactor'
✅ linked feature 'tissue' to registry 'bionty.Tissue'
✅ linked feature 'donor' to registry 'core.Label'
file.features
'var': FeatureSet(id='y9MJ7mJXSm2HTosdC2Be', n=36503, type='float', registry='bionty.Gene', hash='dnRexHCtxtmOU81_EpoJ', updated_at=2023-08-28 13:51:20, modality_id='zmqL7br5', created_by_id='DzTjkKse')
'obs': FeatureSet(id='1nrhQPHvB4xJwzZvWss2', n=4, registry='core.Feature', hash='_MH_53cOfZiBWqtyFpYy', updated_at=2023-08-28 13:51:20, modality_id='Z6UUsZqD', created_by_id='DzTjkKse')
Note that adding labels to an external feature will create an external feature set.
file.add_labels(lb.settings.species, feature="species")
file.add_labels(scrna, feature="assay")
✅ linked feature 'species' to registry 'bionty.Species'
✅ linked new feature 'species' together with new feature set FeatureSet(id='U7p2tI8mncwoylRFnpdy', n=1, registry='core.Feature', hash='xXZsHT031KA0uzI9zDhB', updated_at=2023-08-28 13:51:20, modality_id='Z6UUsZqD', created_by_id='DzTjkKse')
The file is now queryable by everything we linked:
file.describe()
💡 File(id='L7srPtuIfV1AWTBQTWYo', key=None, suffix='.h5ad', accessor='AnnData', description='Conde22', version=None, size=28049505, hash='WEFcMZxJNmMiUOFrcSTaig', hash_type='md5', created_at=2023-08-28 13:51:20, updated_at=2023-08-28 13:51:20)
Provenance:
  🗃️ storage: Storage(id='7gYw68gC', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna', type='local', updated_at=2023-08-28 13:50:39, created_by_id='DzTjkKse')
  💫 transform: Transform(id='Nv48yAceNSh8z8', name='Validate & register scRNA-seq datasets', short_name='scrna', version='0', type=notebook, updated_at=2023-08-28 13:51:15, created_by_id='DzTjkKse')
  👣 run: Run(id='ujzl8FtsURX7meXQWLrn', run_at=2023-08-28 13:50:41, transform_id='Nv48yAceNSh8z8', created_by_id='DzTjkKse')
  👤 created_by: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-08-28 13:50:39)
Features:
  var (X):
    🔗 index (36503, bionty.Gene.id): ['0tqWIZ0EwOF6', 'rT0Xjh7mbeht', '3WrrzHSaNKiX', 'S0H0s3WM12iQ', 'z7loK3Eqm6rq'...]
  external:
    🔗 species (1, bionty.Species): ['human']
  obs (metadata):
    🔗 cell_type (32, bionty.CellType): ['naive B cell', 'effector memory CD4-positive, alpha-beta T cell', 'regulatory T cell', 'animal cell', 'gamma-delta T cell']
    🔗 assay (4, bionty.ExperimentalFactor): ["10x 5' v2", "10x 3' v3", "10x 5' v1", 'single-cell RNA sequencing']
    🔗 tissue (17, bionty.Tissue): ['lamina propria', 'blood', 'duodenum', 'bone marrow', 'spleen']
    🔗 donor (12, core.Label): ['621B', 'A29', 'A35', '637C', 'A36']
A less well curated dataset#
Transform #
Let’s now consider a dataset with less well curated features:
pbcm68k = ln.dev.datasets.anndata_pbmc68k_reduced()
We see that this dataset is indexed by gene symbols:
pbcm68k.var.index
Index(['HES4', 'TNFRSF4', 'SSU72', 'PARK7', 'RBP7', 'SRM', 'MAD2L2', 'AGTRAP',
       'TNFRSF1B', 'EFHD2',
       ...
       'ATP5O', 'MRPS6', 'TTC3', 'U2AF1', 'CSTB', 'SUMO3', 'ITGB2', 'S100B',
       'PRMT2', 'MT-ND3'],
      dtype='object', name='index', length=765)
Validate #
validated = lb.Gene.validate(pbcm68k.var.index, lb.Gene.symbol)
💡 using global setting species = human
✅ 695 terms (90.80%) are validated for symbol
❗ 70 terms (9.20%) are not validated for symbol: ATPIF1, C1orf228, CCBL2, RP11-782C8.1, RP11-277L2.3, RP11-156E8.1, AC079767.4, GPX1, H1FX, SELT, ATP5I, IGJ, CCDC109B, FYB, H2AFY, FAM65B, HIST1H4C, HIST1H1E, ZNRD1, C6orf48, ...
In this case, we only want to register data with validated genes:
pbcm68k_validated = pbcm68k[:, validated].copy()
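Subsetting with `pbcm68k[:, validated]` keeps only the columns (genes) whose entry in the boolean mask is `True`. Here is a toy plain-Python illustration of this column filtering. The counts and mask values are made up for the example and it does not use AnnData itself.

```python
# toy data: 2 cells (rows) x 3 genes (columns); counts and mask are made up
genes = ["HES4", "ATPIF1", "SSU72"]
validated = [True, False, True]
X = [
    [1, 0, 3],
    [0, 2, 5],
]

# keep only the columns whose mask entry is True
kept_genes = [gene for gene, ok in zip(genes, validated) if ok]
X_kept = [[x for x, ok in zip(row, validated) if ok] for row in X]

print(kept_genes)  # ['HES4', 'SSU72']
print(X_kept)      # [[1, 3], [0, 5]]
```

Applied to the dataset above, this kind of filtering keeps 695 of the 765 genes.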
Validate cell types:
# inspect shows none of the terms are mappable
lb.CellType.inspect(pbcm68k_validated.obs["cell_type"])
# here we search the cell type names from the public ontology and grab the top match
# then add the cell type names from the pbcm68k as synonyms
celltype_bt = lb.CellType.bionty()
ontology_ids = []
mapper = {}
for ct in pbcm68k_validated.obs["cell_type"].unique():
    ontology_id = celltype_bt.search(ct).iloc[0].ontology_id
    record = lb.CellType.from_bionty(ontology_id=ontology_id)
    mapper[ct] = record.name
    record.save()
    record.add_synonym(ct)
# standardize cell type names in the dataset
pbcm68k_validated.obs["cell_type"] = pbcm68k_validated.obs["cell_type"].map(mapper)
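Taking the top search hit is convenient but not always correct. In the log of this cell, for example, the label 'CD4+/CD25 T Reg' ends up as a synonym of 'CD8-positive, CD25-positive, alpha-beta regulatory T cell'. One way to guard against this is to review `mapper` and apply manual overrides before standardizing. The helper `apply_mapping` below is hypothetical, and the override term name is an assumption that should be checked against the Cell Ontology before use.

```python
def apply_mapping(labels, mapper, overrides=None):
    """Map raw labels to standardized names; manual overrides take precedence."""
    final = {**mapper, **(overrides or {})}
    missing = sorted({label for label in labels if label not in final})
    if missing:
        # fail loudly rather than silently producing missing values
        raise KeyError(f"no mapping for: {missing}")
    return [final[label] for label in labels]


# automatic top-hit mapping, as recorded in the log of this notebook
mapper = {
    "Dendritic cells": "dendritic cell",
    "CD4+/CD25 T Reg": "CD8-positive, CD25-positive, alpha-beta regulatory T cell",
}
# manual correction after review (assumed term name, verify before use)
overrides = {
    "CD4+/CD25 T Reg": "CD4-positive, CD25-positive, alpha-beta regulatory T cell",
}

labels = ["Dendritic cells", "CD4+/CD25 T Reg", "Dendritic cells"]
standardized = apply_mapping(labels, mapper, overrides)
print(standardized)
```

Raising on unmapped labels is a design choice for this sketch: it makes gaps in the mapping visible instead of letting them pass through unnoticed.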
❗ received 9 unique terms, 61 empty/duplicated terms are ignored
❗ 9 terms (100.00%) are not validated for name: Dendritic cells, CD19+ B, CD4+/CD45RO+ Memory, CD8+ Cytotoxic T, CD4+/CD25 T Reg, CD14+ Monocytes, CD56+ NK, CD8+/CD45RA+ Naive Cytotoxic, CD34+
💡 couldn't validate 9 terms: CD34+, Dendritic cells, CD8+/CD45RA+ Naive Cytotoxic, CD14+ Monocytes, CD8+ Cytotoxic T, CD19+ B, CD4+/CD45RO+ Memory, CD56+ NK, CD4+/CD25 T Reg
💡 → if you are sure, add records to your registry via .from_values()
β
created 1 CellType record from Bionty matching ontology_id: CL:0000451
π‘ also saving parents of CellType(id='9JGbXeUA', name='dendritic cell', ontology_id='CL:0000451', description='A Cell Of Hematopoietic Origin, Typically Resident In Particular Tissues, Specialized In The Uptake, Processing, And Transport Of Antigens To Lymph Nodes For The Purpose Of Stimulating An Immune Response Via T Cell Activation. These Cells Are Lineage Negative (Cd3-Negative, Cd19-Negative, Cd34-Negative, And Cd56-Negative).', updated_at=2023-08-28 13:51:21, bionty_source_id='glQH', created_by_id='DzTjkKse')
β
created 1 CellType record from Bionty matching ontology_id: CL:0000738
β now recursing through parents: this only happens once, but is much slower than bulk saving
π‘ you can switch this off via: lb.settings.auto_save_parents = False
π‘ also saving parents of CellType(id='MkrH0gsX', name='leukocyte', ontology_id='CL:0000738', synonyms='white blood cell|leucocyte', description='An Achromatic Cell Of The Myeloid Or Lymphoid Lineages Capable Of Ameboid Movement, Found In Blood Or Other Tissue.', updated_at=2023-08-28 13:51:22, bionty_source_id='glQH', created_by_id='DzTjkKse')
π‘ also saving parents of CellType(id='9JGbXeUA', name='dendritic cell', ontology_id='CL:0000451', synonyms='Dendritic cells', description='A Cell Of Hematopoietic Origin, Typically Resident In Particular Tissues, Specialized In The Uptake, Processing, And Transport Of Antigens To Lymph Nodes For The Purpose Of Stimulating An Immune Response Via T Cell Activation. These Cells Are Lineage Negative (Cd3-Negative, Cd19-Negative, Cd34-Negative, And Cd56-Negative).', updated_at=2023-08-28 13:51:22, bionty_source_id='glQH', created_by_id='DzTjkKse')
β
created 1 CellType record from Bionty matching ontology_id: CL:0001087
π‘ also saving parents of CellType(id='6VQXlWS7', name='effector memory CD4-positive, alpha-beta T cell, terminally differentiated', ontology_id='CL:0001087', synonyms='CD4-positive TEMRA|CD4+ TEMRA', description='A Cd4-Positive, Alpha Beta Memory T Cell With The Phenotype Cd45Ra-Positive, Cd45Ro-Negative, And Ccr7-Negative.', updated_at=2023-08-28 13:51:22, bionty_source_id='glQH', created_by_id='DzTjkKse')
β
created 2 CellType records from Bionty matching ontology_id: CL:4030002, CL:0000897
β now recursing through parents: this only happens once, but is much slower than bulk saving
π‘ you can switch this off via: lb.settings.auto_save_parents = False
π‘ also saving parents of CellType(id='ylUbqlrS', name='effector memory CD45RA-positive, alpha-beta T cell, terminally differentiated', ontology_id='CL:4030002', synonyms='terminally differentiated effector memory cells re-expressing CD45RA|terminally differentiated effector memory CD45RA+ T cells|TEMRA cell', description='An Alpha-Beta Memory T Cell With The Phenotype Cd45Ra-Positive.', updated_at=2023-08-28 13:51:23, bionty_source_id='glQH', created_by_id='DzTjkKse')
β
created 1 CellType record from Bionty matching ontology_id: CL:0000791
β now recursing through parents: this only happens once, but is much slower than bulk saving
π‘ you can switch this off via: lb.settings.auto_save_parents = False
π‘ also saving parents of CellType(id='WKpZjuYS', name='mature alpha-beta T cell', ontology_id='CL:0000791', synonyms='mature alpha-beta T-lymphocyte|mature alpha-beta T lymphocyte|mature alpha-beta T-cell', description='A Alpha-Beta T Cell That Has A Mature Phenotype.', updated_at=2023-08-28 13:51:24, bionty_source_id='glQH', created_by_id='DzTjkKse')
π‘ also saving parents of CellType(id='s6Ag7R5U', name='CD4-positive, alpha-beta memory T cell', ontology_id='CL:0000897', synonyms='CD4-positive, alpha-beta memory T-cell|CD4-positive, alpha-beta memory T-lymphocyte|CD4-positive, alpha-beta memory T lymphocyte', description='A Cd4-Positive, Alpha-Beta T Cell That Has Differentiated Into A Memory T Cell.', updated_at=2023-08-28 13:51:23, bionty_source_id='glQH', created_by_id='DzTjkKse')
β
created 1 CellType record from Bionty matching ontology_id: CL:0000624
β now recursing through parents: this only happens once, but is much slower than bulk saving
π‘ you can switch this off via: lb.settings.auto_save_parents = False
π‘ also saving parents of CellType(id='05vQoepH', name='CD4-positive, alpha-beta T cell', ontology_id='CL:0000624', synonyms='CD4-positive, alpha-beta T lymphocyte|CD4-positive, alpha-beta T-cell|CD4-positive, alpha-beta T-lymphocyte', description='A Mature Alpha-Beta T Cell That Expresses An Alpha-Beta T Cell Receptor And The Cd4 Coreceptor.', updated_at=2023-08-28 13:51:24, bionty_source_id='glQH', created_by_id='DzTjkKse')
π‘ also saving parents of CellType(id='6VQXlWS7', name='effector memory CD4-positive, alpha-beta T cell, terminally differentiated', ontology_id='CL:0001087', synonyms='CD4+ TEMRA|CD4+/CD45RO+ Memory|CD4-positive TEMRA', description='A Cd4-Positive, Alpha Beta Memory T Cell With The Phenotype Cd45Ra-Positive, Cd45Ro-Negative, And Ccr7-Negative.', updated_at=2023-08-28 13:51:24, bionty_source_id='glQH', created_by_id='DzTjkKse')
β
created 1 CellType record from Bionty matching ontology_id: CL:0000910
π‘ also saving parents of CellType(id='OxsmyL44', name='cytotoxic T cell', ontology_id='CL:0000910', synonyms='cytotoxic T lymphocyte|cytotoxic T-lymphocyte|cytotoxic T-cell', description='A Mature T Cell That Differentiated And Acquired Cytotoxic Function With The Phenotype Perforin-Positive And Granzyme-B Positive.', updated_at=2023-08-28 13:51:25, bionty_source_id='glQH', created_by_id='DzTjkKse')
β
created 1 CellType record from Bionty matching ontology_id: CL:0000911
β now recursing through parents: this only happens once, but is much slower than bulk saving
π‘ you can switch this off via: lb.settings.auto_save_parents = False
π‘ also saving parents of CellType(id='yvHkIrVI', name='effector T cell', ontology_id='CL:0000911', synonyms='effector T-lymphocyte|effector T-cell|effector T lymphocyte', description='A Differentiated T Cell With Ability To Traffic To Peripheral Tissues And Is Capable Of Mounting A Specific Immune Response.', updated_at=2023-08-28 13:51:25, bionty_source_id='glQH', created_by_id='DzTjkKse')
β
created 1 CellType record from Bionty matching ontology_id: CL:0002419
β now recursing through parents: this only happens once, but is much slower than bulk saving
π‘ you can switch this off via: lb.settings.auto_save_parents = False
π‘ also saving parents of CellType(id='2C5PhwrW', name='mature T cell', ontology_id='CL:0002419', synonyms='mature T-cell|CD3e-positive T cell', description='A T Cell That Expresses A T Cell Receptor Complex And Has Completed T Cell Selection.', updated_at=2023-08-28 13:51:26, bionty_source_id='glQH', created_by_id='DzTjkKse')
β
created 1 CellType record from Bionty matching ontology_id: CL:0000084
β now recursing through parents: this only happens once, but is much slower than bulk saving
π‘ you can switch this off via: lb.settings.auto_save_parents = False
π‘ also saving parents of CellType(id='BxNjby0x', name='T cell', ontology_id='CL:0000084', synonyms='T-lymphocyte|T-cell|T lymphocyte', description='A Type Of Lymphocyte Whose Defining Characteristic Is The Expression Of A T Cell Receptor Complex.', updated_at=2023-08-28 13:51:27, bionty_source_id='glQH', created_by_id='DzTjkKse')
π‘ also saving parents of CellType(id='OxsmyL44', name='cytotoxic T cell', ontology_id='CL:0000910', synonyms='CD8+ Cytotoxic T|cytotoxic T-cell|cytotoxic T-lymphocyte|cytotoxic T lymphocyte', description='A Mature T Cell That Differentiated And Acquired Cytotoxic Function With The Phenotype Perforin-Positive And Granzyme-B Positive.', updated_at=2023-08-28 13:51:27, bionty_source_id='glQH', created_by_id='DzTjkKse')
β
created 1 CellType record from Bionty matching ontology_id: CL:0000919
π‘ also saving parents of CellType(id='ORD0dMdt', name='CD8-positive, CD25-positive, alpha-beta regulatory T cell', ontology_id='CL:0000919', synonyms='CD8+CD25+ Treg|CD8+CD25+ T-lymphocyte|CD8+CD25+ T(reg)|CD8+CD25+ T lymphocyte|CD8+CD25+ T cell|CD8-positive, CD25-positive Treg|CD8-positive, CD25-positive, alpha-beta regulatory T-lymphocyte|CD8-positive, CD25-positive, alpha-beta regulatory T-cell|CD8+CD25+ T-cell|CD8-positive, CD25-positive, alpha-beta regulatory T lymphocyte', description='A Cd8-Positive Alpha Beta-Positive T Cell With The Phenotype Foxp3-Positive And Having Suppressor Function.', updated_at=2023-08-28 13:51:27, bionty_source_id='glQH', created_by_id='DzTjkKse')
β
created 1 CellType record from Bionty matching ontology_id: CL:0000795
β now recursing through parents: this only happens once, but is much slower than bulk saving
π‘ you can switch this off via: lb.settings.auto_save_parents = False
π‘ also saving parents of CellType(id='oTsFrhYW', name='CD8-positive, alpha-beta regulatory T cell', ontology_id='CL:0000795', synonyms='CD8-positive, alpha-beta regulatory T-cell|CD8-positive, alpha-beta Treg|CD8-positive T(reg)|CD8-positive, alpha-beta regulatory T lymphocyte|CD8+ Treg|CD8+ T(reg)|CD8+ regulatory T cell|CD8-positive, alpha-beta regulatory T-lymphocyte|CD8-positive Treg', description='A Cd8-Positive, Alpha-Beta T Cell That Regulates Overall Immune Responses As Well As The Responses Of Other T Cell Subsets Through Direct Cell-Cell Contact And Cytokine Release.', updated_at=2023-08-28 13:51:28, bionty_source_id='glQH', created_by_id='DzTjkKse')
β
created 1 CellType record from Bionty matching ontology_id: CL:0000625
β now recursing through parents: this only happens once, but is much slower than bulk saving
π‘ you can switch this off via: lb.settings.auto_save_parents = False
💡 also saving parents of CellType(id='VnKkQsME', name='CD8-positive, alpha-beta T cell', ontology_id='CL:0000625', synonyms='CD8-positive, alpha-beta T lymphocyte|CD8-positive, alpha-beta T-lymphocyte|CD8-positive, alpha-beta T-cell', description='A T Cell Expressing An Alpha-Beta T Cell Receptor And The Cd8 Coreceptor.', updated_at=2023-08-28 13:51:29, bionty_source_id='glQH', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='ORD0dMdt', name='CD8-positive, CD25-positive, alpha-beta regulatory T cell', ontology_id='CL:0000919', synonyms='CD8+CD25+ Treg|CD8+CD25+ T-lymphocyte|CD8+CD25+ T cell|CD8-positive, CD25-positive Treg|CD8-positive, CD25-positive, alpha-beta regulatory T lymphocyte|CD4+/CD25 T Reg|CD8+CD25+ T lymphocyte|CD8+CD25+ T-cell|CD8-positive, CD25-positive, alpha-beta regulatory T-lymphocyte|CD8-positive, CD25-positive, alpha-beta regulatory T-cell|CD8+CD25+ T(reg)', description='A Cd8-Positive Alpha Beta-Positive T Cell With The Phenotype Foxp3-Positive And Having Suppressor Function.', updated_at=2023-08-28 13:51:29, bionty_source_id='glQH', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0002057
💡 also saving parents of CellType(id='O0AQiAuv', name='CD14-positive, CD16-negative classical monocyte', ontology_id='CL:0002057', synonyms='CD16-negative monocyte|CD16- monocyte', description='A Classical Monocyte That Is Cd14-Positive, Cd16-Negative, Cd64-Positive, Cd163-Positive.', updated_at=2023-08-28 13:51:29, bionty_source_id='glQH', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='O0AQiAuv', name='CD14-positive, CD16-negative classical monocyte', ontology_id='CL:0002057', synonyms='CD16-negative monocyte|CD14+ Monocytes|CD16- monocyte', description='A Classical Monocyte That Is Cd14-Positive, Cd16-Negative, Cd64-Positive, Cd163-Positive.', updated_at=2023-08-28 13:51:29, bionty_source_id='glQH', created_by_id='DzTjkKse')
✅ created 1 CellType record from Bionty matching ontology_id: CL:0002102
💡 also saving parents of CellType(id='Xkw89opD', name='CD38-negative naive B cell', ontology_id='CL:0002102', synonyms='CD38-negative naive B lymphocyte|CD38-negative naive B-cell|CD38- naive B-cell|CD38-negative naive B-lymphocyte|CD38- naive B lymphocyte|CD38- naive B-lymphocyte|CD38- naive B cell', description='A Cd38-Negative Naive B Cell Is A Mature B Cell That Has The Phenotype Cd38-Negative, Surface Igd-Positive, Surface Igm-Positive, And Cd27-Negative, That Has Not Yet Been Activated By Antigen In The Periphery.', updated_at=2023-08-28 13:51:30, bionty_source_id='glQH', created_by_id='DzTjkKse')
💡 also saving parents of CellType(id='Xkw89opD', name='CD38-negative naive B cell', ontology_id='CL:0002102', synonyms='CD38-negative naive B-cell|CD38- naive B-cell|CD8+/CD45RA+ Naive Cytotoxic|CD38-negative naive B-lymphocyte|CD38-negative naive B lymphocyte|CD38- naive B-lymphocyte|CD38- naive B cell|CD38- naive B lymphocyte', description='A Cd38-Negative Naive B Cell Is A Mature B Cell That Has The Phenotype Cd38-Negative, Surface Igd-Positive, Surface Igm-Positive, And Cd27-Negative, That Has Not Yet Been Activated By Antigen In The Periphery.', updated_at=2023-08-28 13:51:30, bionty_source_id='glQH', created_by_id='DzTjkKse')
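The log above reports that saving a cell type also walks up to its parents, and that this recursion is slower than bulk saving. The idea can be sketched in plain Python. This is not lamindb's implementation: `collect_ancestors` and `parents_of` are hypothetical names, and the two edges are only the chain that the order of the log messages suggests.

```python
def collect_ancestors(term, parents_of):
    """Return every ancestor of `term`, nearest first, visiting each once."""
    ancestors = []
    queue = list(parents_of.get(term, []))
    while queue:
        parent = queue.pop(0)
        if parent in ancestors:
            continue  # already recorded via another path
        ancestors.append(parent)
        queue.extend(parents_of.get(parent, []))
    return ancestors


# Hypothetical child -> parents map; the chain is read off the log order above
# and is not a complete picture of the ontology.
parents_of = {
    "CL:0000919": ["CL:0000795"],
    "CL:0000795": ["CL:0000625"],
}

ancestors = collect_ancestors("CL:0000919", parents_of)
print(ancestors)
```

Each ancestor costs at least one extra lookup, which is one plausible reason the log offers `lb.settings.auto_save_parents = False` as a switch.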
Now, all cell types are validated:
lb.CellType.validate(pbcm68k_validated.obs["cell_type"]);
✅ 9 terms (100.00%) are validated for name
Register #
file = ln.File.from_anndata(
    pbcm68k_validated, description="10x reference pbmc68k", var_ref=lb.Gene.symbol
)
💡 file will be copied to default storage upon `save()` with key `None` ('.lamindb/SIdlfiN2VEwYVeGfIcBS.h5ad')
💡 parsing feature names of X stored in slot 'var'
💡 using global setting species = human
✅ 695 terms (100.00%) are validated for symbol
💡 using global setting species = human
✅ linked: FeatureSet(id='6zHhikTO5adNLGJ1Y9Ue', n=695, type='float', registry='bionty.Gene', hash='W4ps_86b5dxk2Wd1gWTo', created_by_id='DzTjkKse')
💡 parsing feature names of slot 'obs'
✅ 1 term (25.00%) is validated for name
❗ 3 terms (75.00%) are not validated for name: n_genes, percent_mito, louvain
✅ linked: FeatureSet(id='UsZGAtfE2LEHGacTs9Bw', n=1, registry='core.Feature', hash='0mQgVR7JxgIWIXTcCbSy', modality_id='Z6UUsZqD', created_by_id='DzTjkKse')
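The obs messages above report that 1 of 4 column names validates as a feature, and only that one ends up in the linked feature set. The bookkeeping behind such a message can be sketched in plain Python. This is an illustration, not lamindb's code: `validate_names` and `registered_features` are hypothetical, and the registered set is an assumption chosen to match the logged numbers.

```python
def validate_names(names, registered):
    """Split unique names into validated and non-validated, keeping order."""
    unique = list(dict.fromkeys(names))  # de-duplicate, keep first-seen order
    validated = [name for name in unique if name in registered]
    not_validated = [name for name in unique if name not in registered]
    pct = 100 * len(validated) / len(unique) if unique else 0.0
    return validated, not_validated, pct


# obs column names taken from the log output; their order here is illustrative
obs_columns = ["cell_type", "n_genes", "percent_mito", "louvain"]
# Assumption: only `cell_type` exists in the feature registry at this point
registered_features = {"cell_type"}

validated, not_validated, pct = validate_names(obs_columns, registered_features)
print(f"{len(validated)} validated ({pct:.2f}%), not validated: {not_validated}")
```

Columns that fail validation are not lost from the AnnData object; in this sketch they are simply left out of the validated list.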
file.save()
✅ saved 2 feature sets for slots: 'var','obs'
✅ storing file 'SIdlfiN2VEwYVeGfIcBS' at '.lamindb/SIdlfiN2VEwYVeGfIcBS.h5ad'
var_feature_set = file.features.get_feature_set("var")
var_feature_set.modality = modalities.rna
var_feature_set.save()
cell_types = lb.CellType.from_values(pbcm68k_validated.obs["cell_type"], "name")
file.add_labels(cell_types, "cell_type")
file.add_labels(lb.settings.species, feature="species")
file.add_labels(scrna, feature="assay")
✅ loaded: FeatureSet(id='U7p2tI8mncwoylRFnpdy', n=1, registry='core.Feature', hash='xXZsHT031KA0uzI9zDhB', updated_at=2023-08-28 13:51:20, modality_id='Z6UUsZqD', created_by_id='DzTjkKse')
✅ linked new feature 'species' together with new feature set FeatureSet(id='U7p2tI8mncwoylRFnpdy', n=1, registry='core.Feature', hash='xXZsHT031KA0uzI9zDhB', updated_at=2023-08-28 13:51:31, modality_id='Z6UUsZqD', created_by_id='DzTjkKse')
💡 no file links to it anymore, deleting feature set FeatureSet(id='U7p2tI8mncwoylRFnpdy', n=1, registry='core.Feature', hash='xXZsHT031KA0uzI9zDhB', updated_at=2023-08-28 13:51:31, modality_id='Z6UUsZqD', created_by_id='DzTjkKse')
✅ linked new feature 'assay' together with new feature set FeatureSet(id='qBIrOTzSbFJU46RJYmM5', n=2, registry='core.Feature', hash='RUkub0mwALzWQXuSo6FL', updated_at=2023-08-28 13:51:31, modality_id='Z6UUsZqD', created_by_id='DzTjkKse')
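In the log above, a one-member feature set is replaced by a two-member set once `assay` joins `species`, and the set that no file links to anymore is deleted. One way to picture this is a feature set identified by a hash of its members, so that identity depends on content and not on insertion order. The sketch below illustrates that assumption only. `feature_set_hash` is a hypothetical helper and does not reproduce the hashes shown in the log.

```python
import hashlib


def feature_set_hash(feature_names):
    """Hash a set of feature names independently of their order."""
    unique_sorted = sorted(set(feature_names))
    payload = "\n".join(unique_sorted).encode("utf-8")
    # Truncated to 20 characters purely for readability in this sketch
    return hashlib.md5(payload).hexdigest()[:20]


only_species = feature_set_hash(["species"])
species_assay = feature_set_hash(["species", "assay"])
assay_species = feature_set_hash(["assay", "species"])

print(only_species != species_assay)
print(species_assay == assay_species)
```

Under this assumption, adding a feature always yields a different set record, which would explain why the earlier one-member set becomes orphaned.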
file.features
'var': FeatureSet(id='6zHhikTO5adNLGJ1Y9Ue', n=695, type='float', registry='bionty.Gene', hash='W4ps_86b5dxk2Wd1gWTo', updated_at=2023-08-28 13:51:31, modality_id='zmqL7br5', created_by_id='DzTjkKse')
'obs': FeatureSet(id='UsZGAtfE2LEHGacTs9Bw', n=1, registry='core.Feature', hash='0mQgVR7JxgIWIXTcCbSy', updated_at=2023-08-28 13:51:31, modality_id='Z6UUsZqD', created_by_id='DzTjkKse')
'external': FeatureSet(id='qBIrOTzSbFJU46RJYmM5', n=2, registry='core.Feature', hash='RUkub0mwALzWQXuSo6FL', updated_at=2023-08-28 13:51:31, modality_id='Z6UUsZqD', created_by_id='DzTjkKse')
file.describe()
💡 File(id='SIdlfiN2VEwYVeGfIcBS', key=None, suffix='.h5ad', accessor='AnnData', description='10x reference pbmc68k', version=None, size=589484, hash='eKVXV5okt5YRYjySMTKGEw', hash_type='md5', created_at=2023-08-28 13:51:31, updated_at=2023-08-28 13:51:31)
Provenance:
  🗃️ storage: Storage(id='7gYw68gC', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna', type='local', updated_at=2023-08-28 13:50:39, created_by_id='DzTjkKse')
  💫 transform: Transform(id='Nv48yAceNSh8z8', name='Validate & register scRNA-seq datasets', short_name='scrna', version='0', type=notebook, updated_at=2023-08-28 13:51:31, created_by_id='DzTjkKse')
  👣 run: Run(id='ujzl8FtsURX7meXQWLrn', run_at=2023-08-28 13:50:41, transform_id='Nv48yAceNSh8z8', created_by_id='DzTjkKse')
  👤 created_by: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-08-28 13:50:39)
Features:
  var (X):
    🔗 index (695, bionty.Gene.id): ['VPG6Ybxhk9ss', 'zOUVvOZ5PDec', '3z0yr6iybn0l', 'R0KxhGBHlynU', 'VSc0IwLJsfrD'...]
  external:
    🔗 assay (1, bionty.ExperimentalFactor): ['single-cell RNA sequencing']
    🔗 species (1, bionty.Species): ['human']
  obs (metadata):
    🔗 cell_type (9, bionty.CellType): ['conventional dendritic cell', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'CD14-positive, CD16-negative classical monocyte', 'dendritic cell', 'cytotoxic T cell']
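The `describe()` output above lists a 22-character `hash` with `hash_type='md5'` next to the file `size`. A 16-byte MD5 digest encoded as unpadded URL-safe base64 has exactly that length, so the sketch below computes such a value for a throwaway file. The encoding is an assumption (the output only names the hash type), and `md5_b64` is a hypothetical helper, not a lamindb function.

```python
import base64
import hashlib
import os
import tempfile


def md5_b64(path, chunk_size=1024 * 1024):
    """MD5 of a file, read in chunks, as unpadded URL-safe base64."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.urlsafe_b64encode(digest.digest()).decode("ascii").rstrip("=")


# Placeholder content: this is not a real AnnData file
payload = b"placeholder bytes, not a real AnnData file"

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.h5ad")
    with open(demo_path, "wb") as handle:
        handle.write(payload)
    demo_hash = md5_b64(demo_path)
    demo_size = os.path.getsize(demo_path)

print(demo_hash, demo_size)
```

Comparing size and hash is a cheap way to check whether two registered files hold identical bytes, without loading either one.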
file.view_lineage()
Now let's continue with data integration: Integrate scRNA-seq datasets