The data published here are the datasets used in the publication "Algal (meta)proteomes uncover cellular adaptations to life on the Greenland Ice Sheet" by Feord et al. (submitted). Four datasets are presented in this data publication: i) amplicon sequencing data (16S and 18S), ii) cell counts and biovolumes of algal morphotypes quantified with a FlowCam, iii) raw and normalized metabolomic data (quantified with LC-MS and GC-MS), and iv) a file containing a predicted protein database. The protein data used in Feord et al. (submitted) are available from the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD057047 (username: reviewer_pxd057047@ebi.ac.uk and password: kwg7a3NHfhwg).
All data except dataset iv originate from samples collected on the Greenland Ice Sheet in the summer of 2021 during the DEEP PURPLE ERC ice camp (GR21). This field location (61°05’ N, 46°50’ W) is described in Feord et al. (submitted). Datasets i-iii are three different analyses of the same two samples: one snow sample collected on 24 July 2021 and one ice sample collected on 7 August 2021. Both samples were high in algal biomass, with the snow sample being visibly red due to pigment-rich snow algae and the ice sample visibly purple/brown due to pigment-rich glacier ice algae. All collection, extraction, and analysis methods are described and referenced in Feord et al. (submitted).
The analyses and replication for each sample are as follows:
i. Amplicon sequencing (both 18S and 16S): SNOW: one biological replicate = one sequencing reaction, and ICE: three biological replicates (labelled a, b, c) = three sequencing reactions. Raw sequencing data are provided as fastq.gz files and abundance tables as .txt files.
ii. Cell counts and biovolumes with FlowCam: SNOW: one biological replicate measured in technical triplicate (labelled 1, 2, 3) = three measurements, and ICE: three biological replicates (labelled a, b, c) measured in technical triplicate (labelled 1, 2, 3) = nine measurements. Data are provided as .txt files and .png files.
iii. Metabolomic analyses: SNOW: five biological replicates (labelled red_RS1-5) measured in three or four technical replicates (labelled F1-F4) = 19 measurements, and ICE: three biological replicates (labelled GIA_RS1-3) measured in technical triplicate (labelled F1-F3) = nine measurements. Raw data are provided as .mzML files; processed data and tables with sample explanation files are provided as .txt files.
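The measurement counts for the fully crossed ICE designs above can be checked with a short sketch. The function name `expected_measurements` and the pairing of labels into tuples are illustrative assumptions; the actual file names in the datasets may combine these labels differently. The SNOW metabolomic design is left out because the list does not state which replicate has only three technical replicates.

```python
from itertools import product

def expected_measurements(biological, technical):
    """Pair every biological replicate with every technical replicate."""
    return list(product(biological, technical))

# FlowCam, ICE: biological replicates a, b, c, each measured as 1, 2, 3.
flowcam_ice = expected_measurements(["a", "b", "c"], ["1", "2", "3"])

# Metabolomics, ICE: GIA_RS1-3, each measured as F1-F3.
metabolomics_ice = expected_measurements(
    [f"GIA_RS{i}" for i in range(1, 4)],
    [f"F{j}" for j in range(1, 4)],
)

# Both designs are fully crossed, so each gives 3 x 3 = 9 measurements.
print(len(flowcam_ice), len(metabolomics_ice))
```

Comparing such a list against the files actually present in a download is one way to spot a missing measurement.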
Dataset iv is a FASTA file (.fa) containing the predicted protein database used to identify proteins from peptide data in Feord et al. (submitted). The database was built by translating open reading frames (ORFs) assembled from previously sequenced polyA-isolated metatranscriptomes from Greenland Ice Sheet samples published by Perini et al. (2024), using the samples MG3, MG5, MG6, MG7, MG8, MG11, MG12, MG14, MG19, MG22, MG23, MG24, MG25, MG26, MG27, MG28, MG30, and MG31 from that paper. Assembly, identification of ORFs, and dereplication are described by Feord et al. (submitted).
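For readers who want to inspect the protein database without extra dependencies, a minimal FASTA reader might look like the sketch below. The function name `read_fasta` and the two demo records are made up for illustration and do not come from the published file; in practice the same function could be given an open handle to the .fa file.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file or any iterable of lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)

# Tiny made-up example standing in for the real database file.
demo = io.StringIO(">orf_1\nMKTAY\nIAKQR\n>orf_2\nMSLLT\n")
records = list(read_fasta(demo))
print(len(records))
```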
This data publication presents quantitative DNA data obtained through fluorometric detection of genomic DNA and the estimation of 16S rRNA gene copies using quantitative Polymerase Chain Reaction (qPCR). The data encompasses various soil and rock samples collected across a climate gradient. The DNA was extracted using a protocol enabling the separate analysis of intracellular DNA (iDNA) and extracellular DNA (eDNA) from the same sample. The primary objective of this study was to enhance a previously established method developed by Alawi et al. (2014) for analyzing terrestrial samples by introducing modifications to the extraction buffer.
Phosphate buffers at two different concentrations (120 mM and 300 mM), EDTA (300 mM), and a high-concentration phosphate buffer in combination with EDTA (300 mM each) were tested in conjunction with a detergent mix (detailed in Medina et al., 2023; submitted). Thorough tests, including spiked DNA experiments and cell counts, were conducted on one low biomass sample to validate the extraction setups. The two most effective extraction protocols were then applied to all samples from the four designated sites and compared with the phosphate buffer described by Alawi et al. (2014), resulting in the calculation of improvement factors.
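The exact definition of the improvement factor is not given in this description. As a rough sketch, assuming it is the ratio of the DNA yield obtained with a modified buffer to the yield obtained with the reference phosphate buffer for the same sample, it could be computed as follows. The function name and the numbers are placeholders, not measured values.

```python
def improvement_factor(modified_yield, reference_yield):
    """Ratio of the yield with a modified buffer to the yield with the reference buffer.

    Assumed definition for illustration; both yields must use the same unit.
    """
    if reference_yield <= 0:
        raise ValueError("reference yield must be positive")
    return modified_yield / reference_yield

# Placeholder numbers for illustration only, not measured values.
factor = improvement_factor(modified_yield=6.0, reference_yield=2.0)
print(factor)
```

Under this assumed definition, a factor above 1 would indicate that the modified buffer recovered more DNA than the reference buffer.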
The resulting dataset provides quantitative DNA information and estimates of 16S rRNA gene copies across diverse soil and rock samples along a climate gradient. The modifications made to the extraction buffer improved extraction efficiency, especially for iDNA, compared with the original method. These findings contribute to the refinement and optimization of DNA extraction protocols for terrestrial samples, enabling more accurate and comprehensive analyses of microbial communities in different environments.