The data published here are the datasets used in the publication "Algal (meta)proteomes uncover cellular adaptations to life on the Greenland Ice Sheet" by Feord et al., submitted for publication. Four datasets are presented in this data publication: i) amplicon sequencing data (16S and 18S), ii) cell counts and biovolumes of algal morphotypes quantified with a FlowCam, iii) raw and normalized metabolomic data (quantified with LC-MS and GC-MS), and iv) a file containing a predicted protein database. The protein data used in Feord et al. (submitted) are available from the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD057047 (username: reviewer_pxd057047@ebi.ac.uk and password: kwg7a3NHfhwg).
All data except dataset iv originate from samples collected on the Greenland Ice Sheet in the summer of 2021 during the DEEP PURPLE ERC ice camp (GR21). This field location (61°05’ N, 46°50’ W) is described in Feord et al. (submitted). Datasets i-iii are three different analyses of the same two samples: one snow sample collected on 24 July 2021 and one ice sample collected on 7 August 2021. Both samples were high in algal biomass, with the snow sample being visibly red due to pigment-rich snow algae and the ice sample visibly purple/brown due to pigment-rich glacier ice algae. All collection, extraction, and analysis methods are described and referenced in Feord et al. (submitted).
The analyses and the replication within the samples are as follows:
i. Amplicon sequencing (for both 18S and 16S sequencing): SNOW: one biological replicate sequenced = one sequencing reaction, and ICE: three biological replicates sequenced (labelled a,b,c) = three sequencing reactions. Raw sequencing data are provided as fastq.gz files and abundance tables as .txt files.
ii. Cell counts and biovolume with FlowCam: SNOW: one biological replicate measured in technical triplicate = three measurements (labelled 1,2,3), and ICE: three biological replicates (labelled a,b,c) measured in technical triplicate (labelled 1,2,3) = nine measurements. Data are provided as .txt files and .png files.
iii. Metabolomic analyses: SNOW: five biological replicates (labelled red_RS1-5) measured in three or four technical replicates (labelled F1-F4) = 19 measurements, and ICE: three biological replicates (labelled GIA_RS1-3) measured in technical triplicate (labelled F1-F3) = nine measurements. Raw data are provided as .mzML files; processed data and tables, together with sample explanation files, are provided as .txt files.
Dataset iv) is a FASTA file (.fa) with the predicted protein database used to identify proteins from peptide data in Feord et al. (submitted). The database was built by translating open reading frames (ORFs) assembled from previously sequenced polyA-isolated metatranscriptomes from Greenland Ice Sheet samples published by Perini et al. (2024), using the samples MG3, MG5, MG6, MG7, MG8, MG11, MG12, MG14, MG19, MG22, MG23, MG24, MG25, MG26, MG27, MG28, MG30, MG31 from that paper. Assembly, identification of ORFs, and dereplication are described by Feord et al. (submitted).
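For readers who want to inspect the protein database programmatically, the following is a minimal sketch of a FASTA reader using only the Python standard library. The record names and sequences in the example are invented for illustration and do not come from the published file; the header layout of the actual .fa file is not described here and is therefore an assumption.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary.

    The header is the full text after '>' (assumption: headers are unique).
    Sequence lines belonging to one record are concatenated.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical two-record example (not taken from the dataset).
example = """>orf_0001 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>orf_0002
MSLLTEVETY
""".splitlines()

proteins = read_fasta(example)
```

To read the published file, the same function could be applied to an open file handle, for example `read_fasta(open(path))`, where `path` points to the downloaded .fa file.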
These datasets display the results of a multivariate statistical analysis, non-metric multidimensional scaling (NMDS) based on Bray-Curtis dissimilarity, of the organic matter (OM) molecular compositions of surface glacier purple ice- and red snow-algae dominated samples collected on the Greenland Ice Sheet at ca. 61°1’ N, 46°8’ W (Rossel et al., 2025). The molecular compositions of the samples were obtained by ultrahigh resolution analysis on a 15 Tesla Fourier transform ion cyclotron resonance mass spectrometer (FTICR-MS, Rossel et al., 2025). All reported NMDS datasets display the molecular loadings and sample scores for the first two axes of the NMDS (NMDS1 and NMDS2), the number of occurrences of each molecular formula per sample type, and molecular properties of the formulae such as mass (MWwa), hydrogen/carbon (H/Cwa) and oxygen/carbon (O/Cwa) ratios, aromaticity index (AI-modwa), double bond equivalents (DBEwa), DBE minus oxygen (DBE-Owa), nominal oxidation state of carbon (NOSCwa), and the molecular category to which the formula was assigned (aromatics, condensed aromatics, highly unsaturated, unsaturated aliphatics and saturated). Furthermore, the NMDS datasets are separated according to the compared sample set. In the first NMDS analysis (Table S1, Fig 1 in Rossel et al., in review), we compared all samples: the initial OM from glacier ice- (T0_Ice) and snow-algae (T0_Snow) dominated habitats and the samples incubated in situ for up to 24 days (T3-T24) under dark (D) and light (L) conditions. These OM samples include both dissolved organic matter (DOM) and particulate organic matter (POM), the latter extracted with hot water (HW) and sodium hydroxide (Na) to represent water-soluble and particle-associated OM, respectively (see methods). In the second and third NMDS analyses, we compared DOM and POM samples separately (Table S2 and Table S3, respectively).
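As an illustration of the dissimilarity measure underlying the NMDS, the sketch below computes pairwise Bray-Curtis dissimilarities between samples described by normalized peak intensities per molecular formula. It is a minimal example under stated assumptions: the sample names, formulae and intensities are invented, the software used for the published analysis is not specified in this description, and the ordination step itself is not shown. The resulting matrix is the kind of input an NMDS routine would take.

```python
from typing import Dict, List


def bray_curtis(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i).

    A formula missing from a sample is treated as intensity 0.
    """
    formulas = set(u) | set(v)
    num = sum(abs(u.get(f, 0.0) - v.get(f, 0.0)) for f in formulas)
    den = sum(u.get(f, 0.0) + v.get(f, 0.0) for f in formulas)
    return num / den if den > 0 else 0.0


def dissimilarity_matrix(samples: Dict[str, Dict[str, float]]) -> List[List[float]]:
    """Square, symmetric matrix of Bray-Curtis values in sample order."""
    names = list(samples)
    return [[bray_curtis(samples[a], samples[b]) for b in names] for a in names]


# Hypothetical normalized intensities (illustrative values only).
samples = {
    "T0_Ice": {"C10H12O5": 0.5, "C12H14O6": 0.5},
    "T0_Snow": {"C12H14O6": 0.25, "C18H30O2": 0.75},
}

matrix = dissimilarity_matrix(samples)
ice_vs_snow = matrix[0][1]  # 1.5 / 2.0 = 0.75 for the values above
```

A value of 0 means two samples share identical intensities, and a value of 1 means they share no molecular formulae.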
Following the separation of all analyzed samples in the first NMDS (purple and red samples in Fig 1 in Rossel et al., in review), OM molecular signals related to glacier ice-algae (Table S4) and snow-algae (Table S5) were separated using NMDS1 values ≤ 0.45 and ≥ 0.45, respectively (Fig. 1b and Fig. 1c in Rossel et al., in review). Additionally, these separated molecular signals for glacier ice-algae and snow-algae samples were used to calculate intensity-weighted (subscript wa) values for MWwa, H/Cwa and O/Cwa ratios, AI-modwa, NOSCwa, DBEwa and DBE-Owa for each sample (Table S6).
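An intensity-weighted value of a molecular property is commonly computed as the sum of each formula's property value multiplied by its peak intensity, divided by the sum of the intensities. The sketch below shows this calculation for a single sample. It assumes this standard definition applies; the exact procedure used for Table S6 is described in the cited publication, and the numbers in the example are invented.

```python
from typing import Sequence


def weighted_average(intensities: Sequence[float], values: Sequence[float]) -> float:
    """Intensity-weighted average: sum(I_i * x_i) / sum(I_i)."""
    if len(intensities) != len(values):
        raise ValueError("intensities and values must have the same length")
    total = sum(intensities)
    if total == 0:
        raise ValueError("sum of intensities is zero")
    return sum(i * x for i, x in zip(intensities, values)) / total


# Hypothetical sample with three formulae (illustrative values only).
peak_intensities = [0.5, 0.25, 0.25]
hc_ratios = [1.0, 1.5, 2.0]

hc_wa = weighted_average(peak_intensities, hc_ratios)  # (0.5 + 0.375 + 0.5) / 1.0
```

The same function could be applied to any of the listed properties (for example mass, O/C, AI-mod, NOSC or DBE) by passing the corresponding per-formula values.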
This dataset provides molecular formulae with their normalized mass peak intensities obtained from ultrahigh resolution mass spectrometric analysis of organic matter (OM) from glacier purple ice- and red snow-algae dominated samples collected upwind of the DEEP PURPLE ice camp (deeppurple-ercsyg.eu) on the surface of the Greenland Ice Sheet. The samples are represented by the initial OM from glacier ice- (T0_Ice) and snow-algae (T0_Snow) dominated habitats and the samples incubated in situ for up to 24 days (T3-T24) under dark (D) and light (L) conditions. The OM samples include dissolved organic matter (DOM) and particulate organic matter (POM), the latter extracted with hot water (HW) and sodium hydroxide (Na) to represent water-soluble and particle-associated OM, respectively (see methods). Molecular analyses were performed on a Solarix Fourier transform ion cyclotron resonance mass spectrometer (FTICR-MS) equipped with a 15 Tesla superconducting magnet (Bruker Daltonic) using an electrospray ionization source (ESI, Bruker Apollo II) in negative ion mode on DOM samples and POM extracts that had previously been solid phase extracted (SPE, Dittmar et al., 2008). Molecular formula calculation for all samples was performed using the software ICBM-OCEAN (Merder et al., 2020) and included the following combination of elements: C0-100, O0-50, H0-200, N0-4, S0-2 and P0-1 (the full description of the data and methods is provided in the data description file). Because DOM and POM samples were analyzed in duplicate in the mass spectrometer, a compound was considered to be present only if it appeared in both duplicate measurements. The mean normalized intensity of the duplicate measurements is presented here and was further used for statistical analysis in Rossel et al., to be submitted. This dataset contains 8827 molecular formulae with their normalized peak intensities.
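The duplicate rule described above can be sketched as follows: a formula is kept only if it was detected in both measurements of a sample, and its reported intensity is the mean of the two. This is a minimal illustration, not the processing code used for the dataset; the formulae and intensities are invented, and a formula is assumed to be "detected" when it appears as a key in a measurement.

```python
from typing import Dict


def merge_duplicates(run_a: Dict[str, float], run_b: Dict[str, float]) -> Dict[str, float]:
    """Keep formulae present in both duplicate runs and average their intensities."""
    shared = run_a.keys() & run_b.keys()
    return {f: (run_a[f] + run_b[f]) / 2.0 for f in sorted(shared)}


# Hypothetical duplicate measurements of one sample (illustrative values only).
run_a = {"C10H12O5": 0.5, "C12H14O6": 0.25}
run_b = {"C10H12O5": 0.25, "C18H30O2": 0.75}

merged = merge_duplicates(run_a, run_b)
```

In this example only C10H12O5 is retained, because the other two formulae each appear in a single measurement.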
This dataset provides the dissolved organic carbon (DOC) concentrations of the organic matter (OM) obtained from glacier purple ice- and red snow-algae dominated samples collected upwind of the DEEP PURPLE ice camp (deeppurple-ercsyg.eu) on the surface of the Greenland Ice Sheet. The samples are represented by the initial OM from glacier ice- (T0_Ice) and snow-algae (T0_Snow) dominated habitats and the samples incubated in situ for up to 24 days (T3-T24) under dark (D) and light (L) conditions. The OM samples include dissolved organic matter (DOM) and particulate organic matter (POM), the latter extracted with hot water (HW) and sodium hydroxide (Na) to represent water-soluble and particle-associated OM, respectively (see methods). Dissolved organic carbon concentrations were determined as non-purgeable organic carbon obtained from replicate measurements of DOM and POM extracts analyzed in a Shimadzu high-sensitivity TOC-V analyzer. The concentrations in this dataset are part of the supplementary material in Rossel et al. (2025).