TXGP Data Description

From Marcotte Lab

Latest revision as of 18:48, 9 January 2013


Naming convention

  • Directory name: '(project group)_(species code)_(sample type)_(run ID)'
  • File name: '(run ID)_(species code)_(description)_(sample prep ID,barcode,F3/F5/R3)'
  • Species code
    • XENLA (Xenopus laevis)
    • XENTR (Xenopus tropicalis a.k.a. Silurana tropicalis)
    • ENGPU (Engystomops pustulosus a.k.a. Túngara frog or Physalaemus pustulosus)
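The naming convention above can be checked mechanically. The Python below is only an illustration and was not used by the project. It assumes every directory and file name has exactly four underscore-separated fields, and that any file extension starts at the first '.'. The helper names (split_name, decode_solid_tail) are hypothetical. So is the regular expression for trailing fields such as JA11050v3BC10F3 (sample prep ID, barcode, F3/F5/R3 tag), which is inferred from the SOLiD RNA-seq file names on this page. Other files, such as the HiAmp/LoAmp or contributed files, use a different last field, and the decoder returns None for them.

```python
import re
from typing import NamedTuple, Optional

# Species codes listed on this page.
SPECIES = {
    "XENLA": "Xenopus laevis",
    "XENTR": "Xenopus tropicalis",
    "ENGPU": "Engystomops pustulosus",
}

# Hypothetical pattern for fields like "JA11050v3BC10F3":
# sample prep ID + barcode + read tag (inferred from the SOLiD RNA-seq file names).
SOLID_TAIL = re.compile(r"^(JA\d+v\d+)(BC\d+)(F3|F5|R3)$")


class ParsedName(NamedTuple):
    prefix: str   # project group (directory) or run ID (file)
    species: str  # species code, e.g. "XENLA"
    label: str    # sample type (directory) or description (file)
    suffix: str   # run ID (directory) or prep/barcode/read-tag field (file)


def split_name(name: str) -> ParsedName:
    """Split a directory or file name into its four underscore-separated fields."""
    # Drop extensions such as ".called.fastq.gz" or ".fastq.gz" (file names only).
    stem = name.split(".", 1)[0]
    parts = stem.split("_", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields: {name!r}")
    if parts[1] not in SPECIES:
        raise ValueError(f"unknown species code {parts[1]!r} in {name!r}")
    return ParsedName(*parts)


def decode_solid_tail(suffix: str) -> Optional[tuple[str, str, str]]:
    """Return (sample prep ID, barcode, read tag) if the field matches, else None."""
    match = SOLID_TAIL.match(suffix)
    return match.groups() if match else None
```

For example, split_name("TXGP_XENLA_RNA_SA11017") yields project group TXGP, species XENLA, sample type RNA and run ID SA11017.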

Data pre-processing

  • Remove reads with any no-call ('N' in Illumina FASTQ files; '.' in SOLiD csfasta files).
  • Remove low-complexity reads, i.e. reads that do not contain all four symbols ('0123' in color space, 'ATGC' in base space).
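The two filters above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline actually used. It assumes FASTQ records are exactly four unwrapped lines, and that csfasta reads start with a single primer base followed by color calls. It also reads "low-complexity" as "does not use all four symbols". The function names (keep_read, fastq_records, filter_fastq_gz) are hypothetical. Only a gzipped FASTQ driver is shown. For color-space reads, call keep_read(seq, color_space=True) directly.

```python
import gzip
from typing import Iterator, TextIO


def keep_read(seq: str, color_space: bool = False) -> bool:
    """True if the read has no no-calls and uses all four symbols."""
    if color_space:
        # Assumption: first character is the primer base (e.g. 'T'); the rest are colors.
        calls, no_call, alphabet = seq[1:], ".", set("0123")
    else:
        calls, no_call, alphabet = seq.upper(), "N", set("ACGT")
    if no_call in calls:
        return False
    # Low-complexity read: fewer than four distinct symbols present.
    return alphabet <= set(calls)


def fastq_records(handle: TextIO) -> Iterator[tuple[str, str, str, str]]:
    """Yield (header, sequence, plus, quality), assuming unwrapped 4-line records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def filter_fastq_gz(src: str, dst: str) -> tuple[int, int]:
    """Write reads passing keep_read() from src to dst; return (kept, total)."""
    kept = total = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for record in fastq_records(fin):
            total += 1
            if keep_read(record[1]):
                kept += 1
                fout.write("\n".join(record) + "\n")
    return kept, total
```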

TXGP X. laevis BAC data

  • SAMPLE: One plate of the CHORI-219 BAC library.

TXGP_XENLA_BAC2k_SA09023: Mate-pair (F3=50bp; R3=50bp; insert_size=2kbp), SOLiD v2

  • SA09023_XENLA_96BAC2kb_F3.called.fastq.gz: read_count=35M, file_size=1.8GB
  • SA09023_XENLA_96BAC2kb_R3.called.fastq.gz: read_count=35M, file_size=1.9GB

TXGP_XENLA_BAC5k_SA09023: Mate-pair (F3=50bp; R3=50bp; insert_size=5kbp), SOLiD v2

  • SA09023_XENLA_96BAC5kb_F3.called.fastq.gz: read_count=28M, file_size=1.3GB
  • SA09023_XENLA_96BAC5kb_R3.called.fastq.gz: read_count=28M, file_size=1.4GB

TXGP X. laevis whole genome data

  • SAMPLE: J-strain from Mustafa Khokha (Yale University).

TXGP_XENLA_WG1500_SA10026: Mate-pair (F3=50bp; R3=50bp; insert_size=1500bp), SOLiD v3

  • SA10026_XENLA_WG1500_HiAmp1ManF3: read_count=80M, file_size=4.1GB
  • SA10026_XENLA_WG1500_HiAmp1ManR3: read_count=79M, file_size=4.0GB
  • SA10026_XENLA_WG1500_HiAmp2ManF3: read_count=77M, file_size=3.9GB
  • SA10026_XENLA_WG1500_HiAmp2ManR3: read_count=77M, file_size=3.9GB
  • SA10026_XENLA_WG1500_HiAmpEZF3: read_count=83M, file_size=4.3GB
  • SA10026_XENLA_WG1500_HiAmpEZR3: read_count=82M, file_size=4.1GB
  • SA10026_XENLA_WG1500_LoAmpManF3: read_count=65M, file_size=3.4GB
  • SA10026_XENLA_WG1500_LoAmpManR3: read_count=64M, file_size=3.2GB

TXGP X. laevis RNA-seq data

TXGP_XENLA_RNA_SA11017: Paired-end (50bp/35bp), SOLiD v3

  • SA11017_XENLA_Heart_JA11050v3BC10F3: read_count=24M, file_size=1.5GB
  • SA11017_XENLA_Heart_JA11050v3BC10F5: read_count=23M, file_size=889MB
  • SA11017_XENLA_Testis_JA11050v3BC04F3: read_count=33M, file_size=1.7GB
  • SA11017_XENLA_Testis_JA11050v3BC04F5: read_count=32M, file_size=1.3GB

TXGP_XENLA_RNA_SA11022: Paired-end (50bp/35bp), SOLiD v3

  • SA11022_XENLA_Egg_JA11015v4BC001F3: read_count=19.3M
  • SA11022_XENLA_Egg_JA11015v4BC001F5: read_count=19.4M
  • SA11022_XENLA_Stage24_JA11015v2BC13F3: read_count=16.5M
  • SA11022_XENLA_Stage24_JA11015v2BC13F5: read_count=16.6M

TXGP_XENLA_RNA_SA11024: Paired-end (50bp/35bp), SOLiD v3

  • SA11024_XENLA_Liver_JA11055v2BC12F3: read_count=21.0M
  • SA11024_XENLA_Liver_JA11055v2BC12F5: read_count=22.0M
  • SA11024_XENLA_Lung_JA11055v2BC11F3: read_count=35.1M
  • SA11024_XENLA_Lung_JA11055v2BC11F5: read_count=36.7M
  • SA11024_XENLA_Stomach_JA11055v4BC003F3: read_count=27.8M
  • SA11024_XENLA_Stomach_JA11055v4BC003F5: read_count=29.1M

TXGP_XENLA_RNA_OMRF20110730: Paired-end (100bp), Illumina

  • OMRF20110730_XENLA_EGG1_1.fastq.gz: read_count=94M, file_size=8.5GB
  • OMRF20110730_XENLA_EGG1_2.fastq.gz: read_count=94M, file_size=8.5GB
  • OMRF20110730_XENLA_EGG2_1.fastq.gz: read_count=128M, file_size=14GB
  • OMRF20110730_XENLA_EGG2_2.fastq.gz: read_count=128M, file_size=14GB
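To compare a downloaded file with the read_count values on this page, counting FASTQ records is enough. The Python below is a minimal sketch. It assumes unwrapped four-line FASTQ records, and its helper names (count_fastq_reads, check_mates) are hypothetical. check_mates also confirms that the two mate files (_1/_2) hold the same number of reads. Only approximate agreement should be expected, for two reasons. The listed counts are rounded (e.g. 94M), and the page does not say whether they were taken before or after pre-processing.

```python
import gzip


def count_fastq_reads(path: str) -> int:
    """Count records in a gzipped FASTQ file, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_mates(path_1: str, path_2: str) -> int:
    """Return the read count after checking both mate files agree."""
    n1, n2 = count_fastq_reads(path_1), count_fastq_reads(path_2)
    if n1 != n2:
        raise ValueError(f"mate files disagree: {n1} vs {n2} reads")
    return n1
```

For example, check_mates("OMRF20110730_XENLA_EGG1_1.fastq.gz", "OMRF20110730_XENLA_EGG1_2.fastq.gz") should return a count close to the 94M listed above.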

Contributed X. laevis Data

We are looking for X. laevis RNA-seq data to build a comprehensive set of gene models.

ConlonUNC_XENLA_RNA_Amin201106: Single-end (76bp), Illumina

Data from the Frank Conlon lab at the University of North Carolina at Chapel Hill.

  • Amin201106_XENLA_Stage38Heart_MO.fastq.gz: read_count=31M, file_size=2.3GB
  • Amin201106_XENLA_Stage38Heart_WT.fastq.gz: read_count=28M, file_size=2.0GB
  • Amin201106_XENLA_Stage45Heart_CtrlMO.fastq.gz: read_count=33M, file_size=2.2GB

HarlandUBC_XENLA_RNA_Park201106: Single-end (50bp), Illumina

Data from the Richard Harland lab at the University of California, Berkeley.

  • Park2011_XENLA_Arch1_WT.fastq.gz: read_count=101M, file_size=4.8GB
  • Park2011_XENLA_Arch2_WT.fastq.gz: read_count=102M, file_size=4.8GB
  • Park2011_XENLA_Arch3_WT.fastq.gz: read_count=96M, file_size=4.4GB
  • Park2011_XENLA_ArchD_WT.fastq.gz: read_count=115M, file_size=5.5GB
  • Park2011_XENLA_ArchV_WT.fastq.gz: read_count=103M, file_size=4.8GB

LauBrandeis_XENLA_RNA_Lau201109: Single-end (38bp), Illumina

Data from the Nelson Lau lab at Brandeis University.

  • Lau201109_XENLA_TadpoleBrain6.fastq.gz: read_count=25M, file_size=953MB
  • Lau201109_XENLA_TadpoleBrain7.fastq.gz: read_count=22M, file_size=879MB
  • Lau201109_XENLA_TadpoleBrain8.fastq.gz: read_count=27M, file_size=984MB

Contributed X. tropicalis Data

ConlonUNC_XENTR_RNA_Amin201106 (Illumina HiSeq)

Data from the Frank Conlon lab at the University of North Carolina at Chapel Hill.

  • ConlonLab2011_XENTR_Heart_WT1
  • ConlonLab2011_XENTR_Heart_WT2

TXGP data from other species

TXGP_XENTR_WG5k_SA09023 (SOLiDv2)

TXGP_ENGPU_RNA_SA11022: Paired-end (50bp/35bp), SOLiD v3

  • SA11022_ENGPU_Larnyx_JA11015v4BC002F3: read_count=21.1M
  • SA11022_ENGPU_Larnyx_JA11015v4BC002F5: read_count=21.1M