Difference between revisions of "TXGP Data Description"

From Marcotte Lab

Revision as of 10:08, 4 October 2011

Naming convention

  • Directory name: '(project group)_(species code)_(sample type)_(run ID)'
  • File name: '(run ID)_(species code)_(description)_(sample prep ID,barcode,F3/F5/R3)'
  • Species code
    • XENLA (Xenopus laevis)
    • XENTR (Xenopus tropicalis a.k.a. Silurana tropicalis)
    • ENGPU (Engystomops pustulosus a.k.a. túngara frog or Physalaemus pustulosus)
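
The naming convention above can be split mechanically. The following is a minimal sketch, not a lab-provided tool: the function and field names (parse_dir_name, parse_file_name, read_tag, prep_field) are hypothetical, and it assumes the four fields are always separated by underscores and that the read tag (F3/F5/R3), when present, is the last two characters of the final field.

```python
from typing import NamedTuple, Optional

# Species codes listed in the naming convention.
SPECIES = {
    "XENLA": "Xenopus laevis",
    "XENTR": "Xenopus tropicalis",
    "ENGPU": "Engystomops pustulosus",
}


class DirName(NamedTuple):
    project_group: str
    species_code: str
    sample_type: str
    run_id: str


class FileName(NamedTuple):
    run_id: str
    species_code: str
    description: str
    prep_field: str  # hypothetical name for "(sample prep ID,barcode,F3/F5/R3)"


def parse_dir_name(name: str) -> DirName:
    """Split '(project group)_(species code)_(sample type)_(run ID)'."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields: {name!r}")
    return DirName(*parts)


def parse_file_name(name: str) -> FileName:
    """Split '(run ID)_(species code)_(description)_(prep field)'."""
    parts = name.split("_", 3)  # keep anything after the third '_' together
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields: {name!r}")
    return FileName(*parts)


def read_tag(prep_field: str) -> Optional[str]:
    """Return 'F3', 'F5' or 'R3' if the field ends with one, else None."""
    tag = prep_field[-2:]
    return tag if tag in ("F3", "F5", "R3") else None


d = parse_dir_name("TXGP_XENLA_BAC2k_SA09023")
f = parse_file_name("SA11017_XENLA_Heart_JA11050v3BC10F3")
print(d.run_id, SPECIES[d.species_code], f.description, read_tag(f.prep_field))
```

Section headings on this page append the sequencing platform in parentheses, e.g. "(SOLiDv3)"; the sketch assumes that suffix is not part of the directory name and must be stripped before parsing.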

Data pre-processing

  • Remove reads with any no-call ('N' in Illumina FASTQ files; '.' in SOLiD csfasta files).
  • Remove low-complexity reads, i.e. reads that use fewer than 4 distinct letters ('0123' for color space, 'ATGC' for base space).
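
The two filters above can be sketched as follows. This is an illustration under stated assumptions, not the script actually used for these data: the function names are hypothetical, "less than 4 letters" is read as "fewer than 4 distinct symbols of the alphabet", and a SOLiD read is assumed to carry its leading primer base (e.g. 'T'), which is ignored when counting colors.

```python
BASE_SPACE = "ATGC"   # alphabet for Illumina (base-space) reads
COLOR_SPACE = "0123"  # alphabet for SOLiD (color-space) reads


def has_no_call(seq: str) -> bool:
    """True if the read contains a no-call: 'N' (FASTQ) or '.' (csfasta)."""
    return "N" in seq.upper() or "." in seq


def is_low_complexity(seq: str, alphabet: str) -> bool:
    """True if the read uses fewer than 4 distinct symbols of the alphabet.

    Symbols outside the alphabet (such as the leading primer base of a
    color-space read) are not counted.
    """
    return len(set(seq.upper()) & set(alphabet)) < 4


def keep_read(seq: str, alphabet: str) -> bool:
    """Apply both pre-processing filters to one read sequence."""
    return not has_no_call(seq) and not is_low_complexity(seq, alphabet)


reads = ["ACGTACGT", "ACGNACGT", "AAAACCCC"]
print([r for r in reads if keep_read(r, BASE_SPACE)])
```

For mate-pair or paired data (F3/R3 or F3/F5), whether a pair is dropped when only one of its reads fails is not stated on this page, so the sketch filters single reads only.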

TXGP X. laevis BAC data

TXGP_XENLA_BAC2k_SA09023 (SOLiDv2)

  • One plate of CHORI-219 BAC library.
  • Mate-pair library with 2 kb insert size.

TXGP_XENLA_BAC5k_SA09023 (SOLiDv2)

  • One plate of CHORI-219 BAC library (same sample as BAC2k_SA09023).
  • Mate-pair library with 5 kb insert size.
  • File sizes:
SA09023_XENLA_96BAC2kb_F3.called.fastq.gz	1.8G
SA09023_XENLA_96BAC2kb_R3.called.fastq.gz	1.8G
SA09023_XENLA_96BAC5kb_F3.called.fastq.gz	1.3G
SA09023_XENLA_96BAC5kb_R3.called.fastq.gz	1.4G

TXGP X. laevis whole genome data

  • J-strain from Mustafa Khokha (Yale University).
  • Mate-pair library with 1,500 bp insert size.

TXGP_XENLA_WG1500_SA10026 (SOLiDv3)

  • SA10026_XENLA_WG1500_HiAmp1ManF3 (80.1M)
  • SA10026_XENLA_WG1500_HiAmp1ManR3 (78.8M)
  • SA10026_XENLA_WG1500_HiAmp2ManF3 (77.0M)
  • SA10026_XENLA_WG1500_HiAmp2ManR3 (76.7M)
  • SA10026_XENLA_WG1500_HiAmpEZF3 (83.3M)
  • SA10026_XENLA_WG1500_HiAmpEZR3 (81.6M)
  • SA10026_XENLA_WG1500_LoAmpManF3 (65.0M)
  • SA10026_XENLA_WG1500_LoAmpManR3 (63.7M)

TXGP X. laevis RNA-seq data

TXGP_XENLA_RNA_SA11017 (SOLiDv3)

  • SA11017_XENLA_Heart_JA11050v3BC10F3 (24.0M)
  • SA11017_XENLA_Heart_JA11050v3BC10F5 (23.4M)
  • SA11017_XENLA_Testis_JA11050v3BC04F3 (33.1M)
  • SA11017_XENLA_Testis_JA11050v3BC04F5 (32.4M)

TXGP_XENLA_RNA_SA11022 (SOLiDv3)

  • SA11022_XENLA_Egg_JA11015v4BC001F3 (19.3M)
  • SA11022_XENLA_Egg_JA11015v4BC001F5 (19.4M)
  • SA11022_XENLA_Stage24_JA11015v2BC13F3 (16.5M)
  • SA11022_XENLA_Stage24_JA11015v2BC13F5 (16.6M)

TXGP_ENGPU_RNA_SA11022 (SOLiDv3)

  • SA11022_ENGPU_Larnyx_JA11015v4BC002F3 (21.1M)
  • SA11022_ENGPU_Larnyx_JA11015v4BC002F5 (21.1M)

TXGP_XENLA_RNA_SA11024 (SOLiDv3)

  • SA11024_XENLA_Liver_JA11055v2BC12F3 (21.0M)
  • SA11024_XENLA_Liver_JA11055v2BC12F5 (22.0M)
  • SA11024_XENLA_Lung_JA11055v2BC11F3 (35.1M)
  • SA11024_XENLA_Lung_JA11055v2BC11F5 (36.7M)
  • SA11024_XENLA_Stomach_JA11055v4BC003F3 (27.8M)
  • SA11024_XENLA_Stomach_JA11055v4BC003F5 (29.1M)

TXGP X. tropicalis data

TXGP_XENTR_WG5k_SA09023 (SOLiDv2)

Contributed Data

We are looking for X. laevis RNA-seq data for building comprehensive gene models.

ConlonLab2011_XENLA_RNA_UNC201106 (Illumina HiSeq)

Data from the Frank Conlon lab at the University of North Carolina at Chapel Hill.

  • ConlonLab2011_XENLA_Stage38Heart_WT (27.8M)
  • ConlonLab2011_XENLA_Stage45Heart_CtrlMO (33.2M)

ConlonLab2011_XENTR_RNA_UNC201106 (Illumina HiSeq)

Data from the Frank Conlon lab at the University of North Carolina at Chapel Hill.

  • ConlonLab2011_XENTR_Heart_WT1
  • ConlonLab2011_XENTR_Heart_WT2