Assignment: Data Mining Project 1: Data Preprocessing

Released on Sept 17. Due on Oct 3 at 11:55pm.

Specification

In this project, students write programs that apply data preprocessing and feature selection techniques to gene expression datasets for cancers and other diseases. A small example of such a dataset is the colon cancer dataset, which contains 62 samples collected from colon-cancer patients; 40 of those samples are tumor biopsies labeled “positive” and 22 are normal tissue biopsies labeled “negative”. For simplicity, this project assumes that each dataset has exactly two classes. Each tuple (row) in the data consists of the readings for the genes followed by the class, which is the last column. Each gene is an attribute. The columns are separated by “,”, a format commonly used in data mining. The dataset can be found on pilot under “Projects” as p1colon.txt. Other datasets may be provided in the same folder later.

Note: Your program must be able to handle other datasets. That means it needs to go through the data once to determine the number of samples (rows) and the number of attributes. The instructor plans to test your program using other datasets. In your discussions and reports, refer to the genes as g1, …, gN, in left-to-right order.

Your project should address the following tasks:

Task 1. Discretize the genes using equi-density binning with NumIntervals=k intervals for each gene, and select the top-K genes using info gain (see Task 3).

Task 2. Similarly, discretize the genes using equi-width binning with NumIntervals=k intervals for each gene, and select the top-K genes using info gain (see Task 3).

Task 3. Compute the information gain of each gene under each of the two binnings produced in Tasks 1 and 2.

If K is larger than the number of available attributes, then all attributes in the dataset are selected. Your program should read three command-line arguments: nameDatafile, k, K.
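As a rough illustration of Tasks 1 to 3, the following Python sketch computes cut points for both binning schemes and the information gain of one discretized gene. The function names (equal_width_cuts, equal_density_cuts, assign_bin, info_gain) are hypothetical. The specification does not say how to place equi-density boundaries or how to treat tied values, so this sketch assumes midpoints between neighbouring sorted values and at least k samples per gene.

```python
import math
from bisect import bisect_left
from collections import Counter


def equal_width_cuts(values, k):
    """Interior cut points that split [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_density_cuts(values, k):
    """Interior cut points so each bin holds roughly len(values) / k samples.

    Assumption: each cut is the midpoint between two neighbouring sorted
    values; heavy ties can still produce uneven or duplicate cuts.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, k):
        idx = (i * n) // k  # index of the first sample in bin i
        cuts.append((ordered[idx - 1] + ordered[idx]) / 2)
    return cuts


def assign_bin(value, cuts):
    """Bin index for intervals of the form (lb, ub]; 0 is the leftmost bin."""
    # bisect_left counts the cuts strictly below value, so a value equal
    # to a cut stays in the bin whose upper bound is that cut.
    return bisect_left(cuts, value)


def entropy(labels):
    """Shannon entropy (base 2) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(bins, labels):
    """Entropy of the labels minus the weighted entropy within each bin."""
    n = len(labels)
    by_bin = {}
    for b, y in zip(bins, labels):
        by_bin.setdefault(b, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in by_bin.values())
    return entropy(labels) - remainder


if __name__ == "__main__":
    # Made-up readings for one gene; the real data comes from the input file.
    values = [0, 1, 2, 3, 100]
    labels = ["negative", "negative", "positive", "positive", "positive"]
    for name, make_cuts in [("equi-width", equal_width_cuts),
                            ("equi-density", equal_density_cuts)]:
        cuts = make_cuts(values, 2)
        bins = [assign_bin(v, cuts) for v in values]
        print(name, cuts, bins, info_gain(bins, labels))
```

For the made-up values 0, 1, 2, 3, 100 with k=2, the equi-width cut falls at 50 and isolates the single large reading, while the equi-density cut falls at 1.5 and splits the samples into groups of 2 and 3.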
The data file should be located in the same folder as the executable. The executable should be called DWBinning. It should produce the following output files: edensitybins.txt, edensitydata.txt, ewidthbins.txt, and ewidthdata.txt.

In the edensitybins.txt file, give the following information for each of the K selected genes: gene number; infogain=ig4thisgene; (bin 1 lb, bin 1 ub] (bin1C1count, bin1C2count); …; (bin k lb, bin k ub] (binkC1count, binkC2count). Use one line per gene. List the genes in decreasing order of info gain. The ewidthbins.txt file follows the same format.

In the edensitydata.txt file, give the discretized data for the first K genes: use 0, 1, 2, …, k-1 as the values representing the bins, with 0 for the leftmost bin, and map the original data into discretized data. Keep the class for each tuple but ignore genes after gene K. List the genes in the same order as in the corresponding xxxbins.txt file. The ewidthdata.txt file follows the same format.

The first line below is a made-up example line for the edensitybins.txt file, and the second is a made-up example line for the edensitydata.txt file.

g1670; Info Gain=0.435072; Bins: (-, 35.959] (2, 3); (35.959, +] (18, 29)
1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, positive

Here k=2 and K=11.
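The output rules above can be sketched in Python as follows. This is only one possible reading of the format: the helper names (select_top_genes, format_bins_line, format_data_line) are hypothetical, the spacing inside the parentheses is normalized, and the choice to print boundaries with up to six significant digits is an assumption that the specification does not state.

```python
def select_top_genes(gains, K):
    """Return 1-based gene numbers ordered by decreasing info gain.

    gains[j] is the info gain of gene g(j+1). Slicing handles the case
    where K exceeds the number of genes: all genes are then returned.
    The sort is stable, so ties keep their left-to-right order.
    """
    order = sorted(range(len(gains)), key=lambda j: -gains[j])
    return [j + 1 for j in order[:K]]


def format_bins_line(gene_number, gain, cuts, counts):
    """Build one line of a bins file.

    cuts holds the k-1 interior boundaries; counts holds one
    (class 1 count, class 2 count) pair per bin, left to right.
    """
    # Assumption: "-" and "+" stand for the open outer boundaries,
    # as in the made-up example line of the specification.
    bounds = ["-"] + [f"{c:g}" for c in cuts] + ["+"]
    bins = [
        f"({bounds[i]}, {bounds[i + 1]}] ({c1}, {c2})"
        for i, (c1, c2) in enumerate(counts)
    ]
    return f"g{gene_number}; Info Gain={gain:.6f}; Bins: " + "; ".join(bins)


def format_data_line(bin_ids, label):
    """Build one line of a data file: K bin indices, then the class label."""
    return ", ".join(str(b) for b in bin_ids) + ", " + label


if __name__ == "__main__":
    print(select_top_genes([0.1, 0.5, 0.3], 2))
    print(format_bins_line(1670, 0.435072, [35.959], [(2, 3), (18, 29)]))
    print(format_data_line([1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0], "positive"))
```

Keeping selection separate from formatting means the same two formatting helpers can write both the equi-density and the equi-width output files.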
