%% This document created by Scientific Word (R) Version 3.5

\documentclass{amsart}%
\usepackage{amsmath}
\usepackage{graphicx}%
\usepackage{amsfonts}%
\usepackage{amssymb}
%TCIDATA{OutputFilter=latex2.dll}
%TCIDATA{CSTFile=amsart.cst}
%TCIDATA{LastRevised=Friday, January 04, 2002 18:24:26}
%TCIDATA{<META NAME="GraphicsSave" CONTENT="32">}
%TCIDATA{Language=American English}
\theoremstyle{plain}
\newtheorem{acknowledgement}{Acknowledgement}
\newtheorem{algorithm}{Algorithm}
\newtheorem{case}{Case}
\newtheorem{claim}{Claim}
\newtheorem{conclusion}{Conclusion}
\newtheorem{condition}{Condition}
\newtheorem{criterion}{Criterion}
\newtheorem{notation}{Notation}
\newtheorem{problem}{Problem}
\newtheorem{solution}{Solution}
\newtheorem{summary}{Summary}
\numberwithin{equation}{section}

\begin{document}
\title{Pattern Recognition of Protein 2D Gel Image and its Application for Diagnosis
of a Disease}
\author{Gene Kim and MyungHo Kim}
\address{93B Taylor Ave. East Brunswick, NJ 08816}
\email{myungho\_kim@yahoo.com}
\thanks{The order of authors is alphabetical.}
\subjclass{Primary I.5, J.3; Secondary I.4.1, I.4.3}
\keywords{2D gel image, electrophoresis, pixel, diagnosis method, standard silver
staining, protein}
\maketitle

\begin{abstract}
In this paper, we present a method for diagnosing a disease and explain, from a
conceptual point of view, why the method should work. The method exploits the
2D gel image of proteins in serum, and we describe how to numericalize
the image.

\end{abstract}

\section{Introduction}

It is known that if a human body is invaded by a virus, the immune system of
the body reacts to the virus to suppress or deactivate it. As a result,
the system produces antibodies, which consist of amino acids, in other words,
proteins, and the virus might produce proteins too. Therefore, it is a
reasonable approach to distinguish patients from normal people by identifying and
estimating all the possible changes of proteins in serum. This fits
well with the second principle of the paper \cite{KK2} (see also
\cite{KK}).\smallskip

\emph{To classify objects we are interested in, the most powerful technique
people developed is to numericalize them, in other words, finding a way of
representation into numbers and the collection of numbers into vectors IN
MEASURABLE EUCLIDEAN HYPERSPACE.\smallskip}

Our first assumption is the following.\smallskip

\textbf{Assumption 1}

\emph{If there is a disease, then there should be changes of proteins in the human body.}

\smallskip

Under the assumption above, it remains to find a way to separate and
identify proteins, or a way to represent a serum (or the person who donated it)
in terms of its proteins. One way to do so is the 2D gel method with a suitable
staining method, for example, standard silver staining.

\section{On 2D gel image and staining methods}

2D gel electrophoresis is a method of separating proteins in a two-dimensional
plane by their isoelectric point (pH) and molecular mass.

To visualize the distribution of the dispersed proteins, the standard silver
staining method is useful. It offers good resolution for identifying many proteins
present in small quantities.

As is often the case, there are problems that we have to remedy, or compromise
on, so that the numericalization is acceptable.

1. There is an effect called negative silver staining: when the quantity of
a protein exceeds a certain level, the silver stain density decreases.

2. Even if experiments are done carefully, there are always some variations
between images. For example, the same protein does not always appear at the
same position in the gel relative to other proteins. In other words, a
normalization process for the images is needed.

The first problem might be eliminated by inventing a new staining method in the
future\footnote{If there were no negative silver staining and the stain
densities represented the quantities of proteins well, then the positions and
the quantities at those positions could be used as a representation into a
vector.}. However, since
we could not find a better alternative and the negative effect is consistent,
we cannot help but \emph{accept it as it is} and boldly assume that the
\emph{negative effect reflects a status of proteins}.
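
To make the obstacle explicit, here is a sketch in notation of our own (the
symbols $q$, $d$ and $q_{0}$ are introduced only for illustration and are not
measured quantities). Let $d(q)$ denote the stain density produced by a
quantity $q$ of a protein. Negative staining means that, for some threshold
$q_{0}$,
\[
d \text{ is increasing on } [0,q_{0}] \quad\text{and decreasing on } [q_{0},\infty),
\]
so there may exist quantities $q_{1}<q_{0}<q_{2}$ with $d(q_{1})=d(q_{2})$.
Thus $q$ cannot in general be recovered from $d(q)$ alone. If, however, $d$ is
the same function in every experiment, then equal quantities still give equal
densities, which is all that a reproducible representation requires.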

Conceptually, we have only to find a representation of each serum which is
reproducible up to some tolerable variations. This concept is acceptable in the
following sense.

When we try to identify a person from his/her photo, even if the picture is
not exactly the same as the person (for example, the hair style, complexion, etc.
looked different when the picture was taken), most of the time the job is done
successfully. Whatever objects we observe or estimate, there are variations.
From this point of view, as in identifying objects by photos, fingerprints,
handwriting, etc., there are variations in 2D gel images even if we assume that
all the experiments are perfectly accurate. In other words, depending on the
status of a person or the time of the experiment, it is believed that \emph{the
proteins of the serum will change, but within a range small enough that the
difference between normal and abnormal status can still be observed}
(\textbf{Assumption 2}). The
2D gel image (dispersed proteins) of a serum might play the role of a ``protein
print'' of its donor.

The effect of the second problem could be diminished by transforming the
images with respect to some fixed ``reference'' proteins.
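
As an illustration only, and under the assumption (ours, not established
above) that the distortion between gels is well approximated by an affine map
of the plane, the transformation can be written down from three reference
proteins. Let $p_{1},p_{2},p_{3}\in\mathbb{R}^{2}$ be their positions in a
given image and $r_{1},r_{2},r_{3}\in\mathbb{R}^{2}$ their fixed target
positions, neither triple being collinear. Then there is a unique map
$T(x)=Ax+b$ with $T(p_{i})=r_{i}$ for $i=1,2,3$, namely
\[
A=\begin{pmatrix} r_{2}-r_{1} & r_{3}-r_{1}\end{pmatrix}
\begin{pmatrix} p_{2}-p_{1} & p_{3}-p_{1}\end{pmatrix}^{-1},
\qquad b=r_{1}-Ap_{1},
\]
where each difference is written as a column of a $2\times2$ matrix. The
normalized image $\widetilde{I}$ is then given by
$\widetilde{I}(y)=I(T^{-1}(y))$, with densities interpolated between pixels
where necessary. Real gels may call for a more flexible, nonlinear warping;
the affine case is only the simplest instance.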

\section{Experimental scheme}

1. Choose a disease.

2. Obtain serum samples\footnote{The scheme could be applied to tissues as well.}
from equal numbers of patients and normal people.

3. Obtain images of the proteins separated by 2D gel electrophoresis\footnote{One
might use commercial software such as PDQuest.}.

4. Normalization process: modify the images by transforming them with respect to
some fixed reference proteins.

5. Transform each image into a set of numbers according to the density of
its pixels (\textbf{Note}: each number ranges from 0 to 255).

6. Each set of numbers represents a vector in a Euclidean space whose
dimension is the total number of pixels.

7. Apply a classification method (for example, a neural network, support vector
machine, decision tree, etc.) to the set of vectors obtained to get a
generalized cut-off.\footnote{This cut-off is used for diagnosis.}
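
Steps 5 to 7 can be made concrete in the following sketch; the symbols $m$,
$n$, $N$, $\mathbf{w}$ and $c$ are our own notation, and the linear form of
the classifier is an assumption chosen for simplicity. If a normalized image
$I$ has $m$ rows and $n$ columns with $I(i,j)\in\{0,1,\dots,255\}$, its vector
is
\[
x=\bigl(I(1,1),\dots,I(1,n),I(2,1),\dots,I(m,n)\bigr)\in\mathbb{R}^{mn}.
\]
From $N$ serum samples we obtain labelled vectors $(x_{k},y_{k})$,
$k=1,\dots,N$, where $y_{k}=+1$ for a patient and $y_{k}=-1$ for a normal
person. A linear classification method then produces
$\mathbf{w}\in\mathbb{R}^{mn}$ and $c\in\mathbb{R}$ such that
\[
f(x)=\langle\mathbf{w},x\rangle+c
\]
is positive on as many patient vectors and negative on as many normal vectors
as possible. The hyperplane $f(x)=0$ plays the role of the cut-off: a new
serum with vector $x$ is diagnosed as abnormal when $f(x)>0$. A support vector
machine with a nonlinear kernel, a neural network or a decision tree would
replace $f$ by a nonlinear function, but the scheme is otherwise unchanged.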

\begin{thebibliography}{9}

\bibitem[1]{KK}Kim, Gene and Kim, MyungHo, ``Application of Support Vector
Machine to detect an association between a disease or trait and multiple SNP
variations'', (http://xxx.lanl.gov/abs/cs.CC/0104015), Apr. 2001.

\bibitem[2]{KK2}Ahn, Seung-chan, Kim, Gene and Kim, MyungHo, ``A Note on
Applications of Support Vector Machine'',
(http://xxx.lanl.gov/abs/math.OC/0106166), May 2001.
\end{thebibliography}
\end{document}