Halting Problem
The halting problem is a decision problem in computability theory. It asks, given a computer program and an input, will the program terminate or will it run forever? For example, consider the following Python program:
1 2 3 

It reads the input, and if it's not empty, the program will loop forever. Thus, if the input is empty, the program will terminate and the answer to this specific question is "yes, this program on the empty input will terminate", and if the input isn't empty, the program will loop forever and the answer is "no, this program on this input will not terminate".
Halting problem is perhaps the most wellknown problem that has been proven to be undecidable; that is, there is no program that can solve the halting problem for general enough computer programs. It's important to specify what kind of computer programs we're talking about. In the above case, it's a Python program, but in computation theory, people often use Turing machines which are proven to be as strong as "usual computers". In 1936, Alan Turing proved that the halting problem over Turing machines is undecidable using a Turing machine; that is, no Turing machine can decide correctly (terminate and produce the correct answer) for all possible program/input pairs.
Contents
Definition
The decision problem $H$, for Halting problem, is the set of all $\{\langle p, x \rangle  \text{program \$p\$ halts on input \$x\$}\}$, for an appropriate definition of "program" (usually "Turing machine"), and where $\langle \cdot \rangle$ denotes some kind of encoding. A Turing machine $A$ solves $H$ if, given any input $\langle p, x \rangle$, it terminates in an accepting state if $\langle p, x \rangle \in H$, and terminates in a rejecting state otherwise. Note that it doesn't matter whether $p$ terminates because it accepts, or because it rejects, or because it gets some kind of error; as long as it terminates and doesn't loop forever, $A$ should accept it.
Halting problem is undecidable
Proof by contradiction
Suppose there exists a Turing machine $A$ that decides $H$. Now consider a Turing machine $B$ defined as follows: it takes an input $\langle p \rangle$, runs $A$ on input $\langle p, \langle p \rangle \rangle$, and halts if and only if $A$ rejects. That is, $B$ takes a program $p$, runs the supposed Turing machine that decides the halting problem on the input "program $p$, input $\langle p \rangle$"; based on the outcome, if it says "yes, $p$ on input $\langle p \rangle$ halts", then it loops forever, otherwise if it says "no, $p$ on input $\langle p \rangle$ doesn't halt", then it terminates. Note that $\langle p \rangle$ is the encoding of a program $p$ into a suitable input (bitstring or natural number), and hence can be accepted as input by another program.
Consider what happens when $B$ receives $\langle B \rangle$ as input. It runs $A$ on the input $\langle B, \langle B \rangle \rangle$. We now have two cases:
 $A$ accepts. This means $B$ on input $\langle B \rangle$ halts. But by definition of $B$, if $A$ accepts we will enter an infinite loop, and so $B$ on input $\langle B \rangle$ doesn't halt after all.
 $A$ rejects. This means $B$ on input $\langle B \rangle$ doesn't halt. But by definition of $B$, if $A$ rejects we will terminate, and so $B$ on input $\langle B \rangle$ halts after all.
In both cases, we derive a contradiction. This contradiction happens because we assumed the existence of $A$; its existence allows us to create $B$ that behaves incorrectly. Thus $A$ cannot exist, and so no Turing machine can decide $H$.
Realworld Example ^{[1]}
One way to visualize this is to think of apps on a phone. Apps are types of Turing machines. Sometimes apps crash your phone because they get caught in a loop and do not halt. Let’s supposes a clever team releases an app to check for this. This app, the Checker app can solve the Halting problem.
The Checker app checks some app $M$. If $M$ halts, then $Checker(M)$ accepts, this app will not crash your phone. If $M$ loops, then $Checker(M)$ rejects, this app will crash your phone.
Suppose a diabolical computer scientist decides to create a App called Paradox. This app will effectively reverse the Checker app. And because the Checker app works, this one works too, it's really simple:  It turns on the Checker app, and inputs itself into the Checker app.  If the Checker returns an accept, then the Paradox app forces the phone to loop and crash.  If the Checker app returns a reject, then the Paradox app will halt, meaning that it doesn't deserve that rejection, it's a safe app and doesn't crash your phone.
Effectively to run Paradox means to run Paradox(Checker(Paradox)). If Paradox is a bad app, that crashes your phone, Checker will reject it, and Paradox will return a halt. Paradox(Checker(Paradox)) = Paradox(Checker(loop)) = Paradox(reject) = Halt If Paradox is a good app, that halts, then the Checker app will accept it, and Paradox will crash your phone. Paradox(Checker(Paradox)) = Paradox(Checker(halt)) = Paradox(accept) = Loop So the Checker app says Paradox is good, and then Paradox crashes you, or the Checker app says Paradox is bad and Paradox halts. This is a contradiction.
Now some people still don't see this as a contradiction. To which we'd point out that this is a supertask, which we know to be impossible in the real world. A supertask is an uncountable series of infinite tasks, examples include Thompson's lamp or zeno's paradox. We know that running Paradox actually means Paradox(Checker(Paradox)) which means Paradox(Checker(Paradox(Checker(Paradox)))) which means Paradox(Checker(Paradox(Checker(Paradox(Checker(Paradox)))))) and so on. Either the Paradox app does crash your phone or it doesn't. Those are the only two possibilities. But this series of infinite checks switch back and forth between loop and halt, with no result.
Consider the following algorithm which when fed with another algorithm and an input, tells if the program halts:
 Simulate the first step of the algorithm given.
 If the program halts, return
Yes, the algorithm halts
and terminate.  If it does not halt (as of yet), capture a snapshot of the simulation.
 Compare the snapshot with the previous snapshots. If it is the same as one of the snapshots previously taken, return
No, the algorithm does not halt
and terminate.  Simulate the next step.
 Go to 2.
Now which of the following is true?
A. The construction is correct; The halting problem is only undecidable for computers with infinite memory
B. The construction is wrong; Snapshots of a simulation cannot be captured by a computer program
C. The construction is wrong; The simulation might run into a previously encountered snapshot but still halt.
D. The construction is correct; It solves the halting problem in general for any computer
E. The construction is wrong; The described construction cannot be written as a finite program in a proper computer.
Note: The construction refers to the algorithm described, which is the one that is supposed to solve the halting problem.
Reductions and Consequences
A typical way to prove that a problem is undecidable is to use reduction. If a problem can be reduced to the halting problem, it is undecidable.
A problem $A$ is reducible to problem $B$ if a solution to $B$ could be used to solve $A$. If $A$ has been proven to be an undecidable problem, to prove that a new problem $B$ is undecidable, it is sufficient to show that a solution to $B$ could be used to decide $A$. This yields a contradiction since it was already proven that $A$ is undecidable, and therefore, $B$ is also undecidable.
If we could solve the Busy Beaver problem, we could solve the halting problem.
The Busy Beaver problem is the problem of determining the maximum number of steps an $n$ state Turing machine will take before halting.
If we had a function that could compute the Busy Beaver function, $BB(n)$, we could know the maximum number of steps any Turing machine will take before halting. This means that we could know how long we would have to wait for a machine to halt, and ultimately, we will know if the machine will halt. In other words, if we had a way to compute the Busy Beaver function, we could use it to solve the halting problem. Since the halting problem is uncomputable, it follows that the Busy Beaver function is uncomputable.
If the halting problem could be solved, many other problems could be decided:
 Goldbach’s conjecture could be decided. It's easy to construct a Turing machine that tests every even natural number greater than 2 on whether it's the sum of two primes or not; if it encounters any counterexample, it immediately halts and reports that a counterexample has been found, otherwise it will run forever. If Halting problem were decidable, we could decide whether this program would halt or not, and thus give an answer to Goldbach's conjecture.
 Kolmogorov complexity would be computable.
 The Busy Beaver function would also be computable.
It is useful to know about the kinds of problems that are undecidable because it helps us to understand the limitations of our computation models.
Other models
It's important to note that Halting problem depends on what programs we're considering. The halting problem on Turing machines is undecidable. Conversely, the halting problem on finite state automata is easily decidable; all finite state automata halt. Thus it's important to specify the model.
The halting problem on usual computers is also decidable. To see this, note that there are a finite number of bits in the memory, and thus a finite number of possible configurations the computer can be in. If a program ever repeats a configuration, it will never terminate. Thus a Turing machine, with infinite memory, can simulate the program. By Pigeonhole Principle, if there are $N$ configurations that the program can be in, but we have simulated the program for $N+1$ steps, we must have visited a configuration twice and thus the program will never terminate. So halting problem for usual computers is also decidable using a Turing machine.
References
 Husfeldt, T. TheFreezeAppDoesNotExist. Retrieved April, 24 2016, from https://thorehusfeldt.net/2012/06/25/thefreezeappdoesnotexist/