Assignment Writer

MATH661.03 Final Exam

Question 1: Distinction between Absolute Error and Relative Error

Absolute Error: The absolute error quantifies the magnitude of the difference between the true value $x_{\text{true}}$ and the approximate value $x_{\text{approx}}$. It is given by the formula:

$$\text{Absolute Error} = |x_{\text{true}} - x_{\text{approx}}|$$

It provides a measure of how far the approximate value is from the true value in absolute terms, without considering the scale of the true value.

Relative Error: The relative error expresses the absolute error as a fraction of the true value. It provides a scale-independent measure of error and, for a nonzero true value, is calculated as:

$$\text{Relative Error} = \frac{|x_{\text{true}} - x_{\text{approx}}|}{|x_{\text{true}}|}$$

Relative error is particularly useful when comparing errors across values of different magnitudes, as it normalizes the error relative to the true value.

Key Difference:

– Absolute error provides a direct measure of the error's size.
– Relative error contextualizes the error by comparing it to the true value, making it scale-independent.
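The distinction can be illustrated with a minimal Python sketch. The function names `absolute_error` and `relative_error` and the sample values are assumptions chosen for illustration; they are not part of the exam. Both sample pairs have the same absolute error, but their relative errors differ by a factor of 100.

```python
def absolute_error(x_true: float, x_approx: float) -> float:
    """Magnitude of the difference between the true and approximate values."""
    return abs(x_true - x_approx)


def relative_error(x_true: float, x_approx: float) -> float:
    """Absolute error as a fraction of |x_true| (undefined for x_true == 0)."""
    if x_true == 0:
        raise ValueError("relative error is undefined when the true value is zero")
    return absolute_error(x_true, x_approx) / abs(x_true)


# Illustrative (hypothetical) pairs of (true value, approximate value).
samples = [(10.0, 11.0), (1000.0, 1001.0)]

for x_true, x_approx in samples:
    print(
        f"true = {x_true}, approx = {x_approx}, "
        f"absolute = {absolute_error(x_true, x_approx)}, "
        f"relative = {relative_error(x_true, x_approx)}"
    )
```

An error of 1 is large relative to a true value of 10 and small relative to a true value of 1000, which is the scale dependence the relative error is meant to capture.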

Question 2: Truncation Error and Floating-Point Arithmetic

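If f is a real-valued function of a real variable, its derivative can be approximated by a finite difference. For the centered difference, the approximation and its truncation error are given as:

$$f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}, \qquad \text{truncation error} = O(h^2)$$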

where h is a small step size. As h → 0, the truncation error decreases because the approximation becomes more accurate.

However, using floating-point arithmetic, choosing a very small value of h is not practical due to the following reasons:

  1. Cancellation Error: When h is very small, the terms f(x + h) and f(x − h) become almost equal. Subtracting these two nearly equal numbers cancels their leading digits, so the rounding errors already present in each value dominate the result. Those rounding errors arise because floating-point numbers cannot represent every real number exactly.
  2. Amplification of Round-Off Error: In floating-point arithmetic, dividing by a very small h amplifies any existing numerical errors in the computed values of f(x + h) and f(x − h). This can result in highly unstable and inaccurate results.

Conclusion: While reducing h decreases truncation error, it also increases numerical errors due to floating-point limitations. Therefore, an optimal h should balance truncation error and round-off error to achieve accurate results in practice.
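The trade-off can be observed numerically. The sketch below assumes the centered difference (f(x + h) − f(x − h)) / (2h) and uses f = sin at x = 1 as an illustrative test function (neither choice comes from the exam). It compares the error for a large, a moderate, and a very small step size.

```python
import math


def centered_difference(f, x: float, h: float) -> float:
    """Centered finite difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


x0 = 1.0
exact = math.cos(x0)  # exact derivative of sin at x0

# Large h: truncation error dominates. Tiny h: round-off error dominates.
step_sizes = [1e-1, 1e-5, 1e-13]
errors = [abs(centered_difference(math.sin, x0, h) - exact) for h in step_sizes]

for h, err in zip(step_sizes, errors):
    print(f"h = {h:.0e}   error = {err:.3e}")
```

In double precision the moderate step size should give the smallest error of the three, since shrinking h further trades a negligible truncation error for a much larger round-off error.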

Question 3: Convert the Binary Sequence 1001011 to a Decimal Number

To convert the binary sequence $1001011_2$ to its decimal equivalent:

  1. Each digit in the binary number represents a power of 2, starting from $2^0$ for the rightmost digit.
  2. Compute the value of each digit:

$$1001011_2 = (1 \cdot 2^6) + (0 \cdot 2^5) + (0 \cdot 2^4) + (1 \cdot 2^3) + (0 \cdot 2^2) + (1 \cdot 2^1) + (1 \cdot 2^0)$$

  3. Perform the calculations:

$$= 64 + 0 + 0 + 8 + 0 + 2 + 1$$

  4. Add the results:

$$= 75$$

Final Answer: The decimal equivalent of $1001011_2$ is $75_{10}$.
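The positional expansion above can be sketched in a few lines of Python. The helper name `binary_to_decimal` is an assumption for illustration. Python's built-in `int(bits, 2)` performs the same conversion and is used here only as a cross-check.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum digit * 2**position, with position 0 at the rightmost digit."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        if digit not in "01":
            raise ValueError(f"not a binary digit: {digit!r}")
        total += int(digit) * 2 ** position
    return total


value = binary_to_decimal("1001011")
print(value)                       # 75
print(value == int("1001011", 2))  # True
```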

Question 4: Convert the Decimal Number 21.125 to IEEE Single-Precision Binary Format

To convert the decimal number $21.125_{10}$ to IEEE single-precision binary format:

  1. Convert the integer part $21_{10}$ to binary:

$$21_{10} = 10101_2$$

  2. Convert the fractional part $0.125_{10}$ to binary by repeatedly multiplying by 2 and recording the integer part:

$$0.125 \times 2 = 0.25 \quad (\text{integer part: } 0,\ \text{keep fractional part } 0.25)$$

$$0.25 \times 2 = 0.5 \quad (\text{integer part: } 0,\ \text{keep fractional part } 0.5)$$

$$0.5 \times 2 = 1.0 \quad (\text{integer part: } 1,\ \text{fractional part becomes } 0)$$

So, $0.125_{10} = 0.001_2$.

  3. Combine the integer and fractional parts:

$$21.125_{10} = 10101.001_2$$

  4. **Normalize the binary number:** Shift the binary point four places to the left so there is one digit before the point:

$$10101.001_2 = 1.0101001_2 \times 2^4$$

– The exponent is 4.

  5. Determine the sign bit: – 21.125 is positive, so the sign bit is 0.
  6. Encode the exponent: – IEEE single-precision uses a biased exponent (bias = 127).

$$\text{Exponent} = 4 + 127 = 131_{10} = 10000011_2$$

  7. **Determine the mantissa:** – Drop the leading 1 from the normalized mantissa and pad the remaining bits 0101001 with zeros to 23 bits:

$$\text{Mantissa} = 01010010000000000000000$$

  8. Combine the components:

IEEE Single-Precision Representation:

Sign (1 bit)   Exponent (8 bits)   Mantissa (23 bits)
0              10000011            01010010000000000000000

Final Answer: $21.125_{10}$ in IEEE single-precision binary format is:

01000001101010010000000000000000
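A hand conversion like this one can be checked against the machine representation. The sketch below uses Python's standard `struct` module to pack 21.125 as a big-endian 32-bit float and then splits the bit pattern into its three fields. The helper name `float32_bits` is an assumption for illustration.

```python
import struct


def float32_bits(value: float) -> str:
    """Return the 32-bit IEEE single-precision pattern of value as a bit string."""
    (as_int,) = struct.unpack(">I", struct.pack(">f", value))
    return f"{as_int:032b}"


bits = float32_bits(21.125)
sign, exponent, mantissa = bits[0], bits[1:9], bits[9:]

print(sign, exponent, mantissa)  # 0 10000011 01010010000000000000000
print(hex(int(bits, 2)))         # 0x41a90000
```

Because 21.125 is exactly representable in single precision, no rounding occurs and the fields match the sign, biased exponent, and fraction bits of the normalized form $1.0101001_2 \times 2^4$.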

Question 5: Strategy to Compute $f(x) = \sqrt{x+1} - 1$ When $x = 1.0 \times 10^{-15}$

To compute $f(x) = \sqrt{x+1} - 1$ when $x = 1.0 \times 10^{-15}$, we encounter a problem of numerical instability due to the subtraction of two nearly equal numbers. This leads to significant loss of precision because of cancellation error in floating-point arithmetic.

A better strategy is to rationalize the expression by multiplying the numerator and denominator by the conjugate of $\sqrt{x+1} - 1$, namely $\sqrt{x+1} + 1$:

$$f(x) = \frac{(\sqrt{x+1} - 1)(\sqrt{x+1} + 1)}{\sqrt{x+1} + 1}$$

Simplify the numerator:

Since $(\sqrt{x+1})^2 = x + 1$, the numerator becomes:

$$(\sqrt{x+1})^2 - 1 = x + 1 - 1$$

The 1 in the numerator cancels out, leaving:

$$f(x) = \frac{x}{\sqrt{x+1} + 1}$$

Now, substitute $x = 1.0 \times 10^{-15}$:

$$f(1.0 \times 10^{-15}) = \frac{1.0 \times 10^{-15}}{\sqrt{1 + 1.0 \times 10^{-15}} + 1} \approx 5.0 \times 10^{-16}$$

In this form, the subtraction of nearly equal numbers is avoided, and the calculation becomes numerically stable.

Final Strategy: Use the rationalized form $f(x) = \frac{x}{\sqrt{x+1} + 1}$ to compute f(x) accurately when x is very small.
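The effect of the reformulation can be seen directly in double precision. The sketch below compares the naive expression with the rationalized one. The function names `f_naive` and `f_stable` are assumptions for illustration, and the reference value x/2 is the leading term of the Taylor expansion of f about 0, which is accurate to far more digits than either computation needs here.

```python
import math


def f_naive(x: float) -> float:
    """Direct evaluation: subtracts two nearly equal numbers when x is tiny."""
    return math.sqrt(x + 1.0) - 1.0


def f_stable(x: float) -> float:
    """Rationalized form x / (sqrt(x + 1) + 1): no subtraction of close values."""
    return x / (math.sqrt(x + 1.0) + 1.0)


x = 1.0e-15
reference = x / 2.0  # leading Taylor term; the next term is of order x**2

for name, value in (("naive", f_naive(x)), ("stable", f_stable(x))):
    rel_err = abs(value - reference) / reference
    print(f"{name:6s} value = {value:.17e}   relative error = {rel_err:.2e}")
```

The naive result can only be a multiple of the spacing between doubles just above 1 (about 2.2e-16), so it cannot land close to 5.0e-16, while the rationalized form is accurate to roughly machine precision.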

Question 6: Perturbation Bound for Linear Systems

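Let x be the solution to the nonsingular linear system $Ax = b$ with $b \neq 0$, and $\tilde{x}$ be the solution to the system $A\tilde{x} = b + \Delta b$ with a perturbed right-hand side. Define $\Delta x = \tilde{x} - x$. We aim to show that:

$$\frac{\|\Delta x\|}{\|x\|} \leq \operatorname{cond}(A)\,\frac{\|\Delta b\|}{\|b\|}$$

Proof:

  1. **Perturbed System Relation**: From the original and perturbed systems, we have:

$$A\tilde{x} = b + \Delta b \quad \text{and} \quad Ax = b.$$

Subtract these equations:

$$A\tilde{x} - Ax = \Delta b.$$

Factoring out A, we get:

$$A(\tilde{x} - x) = \Delta b.$$

Substituting $\Delta x = \tilde{x} - x$, we write:

$$A\,\Delta x = \Delta b.$$

  2. **Norm of ∆x**: Since A is nonsingular, $\Delta x = A^{-1}\Delta b$. Taking norms on both sides and using the property of matrix norms, $\|A^{-1}\Delta b\| \leq \|A^{-1}\|\,\|\Delta b\|$, we get:

$$\|\Delta x\| \leq \|A^{-1}\|\,\|\Delta b\|.$$

Likewise, taking norms in $b = Ax$ gives $\|b\| \leq \|A\|\,\|x\|$. Hence:

$$\frac{1}{\|x\|} \leq \frac{\|A\|}{\|b\|}.$$

  3. **Condition Number**: Recall the condition number of A:

$$\operatorname{cond}(A) = \|A\|\,\|A^{-1}\|.$$

Multiplying the two inequalities from step 2 therefore gives:

$$\frac{\|\Delta x\|}{\|x\|} \leq \|A\|\,\|A^{-1}\|\,\frac{\|\Delta b\|}{\|b\|}.$$

  4. **Final Bound**: Substituting $\operatorname{cond}(A) = \|A\|\,\|A^{-1}\|$, we have:

$$\frac{\|\Delta x\|}{\|x\|} \leq \operatorname{cond}(A)\,\frac{\|\Delta b\|}{\|b\|}.$$

Conclusion: This demonstrates the bound on the relative perturbation of the solution in terms of the condition number of A and the relative perturbation of the right-hand side.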
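The bound can be checked on a small example. The sketch below is an illustration under stated assumptions: the 2×2 matrix, right-hand side, and perturbation are hypothetical values chosen to make A ill-conditioned, the infinity norm is used throughout, and exact rational arithmetic (`fractions.Fraction`) keeps round-off out of the comparison. All helper names are illustrative.

```python
from fractions import Fraction


def inf_norm_vec(v):
    """Infinity norm of a vector: largest absolute component."""
    return max(abs(c) for c in v)


def inf_norm_mat(m):
    """Infinity norm of a matrix: largest absolute row sum."""
    return max(sum(abs(e) for e in row) for row in m)


def inverse_2x2(m):
    """Inverse of a nonsingular 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_vec(m, v):
    """Matrix-vector product."""
    return [sum(e * c for e, c in zip(row, v)) for row in m]


# Hypothetical ill-conditioned system with exact solution x = [1, 1].
A = [[Fraction(1), Fraction(1)], [Fraction(1), Fraction(10001, 10000)]]
b = [Fraction(2), Fraction(20001, 10000)]
delta_b = [Fraction(0), Fraction(1, 10000)]

A_inv = inverse_2x2(A)
x = mat_vec(A_inv, b)
x_tilde = mat_vec(A_inv, [bi + dbi for bi, dbi in zip(b, delta_b)])
delta_x = [xt - xi for xt, xi in zip(x_tilde, x)]

cond_A = inf_norm_mat(A) * inf_norm_mat(A_inv)
lhs = inf_norm_vec(delta_x) / inf_norm_vec(x)           # relative change in x
rhs = cond_A * inf_norm_vec(delta_b) / inf_norm_vec(b)  # cond(A) * relative change in b

print(float(cond_A), float(lhs), float(rhs), lhs <= rhs)
```

In this example a relative change of about 5e-5 in b produces a relative change of 1 in x, which the bound permits because cond(A) is about 4 × 10^4.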

Question 7: Householder Transformation

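Householder Transformation: The Householder transformation H is a linear transformation that reflects a vector across a plane or hyperplane. It is defined as:

$$H = I - 2\,\frac{vv^T}{v^Tv}$$

where v is a nonzero vector, I is the identity matrix, and H is a symmetric and orthogonal matrix.

Properties of H:

  1. H is symmetric:

$$H^T = \left(I - 2\,\frac{vv^T}{v^Tv}\right)^T = I - 2\,\frac{vv^T}{v^Tv} = H$$

  2. H is orthogonal:

$$H^TH = H^2 = I - 4\,\frac{vv^T}{v^Tv} + 4\,\frac{v(v^Tv)v^T}{(v^Tv)^2} = I$$

Hence, $H^{-1} = H$.

Find the Householder Matrix H for $v = [2, 1, 2]^T$: We want $Hv = [\alpha, 0, 0]^T$, where $\alpha \neq 0$.

  1. Define $\alpha = \pm\|v\|_2$, where $\|v\|_2$ is the Euclidean norm:

$$\|v\|_2 = \sqrt{2^2 + 1^2 + 2^2} = \sqrt{9} = 3, \qquad \alpha = -3 \quad (\text{choose negative sign for simplicity}).$$

  2. Compute u, the vector used to construct H (u takes the place of v in the definition above):

$$u = v - \alpha e_1, \qquad e_1 = [1, 0, 0]^T.$$

Substitute $v = [2, 1, 2]^T$, $\alpha = -3$, and $e_1$:

$$u = [2, 1, 2]^T - (-3)[1, 0, 0]^T = [2 + 3, 1, 2]^T = [5, 1, 2]^T.$$

  3. Normalize u to construct the reflection vector:

$$\|u\|_2 = \sqrt{5^2 + 1^2 + 2^2} = \sqrt{30}$$

$$w = \frac{u}{\|u\|_2} = \frac{1}{\sqrt{30}}\,[5, 1, 2]^T$$

  4. Construct the Householder matrix H:

$$H = I - 2ww^T.$$

Compute $ww^T$:

$$ww^T = \frac{1}{30}\begin{bmatrix} 25 & 5 & 10 \\ 5 & 1 & 2 \\ 10 & 2 & 4 \end{bmatrix}$$

Multiply by 2:

$$2ww^T = \frac{1}{15}\begin{bmatrix} 25 & 5 & 10 \\ 5 & 1 & 2 \\ 10 & 2 & 4 \end{bmatrix}$$

Subtract from I (identity matrix):

$$H = I - 2ww^T = \frac{1}{15}\begin{bmatrix} -10 & -5 & -10 \\ -5 & 14 & -2 \\ -10 & -2 & 11 \end{bmatrix}$$

  5. Final Simplified H:

$$H = \begin{bmatrix} -\frac{2}{3} & -\frac{1}{3} & -\frac{2}{3} \\ -\frac{1}{3} & \frac{14}{15} & -\frac{2}{15} \\ -\frac{2}{3} & -\frac{2}{15} & \frac{11}{15} \end{bmatrix}$$

With this H, we have $Hv = [-3, 0, 0]^T$, as required.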
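The construction can be verified exactly with rational arithmetic. Since $2ww^T = 2uu^T/(u^Tu)$, the square root in the normalization cancels and H has rational entries. The sketch below is a minimal illustration of this example only, with helper names chosen here as assumptions. It builds H from u and checks symmetry, orthogonality, and the value of Hv.

```python
from fractions import Fraction


def householder_matrix(u):
    """Return H = I - 2 u u^T / (u^T u) for a nonzero vector u with rational entries."""
    n = len(u)
    utu = sum(c * c for c in u)
    if utu == 0:
        raise ValueError("u must be nonzero")
    return [
        [(1 if i == j else 0) - Fraction(2 * u[i] * u[j], utu) for j in range(n)]
        for i in range(n)
    ]


def mat_vec(m, v):
    """Matrix-vector product."""
    return [sum(e * c for e, c in zip(row, v)) for row in m]


def mat_mul(a, b):
    """Matrix-matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


v = [2, 1, 2]
alpha = -3                       # alpha = -||v||_2, taken from the worked example
u = [v[0] - alpha, v[1], v[2]]   # u = v - alpha * e1 = [5, 1, 2]

H = householder_matrix(u)
identity = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]

print([[str(e) for e in row] for row in H])
print(mat_vec(H, v) == [-3, 0, 0])                   # True
print(H == [list(col) for col in zip(*H)])           # True: H is symmetric
print(mat_mul(H, H) == identity)                     # True: H is orthogonal
```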
