LSE improvement: better singleton array optimization

Rationale:
A recent LSA/LSE refactoring removed the "exceptional" handling of
mismatched arrays from the load-elimination branch for merged
values. As a direct result, the condition for removing stores to
singleton arrays at return blocks can be relaxed: in addition to
blocks that consist of a sole return, stores are no longer needed
in any block that ends with a return, provided a usable merged
value was obtained. This CL makes that change and adds tests.
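
For reviewers, the old and new conditions can be compared with the
standalone sketch below. It is only a model, not ART code: the
MergeFacts struct, its field names, and both function names are
hypothetical, and each boolean stands in for the corresponding call
or comparison in load_store_elimination.cc.

```cpp
#include <cassert>

// Hypothetical stand-in for the facts LSE consults at a merge block.
// Each field models one sub-expression of the condition in the diff.
struct MergeFacts {
  bool singleton_and_removable;   // ref_info->IsSingletonAndRemovable()
  bool sole_return;               // block->IsSingleReturnOrReturnVoidAllowingPhis()
  bool ends_with_return;          // block->EndsWithReturn()
  bool merged_value_known;        // merged_value != kUnknownHeapValue
  bool merged_store_value_known;  // merged_store_value != kUnknownHeapValue
};

// Condition before this CL: only a sole-return block qualifies.
constexpr bool OldCondition(const MergeFacts& f) {
  return f.singleton_and_removable && f.sole_return;
}

// Condition after this CL: a block ending in a return also qualifies
// when at least one of the two merged values is known.
constexpr bool NewCondition(const MergeFacts& f) {
  return f.singleton_and_removable &&
         (f.sole_return ||
          (f.ends_with_return &&
           (f.merged_value_known || f.merged_store_value_known)));
}

// Runtime checks of the cases that distinguish the two conditions.
inline void SelfCheck() {
  // Return block with extra instructions and a known merged value:
  // newly accepted.
  MergeFacts relaxed{true, false, true, true, false};
  assert(!OldCondition(relaxed));
  assert(NewCondition(relaxed));

  // Same block, but neither merged value is known: still rejected.
  MergeFacts unknown{true, false, true, false, false};
  assert(!NewCondition(unknown));

  // Sole-return block: accepted before and after.
  MergeFacts sole{true, true, true, false, false};
  assert(OldCondition(sole));
  assert(NewCondition(sole));
}
```

In this model, every input accepted by OldCondition is also accepted
by NewCondition, so the change only widens the set of blocks in
which the singleton's stores can be dropped.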

Test: test-art-host,target

Bug: b/77906240

Change-Id: I32c89057168730f82d1d7c41155a9ff71b126204
diff --git a/compiler/optimizing/load_store_elimination.cc b/compiler/optimizing/load_store_elimination.cc
index 35e64f7..28ac942 100644
--- a/compiler/optimizing/load_store_elimination.cc
+++ b/compiler/optimizing/load_store_elimination.cc
@@ -458,8 +458,13 @@
       }
       if (from_all_predecessors) {
         if (ref_info->IsSingletonAndRemovable() &&
-            block->IsSingleReturnOrReturnVoidAllowingPhis()) {
-          // Values in the singleton are not needed anymore.
+            (block->IsSingleReturnOrReturnVoidAllowingPhis() ||
+             (block->EndsWithReturn() && (merged_value != kUnknownHeapValue ||
+                                          merged_store_value != kUnknownHeapValue)))) {
+          // Values in the singleton are not needed anymore:
+          // (1) if this block consists of a sole return, or
+          // (2) if this block returns and a usable merged value is obtained
+          //     (loads prior to the return will always use that value).
         } else if (!IsStore(merged_value)) {
           // We don't track merged value as a store anymore. We have to
           // hold the stores in predecessors live here.