Finding Bugs in LLVM 8 with PVS-Studio

в 13:43, , рубрики: bugs, c++, clang, compiler, devops, LLVM, open source, pvs-studio, Блог компании PVS-Studio, Компиляторы
PVS-Studio and LLVM 8.0.0

It's been two years since we last checked the code of the LLVM project with PVS-Studio, so let's see if PVS-Studio is still the leader among tools for detecting bugs and security weaknesses. We'll do that by scanning the LLVM 8.0.0 release for new bugs.

The article that must be written

Frankly, I didn't feel like writing this article. It's not much fun talking about the project that we already checked more than once (1, 2, 3). I'd prefer something new instead, but I had no choice.

Every time a new version of LLVM is released or Clang Static Analyzer is updated, we get emails reading along these lines:

Hey, the new version of Clang Static Analyzer got new diagnostics! PVS-Studio seems to be getting less relevant. Clang can detect more bugs than before and is now catching up with PVS-Studio. What'd you say?

To that I'd gladly respond:

We haven't been lazing around either! We've significantly increased PVS-Studio's capabilities, so no worries — we are still the best.

But that's a bad answer, I'm afraid. It offers no proofs, and that's the reason why I'm writing this article. So, I've checked LLVM one more time and found tons of bugs of all kinds. Those that I liked the most will be discussed further. Clang Static Analyzer can't detect these bugs (or makes the process very troublesome) — and we can. And, by the way, it took me only one evening to write all those bugs down.

The article, though, took me several weeks to complete. I just couldn't bring myself to put the gathered material into text :).

By the way, if you wonder what techniques PVS-Studio employs to detect bugs and vulnerabilities, take a look at this post.

New and existing diagnostics

As I already said, the last of the many checks of LLVM was done two years ago, and the bugs found then were fixed by the authors. This article will show a new portion of errors. How come there are new bugs at all? There are three reasons:

  1. The LLVM project is evolving; the authors modify existing code and add new code. Both modified and new parts naturally have new bugs. This fact is a strong argument for running static analysis regularly rather than every now and then. The format of our articles is perfect for showcasing PVS-Studio's capabilities, but it has nothing to do with improving code quality or making bug-fixing less costly. Do use static analysis regularly!
  2. We modify and improve existing diagnostics, enabling the analyzer to detect bugs it wasn't able to spot before.
  3. PVS-Studio has been enhanced with new diagnostics, which didn't exist two years ago. I grouped such warnings into a separate section so that PVS-Studio's progress is seen more distinctly.

Defects found by existing diagnostics

Snippet No. 1: Copy-Paste

static bool ShouldUpgradeX86Intrinsic(Function *F, StringRef Name) {
  if (Name == "addcarryx.u32" || // Added in 8.0
    ....
    Name == "avx512.mask.cvtps2pd.128" || // Added in 7.0
    Name == "avx512.mask.cvtps2pd.256" || // Added in 7.0
    Name == "avx512.cvtusi2sd" || // Added in 7.0
    Name.startswith("avx512.mask.permvar.") || // Added in 7.0     // <=
    Name.startswith("avx512.mask.permvar.") || // Added in 7.0     // <=
    Name == "sse2.pmulu.dq" || // Added in 7.0
    Name == "sse41.pmuldq" || // Added in 7.0
    Name == "avx2.pmulu.dq" || // Added in 7.0
  ....
}

PVS-Studio diagnostic message: V501 [CWE-570] There are identical sub-expressions 'Name.startswith(«avx512.mask.permvar.»)' to the left and to the right of the '||' operator. AutoUpgrade.cpp 73

The occurrence of the «avx512.mask.permvar.» substring is checked twice. The second condition obviously was to check something else, but the programmer forgot to change the copied line.

Snippet No. 2: Typo

enum CXNameRefFlags {
  CXNameRange_WantQualifier = 0x1,
  CXNameRange_WantTemplateArgs = 0x2,
  CXNameRange_WantSinglePiece = 0x4
};

void AnnotateTokensWorker::HandlePostPonedChildCursor(
    CXCursor Cursor, unsigned StartTokenIndex) {
  const auto flags = CXNameRange_WantQualifier | CXNameRange_WantQualifier;
  ....
}

PVS-Studio diagnostic message: V501 There are identical sub-expressions 'CXNameRange_WantQualifier' to the left and to the right of the '|' operator. CIndex.cpp 7245

The named constant CXNameRange_WantQualifier is used twice because of a typo.

Snippet No. 3: Confusion over operator precedence

int PPCTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) {
  ....
  if (ISD == ISD::EXTRACT_VECTOR_ELT && Index == ST->isLittleEndian() ? 1 : 0)
    return 0;
  ....
}

PVS-Studio diagnostic message: V502 [CWE-783] Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '==' operator. PPCTargetTransformInfo.cpp 404

I find this bug very cute. Yes, I know that I have a strange taste :).

As dictated by operator precedence, the original expression is evaluated as follows:

(ISD == ISD::EXTRACT_VECTOR_ELT && (Index == ST->isLittleEndian())) ? 1 : 0

From the practical point of view, though, this condition doesn't make sense as it can be reduced to:

(ISD == ISD::EXTRACT_VECTOR_ELT && Index == ST->isLittleEndian())

This is obviously a bug. It must have been the Index variable that the programmer wanted to check for 0/1. To fix the code, the ternary operator should be enclosed in parentheses:

if (ISD == ISD::EXTRACT_VECTOR_ELT && Index == (ST->isLittleEndian() ? 1 : 0))

The ternary operator is actually very tricky and may lead to logic errors. Use it carefully and don't hesitate to put additional parentheses around it. This subject is discussed in more detail here, in the section «Beware of the ?: operator and enclose it in parentheses».

Snippets No. 4, 5: Null pointer

Init *TGParser::ParseValue(Record *CurRec, RecTy *ItemType, IDParseMode Mode) {
  ....
  TypedInit *LHS = dyn_cast<TypedInit>(Result);
  ....
  LHS = dyn_cast<TypedInit>(
    UnOpInit::get(UnOpInit::CAST, LHS, StringRecTy::get())
      ->Fold(CurRec));
  if (!LHS) {
    Error(PasteLoc, Twine("can't cast '") + LHS->getAsString() +
                    "' to string");
    return nullptr;
  }
  ....
}

PVS-Studio diagnostic message: V522 [CWE-476] Dereferencing of the null pointer 'LHS' might take place. TGParser.cpp 2152

If the LHS pointer happens to be null, the program is expected to generate a warning. Instead, it will dereference that very null pointer: LHS->getAsString().

It's quite a typical situation for error handlers to contain bugs because developers don't test them properly. Static analyzers check all reachable code no matter how often it's actually executed. This is a good example of how static analysis complements other code testing and protection means.

A similar faulty handler for the RHS pointer is found a bit further: V522 [CWE-476] Dereferencing of the null pointer 'RHS' might take place. TGParser.cpp 2186

Snippet No. 6: Using a pointer after a move

static Expected<bool>
ExtractBlocks(....)
{
  ....
  std::unique_ptr<Module> ProgClone = CloneModule(BD.getProgram(), VMap);
  ....
  BD.setNewProgram(std::move(ProgClone));                                // <=
  MiscompiledFunctions.clear();

  for (unsigned i = 0, e = MisCompFunctions.size(); i != e; ++i) {
    Function *NewF = ProgClone->getFunction(MisCompFunctions[i].first);  // <=
    assert(NewF && "Function not found??");
    MiscompiledFunctions.push_back(NewF);
  }
  ....
}

PVS-Studio diagnostic message: V522 [CWE-476] Dereferencing of the null pointer 'ProgClone' might take place. Miscompilation.cpp 601

The smart pointer ProgClone first releases the object ownership:

BD.setNewProgram(std::move(ProgClone));

In fact, ProgClone has become a null pointer — so, technically, a null pointer gets dereferenced a bit further:

Function *NewF = ProgClone->getFunction(MisCompFunctions[i].first);

But that won't happen! Note that the loop doesn't actually execute at all.

The MiscompiledFunctions container is first cleared:

MiscompiledFunctions.clear();

And then its size is used in the loop condition:

for (unsigned i = 0, e = MisCompFunctions.size(); i != e; ++i) {

Obviously, the loop just won't start. I think it's a bug too, and the code was meant to look somehow differently.

I guess what we see here is that notorious error parity, where one bug acts as a disguise for another :).

Snippet No. 7: Using a pointer after a move

static Expected<bool> TestOptimizer(BugDriver &BD, std::unique_ptr<Module> Test,
                                    std::unique_ptr<Module> Safe) {
  outs() << "  Optimizing functions being tested: ";
  std::unique_ptr<Module> Optimized =
      BD.runPassesOn(Test.get(), BD.getPassesToRun());
  if (!Optimized) {
    errs() << " Error running this sequence of passes"
           << " on the input program!n";
    BD.setNewProgram(std::move(Test));                       // <=
    BD.EmitProgressBitcode(*Test, "pass-error", false);      // <=
    if (Error E = BD.debugOptimizerCrash())
      return std::move(E);
    return false;
  }
  ....
}

PVS-Studio diagnostic message: V522 [CWE-476] Dereferencing of the null pointer 'Test' might take place. Miscompilation.cpp 709

This one is similar to the previous case. The object's contents are first moved and then it's used as if nothing happened. This error has been growing ever more common after move semantics were added to C++. That's what I like about this language! You are given new ways to shoot yourself in the foot, which means PVS-Studio will always have work to do :).

Snippet No. 8: Null pointer

void FunctionDumper::dump(const PDBSymbolTypeFunctionArg &Symbol) {
  uint32_t TypeId = Symbol.getTypeId();
  auto Type = Symbol.getSession().getSymbolById(TypeId);
  if (Type)
    Printer << "<unknown-type>";
  else
    Type->dump(*this);
}

PVS-Studio diagnostic message: V522 [CWE-476] Dereferencing of the null pointer 'Type' might take place. PrettyFunctionDumper.cpp 233

Just like error handlers, test functions printing debug data don't usually get adequate test coverage either, and this is one example of that. Instead of helping the user solve their problems, the function is waiting for them to fix it.

Fixed code:

if (Type)
  Type->dump(*this);
else
  Printer << "<unknown-type>";

Snippet No. 9: Null pointer

void SearchableTableEmitter::collectTableEntries(
    GenericTable &Table, const std::vector<Record *> &Items) {
  ....
  RecTy *Ty = resolveTypes(Field.RecType, TI->getType());
  if (!Ty)                                                              // <=
    PrintFatalError(Twine("Field '") + Field.Name + "' of table '" +
                    Table.Name + "' has incompatible type: " +
                    Ty->getAsString() + " vs. " +                       // <=
                    TI->getType()->getAsString());
   ....
}

PVS-Studio diagnostic message: V522 [CWE-476] Dereferencing of the null pointer 'Ty' might take place. SearchableTableEmitter.cpp 614

I don't think you need any comments on this one.

Snippet No. 10: Typo

bool FormatTokenLexer::tryMergeCSharpNullConditionals() {
  ....
  auto &Identifier = *(Tokens.end() - 2);
  auto &Question = *(Tokens.end() - 1);
  ....
  Identifier->ColumnWidth += Question->ColumnWidth;
  Identifier->Type = Identifier->Type;                    // <=
  Tokens.erase(Tokens.end() - 1);
  return true;
}

PVS-Studio diagnostic message: V570 The 'Identifier->Type' variable is assigned to itself. FormatTokenLexer.cpp 249

Assigning a variable to itself is a meaningless operation. The programmer must have meant to do the following:

Identifier->Type = Question->Type;

Snippet No. 11: Suspicious break

void SystemZOperand::print(raw_ostream &OS) const {
  switch (Kind) {
    break;
  case KindToken:
    OS << "Token:" << getToken();
    break;
  case KindReg:
    OS << "Reg:" << SystemZInstPrinter::getRegisterName(getReg());
    break;
  ....
}

PVS-Studio diagnostic message: V622 [CWE-478] Consider inspecting the 'switch' statement. It's possible that the first 'case' operator is missing. SystemZAsmParser.cpp 652

There is a very suspicious break statement at the beginning. Shouldn't there be something else here?

Snippet No. 12: Checking a pointer after dereferencing

InlineCost AMDGPUInliner::getInlineCost(CallSite CS) {
  Function *Callee = CS.getCalledFunction();
  Function *Caller = CS.getCaller();
  TargetTransformInfo &TTI = TTIWP->getTTI(*Callee);

  if (!Callee || Callee->isDeclaration())
    return llvm::InlineCost::getNever("undefined callee");
  ....
}

PVS-Studio diagnostic message: V595 [CWE-476] The 'Callee' pointer was utilized before it was verified against nullptr. Check lines: 172, 174. AMDGPUInline.cpp 172

The Callee pointer is first dereferenced when the getTTI function is called.

And then it turns out that the pointer should be checked for nullptr:

if (!Callee || Callee->isDeclaration())

Too late…

Snippets No. 13 — No....: Checking a pointer after dereferencing

The previous example isn't unique. The same problem is found in this snippet:

static Value *optimizeDoubleFP(CallInst *CI, IRBuilder<> &B,
                               bool isBinary, bool isPrecise = false) {
  ....
  Function *CalleeFn = CI->getCalledFunction();
  StringRef CalleeNm = CalleeFn->getName();                 // <=
  AttributeList CalleeAt = CalleeFn->getAttributes();
  if (CalleeFn && !CalleeFn->isIntrinsic()) {               // <=
  ....
}

PVS-Studio diagnostic message: V595 [CWE-476] The 'CalleeFn' pointer was utilized before it was verified against nullptr. Check lines: 1079, 1081. SimplifyLibCalls.cpp 1079

And this one:

void Sema::InstantiateAttrs(const MultiLevelTemplateArgumentList &TemplateArgs,
                            const Decl *Tmpl, Decl *New,
                            LateInstantiatedAttrVec *LateAttrs,
                            LocalInstantiationScope *OuterMostScope) {
  ....
  NamedDecl *ND = dyn_cast<NamedDecl>(New);
  CXXRecordDecl *ThisContext =
    dyn_cast_or_null<CXXRecordDecl>(ND->getDeclContext());         // <=
  CXXThisScopeRAII ThisScope(*this, ThisContext, Qualifiers(),
                             ND && ND->isCXXInstanceMember());     // <=
  ....
}

PVS-Studio diagnostic message: V595 [CWE-476] The 'ND' pointer was utilized before it was verified against nullptr. Check lines: 532, 534. SemaTemplateInstantiateDecl.cpp 532

And here:

  • V595 [CWE-476] The 'U' pointer was utilized before it was verified against nullptr. Check lines: 404, 407. DWARFFormValue.cpp 404
  • V595 [CWE-476] The 'ND' pointer was utilized before it was verified against nullptr. Check lines: 2149, 2151. SemaTemplateInstantiate.cpp 2149

Then I lost interest in tracking V595 warnings, so I can't tell you if there are other bugs of this type besides the ones shown above. I bet there are.

Snippets No. 17, 18: Suspicious shift

static inline bool processLogicalImmediate(uint64_t Imm, unsigned RegSize,
                                           uint64_t &Encoding) {
  ....
  unsigned Size = RegSize;
  ....
  uint64_t NImms = ~(Size-1) << 1;
  ....
}

PVS-Studio diagnostic message: V629 [CWE-190] Consider inspecting the '~(Size — 1) << 1' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. AArch64AddressingModes.h 260

This code might actually be correct, but it does look strange and needs examining.

Suppose the Size variable has the value 16; then the NImms variable is expected to get the following value:

1111111111111111111111111111111111111111111111111111111111100000

But in reality it will get the value:

0000000000000000000000000000000011111111111111111111111111100000

This happens because all the calculations are done on the 32-bit unsigned type, and only then does it get implicitly promoted to uint64_t, with the most significant bits zeroed out.

The problem can be fixed as follows:

uint64_t NImms = ~static_cast<uint64_t>(Size-1) << 1;

Another bug of this type: V629 [CWE-190] Consider inspecting the 'Immr << 6' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. AArch64AddressingModes.h 269

Snippet No. 19: Missing keyword else?

void AMDGPUAsmParser::cvtDPP(MCInst &Inst, const OperandVector &Operands) {
  ....
  if (Op.isReg() && Op.Reg.RegNo == AMDGPU::VCC) {
    // VOP2b (v_add_u32, v_sub_u32 ...) dpp use "vcc" token.
    // Skip it.
    continue;
  } if (isRegOrImmWithInputMods(Desc, Inst.getNumOperands())) {    // <=
    Op.addRegWithFPInputModsOperands(Inst, 2);
  } else if (Op.isDPPCtrl()) {
    Op.addImmOperands(Inst, 1);
  } else if (Op.isImm()) {
    // Handle optional arguments
    OptionalIdx[Op.getImmTy()] = I;
  } else {
    llvm_unreachable("Invalid operand type");
  }
  ....
}

PVS-Studio diagnostic message: V646 [CWE-670] Consider inspecting the application's logic. It's possible that 'else' keyword is missing. AMDGPUAsmParser.cpp 5655

This one is not a bug. Since the then block of the first if statement ends with continue, it doesn't matter if it has the else keyword or not. The behavior will be the same in any case. However, the missing else makes the code less readable and, therefore, potentially dangerous. If continue disappears one day, the behavior will change drastically. I strongly recommend adding else.

Snippet No. 20: Four identical typos

LLVM_DUMP_METHOD void Symbol::dump(raw_ostream &OS) const {
  std::string Result;
  if (isUndefined())
    Result += "(undef) ";
  if (isWeakDefined())
    Result += "(weak-def) ";
  if (isWeakReferenced())
    Result += "(weak-ref) ";
  if (isThreadLocalValue())
    Result += "(tlv) ";
  switch (Kind) {
  case SymbolKind::GlobalSymbol:
    Result + Name.str();                        // <=
    break;
  case SymbolKind::ObjectiveCClass:
    Result + "(ObjC Class) " + Name.str();      // <=
    break;
  case SymbolKind::ObjectiveCClassEHType:
    Result + "(ObjC Class EH) " + Name.str();   // <=
    break;
  case SymbolKind::ObjectiveCInstanceVariable:
    Result + "(ObjC IVar) " + Name.str();       // <=
    break;
  }
  OS << Result;
}

PVS-Studio diagnostic messages:

  • V655 [CWE-480] The strings were concatenated but are not utilized. Consider inspecting the 'Result + Name.str()' expression. Symbol.cpp 32
  • V655 [CWE-480] The strings were concatenated but are not utilized. Consider inspecting the 'Result + "(ObjC Class) " + Name.str()' expression. Symbol.cpp 35
  • V655 [CWE-480] The strings were concatenated but are not utilized. Consider inspecting the 'Result + "(ObjC Class EH) " + Name.str()' expression. Symbol.cpp 38
  • V655 [CWE-480] The strings were concatenated but are not utilized. Consider inspecting the 'Result + "(ObjC IVar) " + Name.str()' expression. Symbol.cpp 41

The programmer accidentally used the + operator instead of += and ended up with four meaningless constructs.

Snippet No. 21: Undefined behavior

static void getReqFeatures(std::map<StringRef, int> &FeaturesMap,
                           const std::vector<Record *> &ReqFeatures) {
  for (auto &R : ReqFeatures) {
    StringRef AsmCondString = R->getValueAsString("AssemblerCondString");

    SmallVector<StringRef, 4> Ops;
    SplitString(AsmCondString, Ops, ",");
    assert(!Ops.empty() && "AssemblerCondString cannot be empty");

    for (auto &Op : Ops) {
      assert(!Op.empty() && "Empty operator");
      if (FeaturesMap.find(Op) == FeaturesMap.end())
        FeaturesMap[Op] = FeaturesMap.size();
    }
  }
}

Try to spot the bug on your own first. I added the image so that you don't peek at the answer right away:

???

PVS-Studio diagnostic message: V708 [CWE-758] Dangerous construction is used: 'FeaturesMap[Op] = FeaturesMap.size()', where 'FeaturesMap' is of 'map' class. This may lead to undefined behavior. RISCVCompressInstEmitter.cpp 490

The faulty line is this one:

FeaturesMap[Op] = FeaturesMap.size();

If the Op element hasn't been found, the program creates a new element in the map and assigns it the total number of elements in this map. You just don't know if the size function will be called before or after adding the new element.

Snippets No. 22 — No. 24: Duplicate assignments

Error MachOObjectFile::checkSymbolTable() const {
  ....
  } else {
    MachO::nlist STE = getSymbolTableEntry(SymDRI);
    NType = STE.n_type;                              // <=
    NType = STE.n_type;                              // <=
    NSect = STE.n_sect;
    NDesc = STE.n_desc;
    NStrx = STE.n_strx;
    NValue = STE.n_value;
  }
  ....
}

PVS-Studio diagnostic message: V519 [CWE-563] The 'NType' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 1663, 1664. MachOObjectFile.cpp 1664

I don't think it's a true error — rather a duplicate assignment. But it's still a defect.

Two other cases:

  • V519 [CWE-563] The 'B.NDesc' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 1488, 1489. llvm-nm.cpp 1489
  • V519 [CWE-563] The variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 59, 61. coff2yaml.cpp 61

Snippets No. 25 — No. 27: More duplicate assignments

These ones deal with slightly different versions of duplicate assignments.

bool Vectorizer::vectorizeLoadChain(
    ArrayRef<Instruction *> Chain,
    SmallPtrSet<Instruction *, 16> *InstructionsProcessed) {
  ....
  unsigned Alignment = getAlignment(L0);
  ....
  unsigned NewAlign = getOrEnforceKnownAlignment(L0->getPointerOperand(),
                                                 StackAdjustedAlignment,
                                                 DL, L0, nullptr, &DT);
  if (NewAlign != 0)
    Alignment = NewAlign;
  Alignment = NewAlign;
  ....
}

PVS-Studio diagnostic message: V519 [CWE-563] The 'Alignment' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 1158, 1160. LoadStoreVectorizer.cpp 1160

This is a very strange snippet, and it probably contains a logic error. The Alignment variable is first assigned the value based on the condition, and then it is assigned the value once again, but without any prior check.

Similar defects:

  • V519 [CWE-563] The 'Effects' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 152, 165. WebAssemblyRegStackify.cpp 165
  • V519 [CWE-563] The 'ExpectNoDerefChunk' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 4970, 4973. SemaType.cpp 4973

Snippet No. 28: Always true condition

static int readPrefixes(struct InternalInstruction* insn) {
  ....
  uint8_t byte = 0;
  uint8_t nextByte;
  ....
  if (byte == 0xf3 && (nextByte == 0x88 || nextByte == 0x89 ||
                       nextByte == 0xc6 || nextByte == 0xc7)) {
    insn->xAcquireRelease = true;
    if (nextByte != 0x90) // PAUSE instruction support             // <=
      break;
  }
  ....
}

PVS-Studio diagnostic message: V547 [CWE-571] Expression 'nextByte != 0x90' is always true. X86DisassemblerDecoder.cpp 379

The check doesn't make sense. The nextByte variable is never equal to 0x90: it just logically follows from the previous check. This must be some logic error.

Snippets No. 29 — No....: Always true/false conditions

There are many warnings about an entire condition (V547) or part of a condition (V560) being always true or false. Rather than genuine bugs, these are often simply bad code, the effects of macro expansion, and so on. That said, all such warnings should still be checked because some of them may point at genuine logic errors. For example, the following snippet doesn't look right:

static DecodeStatus DecodeGPRPairRegisterClass(MCInst &Inst, unsigned RegNo,
                                   uint64_t Address, const void *Decoder) {
  DecodeStatus S = MCDisassembler::Success;

  if (RegNo > 13)
    return MCDisassembler::Fail;

  if ((RegNo & 1) || RegNo == 0xe)
     S = MCDisassembler::SoftFail;
  ....
}

PVS-Studio diagnostic message: V560 [CWE-570] A part of conditional expression is always false: RegNo == 0xe. ARMDisassembler.cpp 939

The 0xE constant is the decimal number 14. The check RegNo == 0xe doesn't make sense because if RegNo > 13, the function will return.

I saw a lot of other V547 and V560 warnings, but, like with V595, I didn't feel excited about checking them since I already had enough material for an article :). So, no figures for the total number of bugs of this type in LLVM.

Here's an example to illustrate why checking those warnings is boring. The analyzer is totally correct when issuing a warning on the following code. But it's still not a bug.

bool UnwrappedLineParser::parseBracedList(bool ContinueOnSemicolons,
                                          tok::TokenKind ClosingBraceKind) {
  bool HasError = false;
  ....
  HasError = true;
  if (!ContinueOnSemicolons)
    return !HasError;
  ....
}

PVS-Studio diagnostic message: V547 [CWE-570] Expression '!HasError' is always false. UnwrappedLineParser.cpp 1635

Snippet No. 30: Suspicious return

static bool
isImplicitlyDef(MachineRegisterInfo &MRI, unsigned Reg) {
  for (MachineRegisterInfo::def_instr_iterator It = MRI.def_instr_begin(Reg),
      E = MRI.def_instr_end(); It != E; ++It) {
    return (*It).isImplicitDef();
  }
  ....
}

PVS-Studio diagnostic message: V612 [CWE-670] An unconditional 'return' within a loop. R600OptimizeVectorRegisters.cpp 63

It's either a bug or a specific coding technique meant to communicate some idea to fellow programmers. To me it doesn't tell anything except that it's a very suspicious piece of code. Please don't write code like that :).

Feeling tired? OK, it's time to make some tea or coffee.

coffee

Defects found by new diagnostics

I think 30 examples is enough for existing diagnostics. Now let's see if we can find anything interesting with the new diagnostics, which were added after the previous check. Over the last two years, the C++ analyzer module was extended with 66 new diagnostics.

Snippet No. 31: Unreachable code

Error CtorDtorRunner::run() {
  ....
  if (auto CtorDtorMap =
          ES.lookup(JITDylibSearchList({{&JD, true}}), std::move(Names),
                    NoDependenciesToRegister, true))
  {
    ....
    return Error::success();
  } else
    return CtorDtorMap.takeError();

  CtorDtorsByPriority.clear();

  return Error::success();
}

PVS-Studio diagnostic message: V779 [CWE-561] Unreachable code detected. It is possible that an error is present. ExecutionUtils.cpp 146

As you can see, both branches of the if statement end with a return statement, which means the CtorDtorsByPriority container will never be cleared.

Snippet No. 32: Unreachable code

bool LLParser::ParseSummaryEntry() {
  ....
  switch (Lex.getKind()) {
  case lltok::kw_gv:
    return ParseGVEntry(SummaryID);
  case lltok::kw_module:
    return ParseModuleEntry(SummaryID);
  case lltok::kw_typeid:
    return ParseTypeIdEntry(SummaryID);                        // <=
    break;                                                     // <=
  default:
    return Error(Lex.getLoc(), "unexpected summary kind");
  }
  Lex.setIgnoreColonInIdentifiers(false);                      // <=
  return false;
}

PVS-Studio diagnostic message: V779 [CWE-561] Unreachable code detected. It is possible that an error is present. LLParser.cpp 835

This one is interesting. Take a look at this part first:

return ParseTypeIdEntry(SummaryID);
break;

There seems to be nothing strange about this code; the break statement is unnecessary and can be safely removed. But it's not that simple.

The warning is triggered by the following lines:

Lex.setIgnoreColonInIdentifiers(false);
return false;

Indeed, this code is unreachable. All the case labels of the switch statement end with a return, and the meaningless lone break doesn't look that harmless anymore! What if one of the branches was meant to end with a break rather than return?

Snippet No. 33: Accidental clearing of the most significant bits

unsigned getStubAlignment() override {
  if (Arch == Triple::systemz)
    return 8;
  else
    return 1;
}

Expected<unsigned>
RuntimeDyldImpl::emitSection(const ObjectFile &Obj,
                             const SectionRef &Section,
                             bool IsCode) {
  ....
  uint64_t DataSize = Section.getSize();
  ....
  if (StubBufSize > 0)
    DataSize &= ~(getStubAlignment() - 1);
  ....
}

PVS-Studio diagnostic message: V784 The size of the bit mask is less than the size of the first operand. This will cause the loss of higher bits. RuntimeDyld.cpp 815

Note that the getStubAlignment function returns an unsigned value. Let's see how the expression will evaluate, assuming that the function will return the value 8:

~(getStubAlignment() — 1)

~(8u-1)

0xFFFFFFF8‬u

Note now that the DataSize variable's type is 64-bit unsigned. So it turns out that executing the operation DataSize & 0xFFFFFFF8 will result in clearing all 32 most significant bits of the value. I don't think the programmer wanted that. Perhaps they meant it to be DataSize & 0xFFFFFFFFFFFFFFF8‬u.

To fix the error, the code should be rewritten like this:

DataSize &= ~(static_cast<uint64_t>(getStubAlignment()) - 1);

Or like this:

DataSize &= ~(getStubAlignment() - 1ULL);

Snippet No. 34: Bad explicit type conversion

template <typename T>
void scaleShuffleMask(int Scale, ArrayRef<T> Mask,
                      SmallVectorImpl<T> &ScaledMask) {
  assert(0 < Scale && "Unexpected scaling factor");
  int NumElts = Mask.size();
  ScaledMask.assign(static_cast<size_t>(NumElts * Scale), -1);
  ....
}

PVS-Studio diagnostic message: V1028 [CWE-190] Possible overflow. Consider casting operands of the 'NumElts * Scale' operator to the 'size_t' type, not the result. X86ISelLowering.h 1577

Explicit type conversion is used to avoid an overflow when multiplying variables of type int. In this case, though, it doesn't work because the multiplication will occur first and only then will the 32-bit result be promoted to type size_t.

Snippet No. 35: Bad copy-paste

Instruction *InstCombiner::visitFCmpInst(FCmpInst &I) {
  ....
  if (!match(Op0, m_PosZeroFP()) && isKnownNeverNaN(Op0, &TLI)) {
    I.setOperand(0, ConstantFP::getNullValue(Op0->getType()));
    return &I;
  }
  if (!match(Op1, m_PosZeroFP()) && isKnownNeverNaN(Op1, &TLI)) {
    I.setOperand(1, ConstantFP::getNullValue(Op0->getType()));        // <=
    return &I;
  }
  ....
}

V778 [CWE-682] Two similar code fragments were found. Perhaps, this is a typo and 'Op1' variable should be used instead of 'Op0'. InstCombineCompares.cpp 5507

This new cool diagnostic detects situations where a code fragment is written using copy-paste, with all the names changed save one.

Note that all Op0's except one were changed to Op1 in the second block. The code should probably look like this:

if (!match(Op1, m_PosZeroFP()) && isKnownNeverNaN(Op1, &TLI)) {
  I.setOperand(1, ConstantFP::getNullValue(Op1->getType()));
  return &I;
}

Snippet No. 36: Variables mixed up

struct Status {
  unsigned Mask;
  unsigned Mode;

  Status() : Mask(0), Mode(0){};

  Status(unsigned Mask, unsigned Mode) : Mask(Mask), Mode(Mode) {
    Mode &= Mask;
  };
  ....
};

PVS-Studio diagnostic message: V1001 [CWE-563] The 'Mode' variable is assigned but is not used by the end of the function. SIModeRegister.cpp 48

It's very dangerous to have the same names for function arguments as for class members because you risk mixing them up. What you see here is an example of that. The following expression is meaningless:

Mode &= Mask;

The argument is changed but never used after that. This snippet should probably look like this:

Status(unsigned Mask, unsigned Mode) : Mask(Mask), Mode(Mode) {
  this->Mode &= Mask;
};

Snippet No. 37: Variables mixed up

class SectionBase {
  ....
  uint64_t Size = 0;
  ....
};

class SymbolTableSection : public SectionBase {
  ....
};

void SymbolTableSection::addSymbol(Twine Name, uint8_t Bind, uint8_t Type,
                                   SectionBase *DefinedIn, uint64_t Value,
                                   uint8_t Visibility, uint16_t Shndx,
                                   uint64_t Size) {
  ....
  Sym.Value = Value;
  Sym.Visibility = Visibility;
  Sym.Size = Size;
  Sym.Index = Symbols.size();
  Symbols.emplace_back(llvm::make_unique<Symbol>(Sym));
  Size += this->EntrySize;
}

PVS-Studio diagnostic message: V1001 [CWE-563] The 'Size' variable is assigned but is not used by the end of the function. Object.cpp 424

This one is similar to the previous example. Correct version:

this->Size += this->EntrySize;

Snippets No. 38 — No. 47: Missing pointer check

We looked at a few examples of the V595 warning a bit earlier. What it detects is a situation when a pointer is first dereferenced and only then checked. The new diagnostic V1004 is the opposite of that, and it detects tons of errors too. It looks for already tested pointers that are not tested again when necessary. Here are a few errors of this type found in LLVM's code.

int getGEPCost(Type *PointeeType, const Value *Ptr,
               ArrayRef<const Value *> Operands) {
  ....
  if (Ptr != nullptr) {                                            // <=
    assert(....);
    BaseGV = dyn_cast<GlobalValue>(Ptr->stripPointerCasts());
  }
  bool HasBaseReg = (BaseGV == nullptr);

  auto PtrSizeBits = DL.getPointerTypeSizeInBits(Ptr->getType());  // <=
  ....
}

PVS-Studio diagnostic message: V1004 [CWE-476] The 'Ptr' pointer was used unsafely after it was verified against nullptr. Check lines: 729, 738. TargetTransformInfoImpl.h 738

Ptr can be nullptr, which is indicated by the check:

if (Ptr != nullptr)

However, the same pointer is dereferenced without such a check a bit further:

auto PtrSizeBits = DL.getPointerTypeSizeInBits(Ptr->getType());

Another similar case.

llvm::DISubprogram *CGDebugInfo::getFunctionFwdDeclOrStub(GlobalDecl GD,
                                                          bool Stub) {
  ....
  auto *FD = dyn_cast<FunctionDecl>(GD.getDecl());
  SmallVector<QualType, 16> ArgTypes;
  if (FD)                                                                // <=
    for (const ParmVarDecl *Parm : FD->parameters())
      ArgTypes.push_back(Parm->getType());
  CallingConv CC = FD->getType()->castAs<FunctionType>()->getCallConv(); // <=
  ....
}

PVS-Studio diagnostic message: V1004 [CWE-476] The 'FD' pointer was used unsafely after it was verified against nullptr. Check lines: 3228, 3231. CGDebugInfo.cpp 3231

Note the FD pointer. This error is straightforward, so no comments on this one.

One more here:

static void computePolynomialFromPointer(Value &Ptr, Polynomial &Result,
                                         Value *&BasePtr,
                                         const DataLayout &DL) {
  PointerType *PtrTy = dyn_cast<PointerType>(Ptr.getType());
  if (!PtrTy) {                                                   // <=
    Result = Polynomial();
    BasePtr = nullptr;
  }
  unsigned PointerBits =
      DL.getIndexSizeInBits(PtrTy->getPointerAddressSpace());     // <=
  ....
}

PVS-Studio diagnostic message: V1004 [CWE-476] The 'PtrTy' pointer was used unsafely after it was verified against nullptr. Check lines: 960, 965. InterleavedLoadCombinePass.cpp 965

How do you avoid errors like that? Be very careful when reviewing your code and check it regularly with PVS-Studio.

I don't think we should examine other examples of this type, so here's just a list of the warnings:

  • V1004 [CWE-476] The 'Expr' pointer was used unsafely after it was verified against nullptr. Check lines: 1049, 1078. DebugInfoMetadata.cpp 1078
  • V1004 [CWE-476] The 'PI' pointer was used unsafely after it was verified against nullptr. Check lines: 733, 753. LegacyPassManager.cpp 753
  • V1004 [CWE-476] The 'StatepointCall' pointer was used unsafely after it was verified against nullptr. Check lines: 4371, 4379. Verifier.cpp 4379
  • V1004 [CWE-476] The 'RV' pointer was used unsafely after it was verified against nullptr. Check lines: 2263, 2268. TGParser.cpp 2268
  • V1004 [CWE-476] The 'CalleeFn' pointer was used unsafely after it was verified against nullptr. Check lines: 1081, 1096. SimplifyLibCalls.cpp 1096
  • V1004 [CWE-476] The 'TC' pointer was used unsafely after it was verified against nullptr. Check lines: 1819, 1824. Driver.cpp 1824

Snippets No. 48 — No. 60: Not critical but still a defect (potential memory leak)

std::unique_ptr<IRMutator> createISelMutator() {
  ....
  std::vector<std::unique_ptr<IRMutationStrategy>> Strategies;
  Strategies.emplace_back(
      new InjectorIRStrategy(InjectorIRStrategy::getDefaultOps()));
  ....
}

PVS-Studio diagnostic message: V1023 [CWE-460] A pointer without owner is added to the 'Strategies' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-isel-fuzzer.cpp 58

You can't simply write xxx.push_back(new X) to append an element to a container of type std::vector<std::unique_ptr<X>> because there is no implicit cast from X* to std::unique_ptr<X>.

The popular solution is to write xxx.emplace_back(new X) since it is compilable: the emplace_back method constructs the element directly from the arguments and, therefore, can use explicit constructors.

But that solution isn't safe. If the vector is full, memory will be reallocated. This operation may fail and end up raising an std::bad_alloc exception. In this case, the pointer will be lost and the program won't be able to delete the object created.

A safer solution is to create a unique_ptr, which will retain the pointer until the vector attempts to reallocate the memory:

xxx.push_back(std::unique_ptr<X>(new X))

The C++14 standard allows you to use 'std::make_unique':

xxx.push_back(std::make_unique<X>())

This type of defect has no effect in LLVM. Compilation will simply terminate if memory allocation fails. That said, it may be quite critical in applications with a long uptime, which can't simply terminate when a memory allocation failure occurs.

So, even though this code isn't dangerous to LLVM, I thought I should still tell you about this bug pattern and the fact that PVS-Studio can now detect it.

Other similar cases:

  • V1023 [CWE-460] A pointer without owner is added to the 'Passes' container by the 'emplace_back' method. A memory leak will occur in case of an exception. PassManager.h 546
  • V1023 [CWE-460] A pointer without owner is added to the 'AAs' container by the 'emplace_back' method. A memory leak will occur in case of an exception. AliasAnalysis.h 324
  • V1023 [CWE-460] A pointer without owner is added to the 'Entries' container by the 'emplace_back' method. A memory leak will occur in case of an exception. DWARFDebugFrame.cpp 519
  • V1023 [CWE-460] A pointer without owner is added to the 'AllEdges' container by the 'emplace_back' method. A memory leak will occur in case of an exception. CFGMST.h 268
  • V1023 [CWE-460] A pointer without owner is added to the 'VMaps' container by the 'emplace_back' method. A memory leak will occur in case of an exception. SimpleLoopUnswitch.cpp 2012
  • V1023 [CWE-460] A pointer without owner is added to the 'Records' container by the 'emplace_back' method. A memory leak will occur in case of an exception. FDRLogBuilder.h 30
  • V1023 [CWE-460] A pointer without owner is added to the 'PendingSubmodules' container by the 'emplace_back' method. A memory leak will occur in case of an exception. ModuleMap.cpp 810
  • V1023 [CWE-460] A pointer without owner is added to the 'Objects' container by the 'emplace_back' method. A memory leak will occur in case of an exception. DebugMap.cpp 88
  • V1023 [CWE-460] A pointer without owner is added to the 'Strategies' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-isel-fuzzer.cpp 60
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 685
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 686
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 688
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 689
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 690
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 691
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 692
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 693
  • V1023 [CWE-460] A pointer without owner is added to the 'Modifiers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. llvm-stress.cpp 694
  • V1023 [CWE-460] A pointer without owner is added to the 'Operands' container by the 'emplace_back' method. A memory leak will occur in case of an exception. GlobalISelEmitter.cpp 1911
  • V1023 [CWE-460] A pointer without owner is added to the 'Stash' container by the 'emplace_back' method. A memory leak will occur in case of an exception. GlobalISelEmitter.cpp 2100
  • V1023 [CWE-460] A pointer without owner is added to the 'Matchers' container by the 'emplace_back' method. A memory leak will occur in case of an exception. GlobalISelEmitter.cpp 2702

Conclusion

I wrote down 60 warnings and stopped at that. Did PVS-Studio find any other bugs in LLVM? Yes, it did. But as I was writing down the examples, night fell, so I decided to knock off.

I hope you enjoyed reading this article and it encouraged you to try the PVS-Studio analyzer for yourself.

Visit this page to download the analyzer and get a trial key.

Most importantly, use static analysis regularly. One-time checks, like those that we do to popularize static analysis and promote PVS-Studio, aren't the normal scenario.

Good luck with improving your code's quality and reliability!

Автор: Andrey2008

Источник

* - обязательные к заполнению поля


https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js