Subversion Repositories QNX 8.QNX8 LLVM/Clang compiler suite

  1. //===- SampleProfReader.h - Read LLVM sample profile data -------*- C++ -*-===//
  2. //
  3. // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
  4. // See https://llvm.org/LICENSE.txt for license information.
  5. // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
  6. //
  7. //===----------------------------------------------------------------------===//
  8. //
  9. // This file contains definitions needed for reading sample profiles.
  10. //
  11. // NOTE: If you are making changes to this file format, please remember
  12. //       to document them in the Clang documentation at
  13. //       tools/clang/docs/UsersManual.rst.
  14. //
  15. // Text format
  16. // -----------
  17. //
  18. // Sample profiles are written as ASCII text. The file is divided into
  19. // sections, which correspond to each of the functions executed at runtime.
  20. // Each section has the following format
  21. //
  22. //     function1:total_samples:total_head_samples
  23. //      offset1[.discriminator]: number_of_samples [fn1:num fn2:num ... ]
  24. //      offset2[.discriminator]: number_of_samples [fn3:num fn4:num ... ]
  25. //      ...
  26. //      offsetN[.discriminator]: number_of_samples [fn5:num fn6:num ... ]
  27. //      offsetA[.discriminator]: fnA:num_of_total_samples
  28. //       offsetA1[.discriminator]: number_of_samples [fn7:num fn8:num ... ]
  29. //       ...
  30. //      !CFGChecksum: num
  31. //      !Attribute: flags
  32. //
  33. // This is a nested tree in which the indentation represents the nesting level
  34. // of the inline stack. There are no blank lines in the file, and the spacing
  35. // within a single line is fixed. Additional spaces will result in an error
  36. // while reading the file.
  37. //
  38. // Any line starting with the '#' character is completely ignored.
  39. //
  40. // Inlined calls are represented with indentation. The Inline stack is a
  41. // stack of source locations in which the top of the stack represents the
  42. // leaf function, and the bottom of the stack represents the actual
  43. // symbol to which the instruction belongs.
  44. //
  45. // Function names must be mangled in order for the profile loader to
  46. // match them in the current translation unit. The two numbers in the
  47. // function header specify how many total samples were accumulated in the
  48. // function (first number), and the total number of samples accumulated
  49. // in the prologue of the function (second number). This head sample
  50. // count provides an indicator of how frequently the function is invoked.
  51. //
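The function header layout described above (`name:total_samples:total_head_samples`) can be illustrated with a small standalone parser. This is only a sketch, not the reader's actual implementation: `FunctionHeader`, `parseCount` and `parseFunctionHeader` are hypothetical names, the counts are searched from the right on the assumption that the name itself might contain a colon, and overflow of very large counts is not checked.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical record for one function header line; not an LLVM type.
struct FunctionHeader {
  std::string Name;
  uint64_t TotalSamples = 0;
  uint64_t HeadSamples = 0;
};

// Parse an all-digit string; reject empty strings and any other character.
// Overflow is not checked in this sketch.
static std::optional<uint64_t> parseCount(const std::string &S) {
  if (S.empty())
    return std::nullopt;
  uint64_t V = 0;
  for (char C : S) {
    if (C < '0' || C > '9')
      return std::nullopt;
    V = V * 10 + static_cast<uint64_t>(C - '0');
  }
  return V;
}

// Parse "name:total_samples:total_head_samples". The two counts are located
// from the right so that the name is everything before the second-to-last ':'.
std::optional<FunctionHeader> parseFunctionHeader(const std::string &Line) {
  size_t Last = Line.rfind(':');
  if (Last == std::string::npos || Last == 0)
    return std::nullopt;
  size_t Prev = Line.rfind(':', Last - 1);
  if (Prev == std::string::npos || Prev == 0)
    return std::nullopt;
  auto Total = parseCount(Line.substr(Prev + 1, Last - Prev - 1));
  auto Head = parseCount(Line.substr(Last + 1));
  if (!Total || !Head)
    return std::nullopt;
  return FunctionHeader{Line.substr(0, Prev), *Total, *Head};
}
```

For example, `parseFunctionHeader("_Z3foov:1500:12")` yields the name `_Z3foov` with 1500 total samples and 12 head samples, while a line without two trailing counts yields `std::nullopt`.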
  52. // There are three types of lines in the function body.
  53. //
  54. // * A sampled line represents the profile information of a source location.
  55. // * A callsite line represents the profile information of a callsite.
  56. // * A metadata line represents extra metadata of the function.
  57. //
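The line types listed above, together with the rule that indentation encodes the inline nesting level, suggest a simple per-line classifier. The sketch below is an illustration under stated assumptions, not the parser used by the reader: `LineKind`, `ClassifiedLine` and `classifyLine` are hypothetical names, and the rule used to tell a sampled line from a callsite line (a digit after `": "` means a sample count, anything else a callee name) is inferred from the two layouts shown in the format description.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical classification of one text-profile line.
enum class LineKind {
  Comment,
  FunctionHeader,
  Sampled,
  Callsite,
  Metadata,
  Invalid
};

struct ClassifiedLine {
  LineKind Kind;
  size_t Depth; // number of leading spaces, i.e. the nesting level
};

ClassifiedLine classifyLine(const std::string &Line) {
  // Lines starting with '#' are ignored entirely.
  if (!Line.empty() && Line[0] == '#')
    return {LineKind::Comment, 0};
  size_t Depth = 0;
  while (Depth < Line.size() && Line[Depth] == ' ')
    ++Depth;
  // Blank lines are not allowed in the format.
  if (Depth == Line.size())
    return {LineKind::Invalid, Depth};
  // An unindented line starts a new top-level function section.
  if (Depth == 0)
    return {LineKind::FunctionHeader, 0};
  if (Line[Depth] == '!')
    return {LineKind::Metadata, Depth};
  // Body lines look like "offset[.discriminator]: rest".
  size_t Colon = Line.find(": ", Depth);
  if (Colon == std::string::npos || Colon + 2 >= Line.size())
    return {LineKind::Invalid, Depth};
  // Assumption: a digit here is a sample count, otherwise a callee name.
  unsigned char Next = static_cast<unsigned char>(Line[Colon + 2]);
  return {std::isdigit(Next) ? LineKind::Sampled : LineKind::Callsite, Depth};
}
```

With this sketch, `" 4: 534"` is classified as a sampled line at depth 1 and `" 10: inline1:1000"` as a callsite line at depth 1; the lines of the inlined callee would then appear at depth 2.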
  58. // Each sampled line may contain several items. Some are optional (marked
  59. // below):
  60. //
  61. // a. Source line offset. This number represents the line number
  62. //    in the function where the sample was collected. The line number is
  63. //    always relative to the line where symbol of the function is
  64. //    defined. So, if the function has its header at line 280, the offset
  65. //    13 is at line 293 in the file.
  66. //
  67. //    Note that this offset should never be a negative number, which could
  68. //    otherwise happen in cases like macros. The debug machinery registers the
  69. //    line number at the point of macro expansion. So, if the macro was
  70. //    expanded in a line before the start of the function, the profile
  71. //    converter should emit a 0 as the offset (this means that the optimizers
  72. //    will not be able to associate a meaningful weight to the instructions
  73. //    in the macro).
  74. //
  75. // b. [OPTIONAL] Discriminator. This is used if the sampled program
  76. //    was compiled with DWARF discriminator support
  77. //    (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators).
  78. //    DWARF discriminators are unsigned integer values that allow the
  79. //    compiler to distinguish between multiple execution paths on the
  80. //    same source line location.
  81. //
  82. //    For example, consider the line of code ``if (cond) foo(); else bar();``.
  83. //    If the predicate ``cond`` is true 80% of the time, then the edge
  84. //    into function ``foo`` should be considered to be taken most of the
  85. //    time. But both calls to ``foo`` and ``bar`` are at the same source
  86. //    line, so a sample count at that line is not sufficient. The
  87. //    compiler needs to know which part of that line is taken more
  88. //    frequently.
  89. //
  90. //    This is what discriminators provide. In this case, the calls to
  91. //    ``foo`` and ``bar`` will be at the same line, but will have
  92. //    different discriminator values. This allows the compiler to correctly
  93. //    set edge weights into ``foo`` and ``bar``.
  94. //
  95. // c. Number of samples. This is an integer quantity representing the
  96. //    number of samples collected by the profiler at this source
  97. //    location.
  98. //
  99. // d. [OPTIONAL] Potential call targets and samples. If present, this
  100. //    line contains a call instruction. This models both direct and indirect
  101. //    calls. Each target is listed with its number of samples. For example,
  102. //
  103. //      130: 7  foo:3  bar:2  baz:7
  104. //
  105. //    The above means that at relative line offset 130 there is a call
  106. //    instruction that calls one of ``foo()``, ``bar()`` and ``baz()``,
  107. //    with ``baz()`` being the relatively more frequently called target.
  108. //
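The four items of a sampled line (offset, optional discriminator, sample count, optional call targets) can be pulled apart as in the sketch below. It is an illustration only: `SampledLine`, `parseCount` and `parseSampledLine` are hypothetical names, the input is assumed to have its indentation already removed, and the sketch splits on arbitrary whitespace, so it is more lenient about spacing than the format description allows.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one sampled line; not the LLVM data structure.
struct SampledLine {
  uint32_t Offset = 0;
  uint32_t Discriminator = 0;
  uint64_t Samples = 0;
  std::vector<std::pair<std::string, uint64_t>> CallTargets;
};

// Parse an all-digit string; reject empty strings and any other character.
static std::optional<uint64_t> parseCount(const std::string &S) {
  if (S.empty())
    return std::nullopt;
  uint64_t V = 0;
  for (char C : S) {
    if (C < '0' || C > '9')
      return std::nullopt;
    V = V * 10 + static_cast<uint64_t>(C - '0');
  }
  return V;
}

// Parse "offset[.discriminator]: samples [fn:num ...]".
std::optional<SampledLine> parseSampledLine(const std::string &Line) {
  size_t Colon = Line.find(':');
  if (Colon == std::string::npos)
    return std::nullopt;
  const std::string Loc = Line.substr(0, Colon);
  const size_t Dot = Loc.find('.');
  SampledLine R;
  auto Offset = parseCount(Loc.substr(0, Dot));
  if (!Offset)
    return std::nullopt;
  R.Offset = static_cast<uint32_t>(*Offset);
  if (Dot != std::string::npos) {
    auto Disc = parseCount(Loc.substr(Dot + 1));
    if (!Disc)
      return std::nullopt;
    R.Discriminator = static_cast<uint32_t>(*Disc);
  }
  // The first token after the colon is the sample count.
  std::istringstream Rest(Line.substr(Colon + 1));
  std::string Tok;
  if (!(Rest >> Tok))
    return std::nullopt;
  auto Samples = parseCount(Tok);
  if (!Samples)
    return std::nullopt;
  R.Samples = *Samples;
  // Any remaining tokens are "callee:count" pairs.
  while (Rest >> Tok) {
    size_t Sep = Tok.rfind(':');
    if (Sep == std::string::npos || Sep == 0)
      return std::nullopt;
    auto Count = parseCount(Tok.substr(Sep + 1));
    if (!Count)
      return std::nullopt;
    R.CallTargets.emplace_back(Tok.substr(0, Sep), *Count);
  }
  return R;
}
```

Applied to the example line `130: 7  foo:3  bar:2  baz:7`, this gives offset 130, discriminator 0, 7 samples, and three call targets of which `baz` has the highest count. If the function header is at source line 280, an offset of 13 corresponds to source line 293.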
  109. // Each callsite line may contain several items. Some are optional.
  110. //
  111. // a. Source line offset. This number represents the line number of the
  112. //    callsite that is inlined in the profiled binary.
  113. //
  114. // b. [OPTIONAL] Discriminator. Same as the discriminator for sampled line.
  115. //
  116. // c. Number of samples. This is an integer quantity representing the
  117. //    total number of samples collected for the inlined instance at this
  118. //    callsite.
  119. //
  120. // A metadata line can occur only at one level of indentation, containing
  121. // extra information for the top-level function. Furthermore, metadata can
  122. // only occur after all the body samples and callsite samples.
  123. // Each metadata line may contain a particular type of metadata, marked by
  124. // the starting characters annotated with !. We process each metadata line
  125. // independently, hence each metadata line has to form an independent piece
  126. // of information that does not require cross-line reference.
  127. // We support the following types of metadata:
  128. //
  129. // a. CFG Checksum (a.k.a. function hash):
  130. //   !CFGChecksum: 12345
  131. // b. Attribute (see ContextAttributeMask):
  132. //   !Attribute: 1
  133. //
  134. //
  135. // Binary format
  136. // -------------
  137. //
  138. // This is a more compact encoding. Numbers are encoded as ULEB128 values
  139. // and all strings are encoded in a name table. The file is organized in
  140. // the following sections:
  141. //
  142. // MAGIC (uint64_t)
  143. //    File identifier computed by function SPMagic() (0x5350524f463432ff)
  144. //
  145. // VERSION (uint32_t)
  146. //    File format version number computed by SPVersion()
  147. //
  148. // SUMMARY
  149. //    TOTAL_COUNT (uint64_t)
  150. //        Total number of samples in the profile.
  151. //    MAX_COUNT (uint64_t)
  152. //        Maximum value of samples on a line.
  153. //    MAX_FUNCTION_COUNT (uint64_t)
  154. //        Maximum number of samples at function entry (head samples).
  155. //    NUM_COUNTS (uint64_t)
  156. //        Number of lines with samples.
  157. //    NUM_FUNCTIONS (uint64_t)
  158. //        Number of functions with samples.
  159. //    NUM_DETAILED_SUMMARY_ENTRIES (size_t)
  160. //        Number of entries in detailed summary
  161. //    DETAILED_SUMMARY
  162. //        A list of detailed summary entries. Each entry consists of
  163. //        CUTOFF (uint32_t)
  164. //            Required percentile of total sample count expressed as a fraction
  165. //            multiplied by 1000000.
  166. //        MIN_COUNT (uint64_t)
  167. //            The minimum number of samples required to reach the target
  168. //            CUTOFF.
  169. //        NUM_COUNTS (uint64_t)
  170. //            Number of samples to get to the desired percentile.
  171. //
  172. // NAME TABLE
  173. //    SIZE (uint32_t)
  174. //        Number of entries in the name table.
  175. //    NAMES
  176. //        A NUL-separated list of SIZE strings.
  177. //
  178. // FUNCTION BODY (one for each uninlined function body present in the profile)
  179. //    HEAD_SAMPLES (uint64_t) [only for top-level functions]
  180. //        Total number of samples collected at the head (prologue) of the
  181. //        function.
  182. //        NOTE: This field should only be present for top-level functions
  183. //              (i.e., not inlined into any caller). Inlined function calls
  184. //              have no prologue, so they don't need this.
  185. //    NAME_IDX (uint32_t)
  186. //        Index into the name table indicating the function name.
  187. //    SAMPLES (uint64_t)
  188. //        Total number of samples collected in this function.
  189. //    NRECS (uint32_t)
  190. //        Total number of sampling records in this function's profile.
  191. //    BODY RECORDS
  192. //        A list of NRECS entries. Each entry contains:
  193. //          OFFSET (uint32_t)
  194. //            Line offset from the start of the function.
  195. //          DISCRIMINATOR (uint32_t)
  196. //            Discriminator value (see description of discriminators
  197. //            in the text format documentation above).
  198. //          SAMPLES (uint64_t)
  199. //            Number of samples collected at this location.
  200. //          NUM_CALLS (uint32_t)
  201. //            Number of non-inlined function calls made at this location. In the
  202. //            case of direct calls, this number will always be 1. For indirect
  203. //            calls (virtual functions and function pointers) this will
  204. //            represent all the actual functions called at runtime.
  205. //          CALL_TARGETS
  206. //            A list of NUM_CALLS entries for each called function:
  207. //               NAME_IDX (uint32_t)
  208. //                  Index into the name table with the callee name.
  209. //               SAMPLES (uint64_t)
  210. //                  Number of samples collected at the call site.
  211. //    NUM_INLINED_FUNCTIONS (uint32_t)
  212. //      Number of callees inlined into this function.
  213. //    INLINED FUNCTION RECORDS
  214. //      A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
  215. //      callees.
  216. //        OFFSET (uint32_t)
  217. //          Line offset from the start of the function.
  218. //        DISCRIMINATOR (uint32_t)
  219. //          Discriminator value (see description of discriminators
  220. //          in the text format documentation above).
  221. //        FUNCTION BODY
  222. //          A FUNCTION BODY entry describing the inlined function.
  223. //===----------------------------------------------------------------------===//
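The binary format described above stores numbers as ULEB128 values: each byte carries seven payload bits, least significant group first, and the high bit marks that more bytes follow. The decoder below is a minimal standalone sketch of that encoding, not the routine the reader uses. The name `decodeULEB128` and its pointer-based signature are assumptions made for illustration, and the sketch rejects any encoding that would not fit in 64 bits.

```cpp
#include <cstdint>
#include <optional>

// Decode one ULEB128 value from [Data, End). On success, return the value
// and advance Data past the consumed bytes. On failure, leave Data unchanged.
std::optional<uint64_t> decodeULEB128(const uint8_t *&Data,
                                      const uint8_t *End) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  const uint8_t *P = Data;
  while (P != End) {
    uint8_t Byte = *P++;
    uint64_t Slice = Byte & 0x7f;
    // Reject encodings whose payload does not fit in 64 bits.
    if (Shift >= 64 || (Slice << Shift) >> Shift != Slice)
      return std::nullopt;
    Value |= Slice << Shift;
    // A clear high bit marks the final byte of the value.
    if ((Byte & 0x80) == 0) {
      Data = P;
      return Value;
    }
    Shift += 7;
  }
  // Ran out of input before reaching the final byte.
  return std::nullopt;
}
```

As a worked case, the byte sequence `0xE5 0x8E 0x26` decodes to 624485: the payloads 0x65, 0x0E and 0x26 contribute 101, 14 * 128 and 38 * 16384 respectively.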
  224.  
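The MAGIC value quoted in the format description, 0x5350524f463432ff, is the ASCII text "SPROF42" followed by the byte 0xff, read most significant byte first. The sketch below reproduces the constant by packing those eight bytes. `packMagic` is a hypothetical name, and this header does not show how SPMagic() itself computes the value, so treat the packing as an illustration of the byte layout only.

```cpp
#include <cstdint>

// Pack 'S','P','R','O','F','4','2',0xff into one 64-bit value, with 'S' in
// the most significant byte. Hypothetical helper for illustration.
constexpr uint64_t packMagic() {
  return uint64_t('S') << 56 | uint64_t('P') << 48 | uint64_t('R') << 40 |
         uint64_t('O') << 32 | uint64_t('F') << 24 | uint64_t('4') << 16 |
         uint64_t('2') << 8 | uint64_t(0xff);
}

// Checked at compile time against the constant quoted in the format notes.
static_assert(packMagic() == 0x5350524f463432ffULL,
              "packed bytes should match the documented magic number");
```

A reader could compare the first field of a file against this value to reject inputs that are not binary sample profiles.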
  225. #ifndef LLVM_PROFILEDATA_SAMPLEPROFREADER_H
  226. #define LLVM_PROFILEDATA_SAMPLEPROFREADER_H
  227.  
  228. #include "llvm/ADT/SmallVector.h"
  229. #include "llvm/ADT/StringRef.h"
  230. #include "llvm/IR/DiagnosticInfo.h"
  231. #include "llvm/IR/LLVMContext.h"
  232. #include "llvm/IR/ProfileSummary.h"
  233. #include "llvm/ProfileData/GCOV.h"
  234. #include "llvm/ProfileData/SampleProf.h"
  235. #include "llvm/Support/Debug.h"
  236. #include "llvm/Support/Discriminator.h"
  237. #include "llvm/Support/ErrorOr.h"
  238. #include "llvm/Support/MemoryBuffer.h"
  239. #include "llvm/Support/SymbolRemappingReader.h"
  240. #include <cstdint>
  241. #include <list>
  242. #include <memory>
  243. #include <optional>
  244. #include <string>
  245. #include <system_error>
  246. #include <unordered_set>
  247. #include <vector>
  248.  
  249. namespace llvm {
  250.  
  251. class raw_ostream;
  252. class Twine;
  253.  
  254. namespace sampleprof {
  255.  
  256. class SampleProfileReader;
  257.  
  258. /// SampleProfileReaderItaniumRemapper remaps the profile data from a
  259. /// sample profile data reader, by applying a provided set of equivalences
  260. /// between components of the symbol names in the profile.
  261. class SampleProfileReaderItaniumRemapper {
  262. public:
  263.   SampleProfileReaderItaniumRemapper(std::unique_ptr<MemoryBuffer> B,
  264.                                      std::unique_ptr<SymbolRemappingReader> SRR,
  265.                                      SampleProfileReader &R)
  266.       : Buffer(std::move(B)), Remappings(std::move(SRR)), Reader(R) {
  267.     assert(Remappings && "Remappings cannot be nullptr");
  268.   }
  269.  
  270.   /// Create a remapper from the given remapping file. The remapper will
  271.   /// be used for profile read in by Reader.
  272.   static ErrorOr<std::unique_ptr<SampleProfileReaderItaniumRemapper>>
  273.   create(const std::string Filename, SampleProfileReader &Reader,
  274.          LLVMContext &C);
  275.  
  276.   /// Create a remapper from the given Buffer. The remapper will
  277.   /// be used for profile read in by Reader.
  278.   static ErrorOr<std::unique_ptr<SampleProfileReaderItaniumRemapper>>
  279.   create(std::unique_ptr<MemoryBuffer> &B, SampleProfileReader &Reader,
  280.          LLVMContext &C);
  281.  
  282.   /// Apply remappings to the profile read by Reader.
  283.   void applyRemapping(LLVMContext &Ctx);
  284.  
  285.   bool hasApplied() { return RemappingApplied; }
  286.  
  287.   /// Insert function name into remapper.
  288.   void insert(StringRef FunctionName) { Remappings->insert(FunctionName); }
  289.  
  290.   /// Query whether an equivalent of \p FunctionName has been inserted
  291.   /// into the remapper.
  292.   bool exist(StringRef FunctionName) {
  293.     return Remappings->lookup(FunctionName);
  294.   }
  295.  
  296.   /// Return the equivalent name in the profile for \p FunctionName if
  297.   /// it exists.
  298.   std::optional<StringRef> lookUpNameInProfile(StringRef FunctionName);
  299.  
  300. private:
  301.   // The buffer holding the content read from remapping file.
  302.   std::unique_ptr<MemoryBuffer> Buffer;
  303.   std::unique_ptr<SymbolRemappingReader> Remappings;
  304.   // Map remapping key to the name in the profile. By looking up the
  305.   // key in the remapper, a given new name can be mapped to the
  306.   // canonical name using the NameMap.
  307.   DenseMap<SymbolRemappingReader::Key, StringRef> NameMap;
  308.   // The Reader the remapper is servicing.
  309.   SampleProfileReader &Reader;
  310.   // Indicate whether remapping has been applied to the profile read
  311.   // by Reader -- by calling applyRemapping.
  312.   bool RemappingApplied = false;
  313. };
  314.  
  315. /// Sample-based profile reader.
  316. ///
  317. /// Each profile contains sample counts for all the functions
  318. /// executed. Inside each function, statements are annotated with the
  319. /// collected samples on all the instructions associated with that
  320. /// statement.
  321. ///
  322. /// For this to produce meaningful data, the program needs to be
  323. /// compiled with some debug information (at minimum, line numbers:
  324. /// -gline-tables-only). Otherwise, it will be impossible to match IR
  325. /// instructions to the line numbers collected by the profiler.
  326. ///
  327. /// From the profile file, we are interested in collecting the
  328. /// following information:
  329. ///
  330. /// * A list of functions included in the profile (mangled names).
  331. ///
  332. /// * For each function F:
  333. ///   1. The total number of samples collected in F.
  334. ///
  335. ///   2. The samples collected at each line in F. To provide some
  336. ///      protection against source code shuffling, line numbers should
  337. ///      be relative to the start of the function.
  338. ///
  339. /// The reader supports two file formats: text and binary. The text format
  340. /// is useful for debugging and testing, while the binary format is more
  341. /// compact and I/O efficient. They can both be used interchangeably.
  342. class SampleProfileReader {
  343. public:
  344.   SampleProfileReader(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
  345.                       SampleProfileFormat Format = SPF_None)
  346.       : Profiles(0), Ctx(C), Buffer(std::move(B)), Format(Format) {}
  347.  
  348.   virtual ~SampleProfileReader() = default;
  349.  
  350.   /// Read and validate the file header.
  351.   virtual std::error_code readHeader() = 0;
  352.  
  353.   /// Set the bits for FS discriminators. Parameter P specifies the sequence
  354.   /// number: P == i is for the i-th round of adding FS discriminators, and
  355.   /// P == 0 is for using base discriminators.
  356.   void setDiscriminatorMaskedBitFrom(FSDiscriminatorPass P) {
  357.     MaskedBitFrom = getFSPassBitEnd(P);
  358.   }
  359.  
  360.   /// Get the bitmask for the discriminators: for FS profiles, return the bit
  361.   /// mask for this pass. For non-FS profiles, return (unsigned) -1.
  362.   uint32_t getDiscriminatorMask() const {
  363.     if (!ProfileIsFS)
  364.       return 0xFFFFFFFF;
  365.     assert((MaskedBitFrom != 0) && "MaskedBitFrom is not set properly");
  366.     return getN1Bits(MaskedBitFrom);
  367.   }
  368.  
  369.   /// The interface to read sample profiles from the associated file.
  370.   std::error_code read() {
  371.     if (std::error_code EC = readImpl())
  372.       return EC;
  373.     if (Remapper)
  374.       Remapper->applyRemapping(Ctx);
  375.     FunctionSamples::UseMD5 = useMD5();
  376.     return sampleprof_error::success;
  377.   }
  378.  
  379.   /// The implementation to read sample profiles from the associated file.
  380.   virtual std::error_code readImpl() = 0;
  381.  
  382.   /// Print the profile for \p FContext on stream \p OS.
  383.   void dumpFunctionProfile(SampleContext FContext, raw_ostream &OS = dbgs());
  384.  
  385.   /// Collect functions with definitions in Module M. For readers that
  386.   /// support loading function profiles on demand, return true when the
  387.   /// reader has been given a module. Always return false for readers
  388.   /// that don't support loading function profiles on demand.
  389.   virtual bool collectFuncsFromModule() { return false; }
  390.  
  391.   /// Print all the profiles on stream \p OS.
  392.   void dump(raw_ostream &OS = dbgs());
  393.  
  394.   /// Print all the profiles on stream \p OS in the JSON format.
  395.   void dumpJson(raw_ostream &OS = dbgs());
  396.  
  397.   /// Return the samples collected for function \p F.
  398.   FunctionSamples *getSamplesFor(const Function &F) {
  399.     // The function name may have been updated by adding suffix. Call
  400.     // a helper to (optionally) strip off suffixes so that we can
  401.     // match against the original function name in the profile.
  402.     StringRef CanonName = FunctionSamples::getCanonicalFnName(F);
  403.     return getSamplesFor(CanonName);
  404.   }
  405.  
  406.   /// Return the samples collected for function \p F, create empty
  407.   /// FunctionSamples if it doesn't exist.
  408.   FunctionSamples *getOrCreateSamplesFor(const Function &F) {
  409.     std::string FGUID;
  410.     StringRef CanonName = FunctionSamples::getCanonicalFnName(F);
  411.     CanonName = getRepInFormat(CanonName, useMD5(), FGUID);
  412.     auto It = Profiles.find(CanonName);
  413.     if (It != Profiles.end())
  414.       return &It->second;
  415.     if (!FGUID.empty()) {
  416.       assert(useMD5() && "New name should only be generated for md5 profile");
  417.       CanonName = *MD5NameBuffer.insert(FGUID).first;
  418.     }
  419.     return &Profiles[CanonName];
  420.   }
  421.  
  422.   /// Return the samples collected for the function named \p Fname.
  423.   virtual FunctionSamples *getSamplesFor(StringRef Fname) {
  424.     std::string FGUID;
  425.     Fname = getRepInFormat(Fname, useMD5(), FGUID);
  426.     auto It = Profiles.find(Fname);
  427.     if (It != Profiles.end())
  428.       return &It->second;
  429.  
  430.     if (Remapper) {
  431.       if (auto NameInProfile = Remapper->lookUpNameInProfile(Fname)) {
  432.         auto It = Profiles.find(*NameInProfile);
  433.         if (It != Profiles.end())
  434.           return &It->second;
  435.       }
  436.     }
  437.     return nullptr;
  438.   }
  439.  
  440.   /// Return all the profiles.
  441.   SampleProfileMap &getProfiles() { return Profiles; }
  442.  
  443.   /// Report a parse error message.
  444.   void reportError(int64_t LineNumber, const Twine &Msg) const {
  445.     Ctx.diagnose(DiagnosticInfoSampleProfile(Buffer->getBufferIdentifier(),
  446.                                              LineNumber, Msg));
  447.   }
  448.  
  449.   /// Create a sample profile reader appropriate to the file format.
  450.   /// Create an underlying remapper if RemapFilename is not empty.
  451.   /// Parameter P specifies the FSDiscriminatorPass.
  452.   static ErrorOr<std::unique_ptr<SampleProfileReader>>
  453.   create(const std::string Filename, LLVMContext &C,
  454.          FSDiscriminatorPass P = FSDiscriminatorPass::Base,
  455.          const std::string RemapFilename = "");
  456.  
  457.   /// Create a sample profile reader from the supplied memory buffer.
  458.   /// Create an underlying remapper if RemapFilename is not empty.
  459.   /// Parameter P specifies the FSDiscriminatorPass.
  460.   static ErrorOr<std::unique_ptr<SampleProfileReader>>
  461.   create(std::unique_ptr<MemoryBuffer> &B, LLVMContext &C,
  462.          FSDiscriminatorPass P = FSDiscriminatorPass::Base,
  463.          const std::string RemapFilename = "");
  464.  
  465.   /// Return the profile summary.
  466.   ProfileSummary &getSummary() const { return *(Summary.get()); }
  467.  
  468.   MemoryBuffer *getBuffer() const { return Buffer.get(); }
  469.  
  470.   /// \brief Return the profile format.
  471.   SampleProfileFormat getFormat() const { return Format; }
  472.  
  473.   /// Whether input profile is based on pseudo probes.
  474.   bool profileIsProbeBased() const { return ProfileIsProbeBased; }
  475.  
  476.   /// Whether input profile is fully context-sensitive.
  477.   bool profileIsCS() const { return ProfileIsCS; }
  478.  
  479.   /// Whether input profile contains ShouldBeInlined contexts.
  480.   bool profileIsPreInlined() const { return ProfileIsPreInlined; }
  481.  
  482.   virtual std::unique_ptr<ProfileSymbolList> getProfileSymbolList() {
  483.     return nullptr;
  484.   }
  485.  
  486.   /// It includes all the names that have samples either in outline instance
  487.   /// or inline instance.
  488.   virtual std::vector<StringRef> *getNameTable() { return nullptr; }
  489.   virtual bool dumpSectionInfo(raw_ostream &OS = dbgs()) { return false; }
  490.  
  491.   /// Return whether names in the profile are all MD5 numbers.
  492.   virtual bool useMD5() { return false; }
  493.  
  494.   /// Don't read profiles without context if the flag is set. This is only
  495.   /// meaningful for the ExtBinary format.
  496.   virtual void setSkipFlatProf(bool Skip) {}
  497.   /// Return whether any name in the profile contains ".__uniq." suffix.
  498.   virtual bool hasUniqSuffix() { return false; }
  499.  
  500.   SampleProfileReaderItaniumRemapper *getRemapper() { return Remapper.get(); }
  501.  
  502.   void setModule(const Module *Mod) { M = Mod; }
  503.  
  504. protected:
  505.   /// Map every function to its associated profile.
  506.   ///
  507.   /// The profile of every function executed at runtime is collected
  508.   /// in the structure FunctionSamples. This maps function objects
  509.   /// to their corresponding profiles.
  510.   SampleProfileMap Profiles;
  511.  
  512.   /// LLVM context used to emit diagnostics.
  513.   LLVMContext &Ctx;
  514.  
  515.   /// Memory buffer holding the profile file.
  516.   std::unique_ptr<MemoryBuffer> Buffer;
  517.  
  518.   /// Extra name buffer holding names created on demand.
  519.   /// This should only be needed for md5 profiles.
  520.   std::unordered_set<std::string> MD5NameBuffer;
  521.  
  522.   /// Profile summary information.
  523.   std::unique_ptr<ProfileSummary> Summary;
  524.  
  525.   /// Take ownership of the summary of this reader.
  526.   static std::unique_ptr<ProfileSummary>
  527.   takeSummary(SampleProfileReader &Reader) {
  528.     return std::move(Reader.Summary);
  529.   }
  530.  
  531.   /// Compute summary for this profile.
  532.   void computeSummary();
  533.  
  534.   std::unique_ptr<SampleProfileReaderItaniumRemapper> Remapper;
  535.  
  536.   /// \brief Whether samples are collected based on pseudo probes.
  537.   bool ProfileIsProbeBased = false;
  538.  
  539.   /// Whether function profiles are context-sensitive flat profiles.
  540.   bool ProfileIsCS = false;
  541.  
  542.   /// Whether function profile contains ShouldBeInlined contexts.
  543.   bool ProfileIsPreInlined = false;
  544.  
  545.   /// Number of context-sensitive profiles.
  546.   uint32_t CSProfileCount = 0;
  547.  
  548.   /// Whether the function profiles use FS discriminators.
  549.   bool ProfileIsFS = false;
  550.  
  551.   /// \brief The format of sample.
  552.   SampleProfileFormat Format = SPF_None;
  553.  
  554.   /// \brief The current module being compiled if SampleProfileReader
  555.   /// is used by compiler. If SampleProfileReader is used by other
  556.   /// tools which are not compiler, M is usually nullptr.
  557.   const Module *M = nullptr;
  558.  
  559.   /// Zero out the discriminator bits higher than bit MaskedBitFrom (0 based).
  560.   /// The default is to keep all the bits.
  561.   uint32_t MaskedBitFrom = 31;
  562. };
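The discriminator masking done by getDiscriminatorMask() in the class above can be pictured with a small standalone sketch. The helper getN1Bits() is not defined in this header, so the code below assumes it returns a value whose N lowest bits are set. `lowBitsMask` and `maskDiscriminator` are hypothetical names, and the exact bit positions chosen for each FS pass are not modelled.

```cpp
#include <cstdint>

// Hypothetical stand-in for a "low N bits set" helper on 32-bit values.
// N >= 32 is handled separately because shifting by 32 would be undefined.
constexpr uint32_t lowBitsMask(unsigned N) {
  return N >= 32 ? 0xFFFFFFFFu : (uint32_t(1) << N) - 1;
}

// Keep only the discriminator bits selected for the current pass. Non-FS
// profiles keep every bit, mirroring the all-ones mask returned for them.
constexpr uint32_t maskDiscriminator(uint32_t Discriminator, bool IsFS,
                                     unsigned MaskedBitFrom) {
  return Discriminator & (IsFS ? lowBitsMask(MaskedBitFrom) : 0xFFFFFFFFu);
}
```

Under these assumptions, masking the discriminator 0x1234 with eight low bits kept gives 0x34 for an FS profile, while a non-FS profile keeps the full value 0x1234.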
  563.  
  564. class SampleProfileReaderText : public SampleProfileReader {
  565. public:
  566.   SampleProfileReaderText(std::unique_ptr<MemoryBuffer> B, LLVMContext &C)
  567.       : SampleProfileReader(std::move(B), C, SPF_Text) {}
  568.  
  569.   /// Read and validate the file header.
  570.   std::error_code readHeader() override { return sampleprof_error::success; }
  571.  
  572.   /// Read sample profiles from the associated file.
  573.   std::error_code readImpl() override;
  574.  
  575.   /// Return true if \p Buffer is in the format supported by this class.
  576.   static bool hasFormat(const MemoryBuffer &Buffer);
  577.  
  578. private:
  579.   /// CSNameTable is used to save full context vectors. This serves as an
  580.   /// underlying immutable buffer for all clients.
  581.   std::list<SampleContextFrameVector> CSNameTable;
  582. };
  583.  
  584. class SampleProfileReaderBinary : public SampleProfileReader {
  585. public:
  586.   SampleProfileReaderBinary(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
  587.                             SampleProfileFormat Format = SPF_None)
  588.       : SampleProfileReader(std::move(B), C, Format) {}
  589.  
  590.   /// Read and validate the file header.
  591.   std::error_code readHeader() override;
  592.  
  593.   /// Read sample profiles from the associated file.
  594.   std::error_code readImpl() override;
  595.  
  596.   /// It includes all the names that have samples either in outline instance
  597.   /// or inline instance.
  598.   std::vector<StringRef> *getNameTable() override { return &NameTable; }
  599.  
  600. protected:
  601.   /// Read a numeric value of type T from the profile.
  602.   ///
  603.   /// If an error occurs during decoding, a diagnostic message is emitted and
  604.   /// EC is set.
  605.   ///
  606.   /// \returns the read value.
  607.   template <typename T> ErrorOr<T> readNumber();
  608.  
  609.   /// Read a numeric value of type T from the profile. The value is stored
  610.   /// unencoded.
  611.   template <typename T> ErrorOr<T> readUnencodedNumber();
  612.  
  613.   /// Read a string from the profile.
  614.   ///
  615.   /// If an error occurs during decoding, a diagnostic message is emitted and
  616.   /// EC is set.
  617.   ///
  618.   /// \returns the read value.
  619.   ErrorOr<StringRef> readString();
  620.  
  621.   /// Read the string index and check whether it overflows the table.
  622.   template <typename T> inline ErrorOr<uint32_t> readStringIndex(T &Table);
  623.  
  624.   /// Return true if we've reached the end of file.
  625.   bool at_eof() const { return Data >= End; }
  626.  
  627.   /// Read the next function profile instance.
  628.   std::error_code readFuncProfile(const uint8_t *Start);
  629.  
  630.   /// Read the contents of the given profile instance.
  631.   std::error_code readProfile(FunctionSamples &FProfile);
  632.  
  633.   /// Read the contents of Magic number and Version number.
  634.   std::error_code readMagicIdent();
  635.  
  636.   /// Read profile summary.
  637.   std::error_code readSummary();
  638.  
  639.   /// Read the whole name table.
  640.   virtual std::error_code readNameTable();
  641.  
  642.   /// Points to the current location in the buffer.
  643.   const uint8_t *Data = nullptr;
  644.  
  645.   /// Points to the end of the buffer.
  646.   const uint8_t *End = nullptr;
  647.  
  648.   /// Function name table.
  649.   std::vector<StringRef> NameTable;
  650.  
  651.   /// Read a string indirectly via the name table.
  652.   virtual ErrorOr<StringRef> readStringFromTable();
  653.   virtual ErrorOr<SampleContext> readSampleContextFromTable();
  654.  
  655. private:
  656.   std::error_code readSummaryEntry(std::vector<ProfileSummaryEntry> &Entries);
  657.   virtual std::error_code verifySPMagic(uint64_t Magic) = 0;
  658. };
  659.  
  660. class SampleProfileReaderRawBinary : public SampleProfileReaderBinary {
  661. private:
  662.   std::error_code verifySPMagic(uint64_t Magic) override;
  663.  
  664. public:
  665.   SampleProfileReaderRawBinary(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
  666.                                SampleProfileFormat Format = SPF_Binary)
  667.       : SampleProfileReaderBinary(std::move(B), C, Format) {}
  668.  
  669.   /// \brief Return true if \p Buffer is in the format supported by this class.
  670.   static bool hasFormat(const MemoryBuffer &Buffer);
  671. };

/// SampleProfileReaderExtBinaryBase/SampleProfileWriterExtBinaryBase define
/// the basic structure of the extensible binary format.
/// Apart from the magic and version number at the beginning, the format is
/// organized in sections. A section table precedes all the sections, and
/// each entry in the table describes the section's type, start, size and
/// attributes. The format within each section is defined by the section
/// itself.
///
/// It is easy to add a new section while maintaining the backward
/// compatibility of the profile; nothing extra needs to be done. To extend
/// an existing section, for example to add cache-miss information alongside
/// the sample counts in the profile body, we can add a new section with the
/// extension and retire the existing one. We can keep the parser for the old
/// section if we want the reader to handle both new- and old-format
/// profiles.
///
/// SampleProfileReaderExtBinary/SampleProfileWriterExtBinary define the
/// commonly used sections of a profile in extensible binary format. It is
/// possible to define other types of profile that inherit from
/// SampleProfileReaderExtBinaryBase/SampleProfileWriterExtBinaryBase.
class SampleProfileReaderExtBinaryBase : public SampleProfileReaderBinary {
private:
  std::error_code decompressSection(const uint8_t *SecStart,
                                    const uint64_t SecSize,
                                    const uint8_t *&DecompressBuf,
                                    uint64_t &DecompressBufSize);

  BumpPtrAllocator Allocator;

protected:
  std::vector<SecHdrTableEntry> SecHdrTable;
  std::error_code readSecHdrTableEntry(uint32_t Idx);
  std::error_code readSecHdrTable();

  std::error_code readFuncMetadata(bool ProfileHasAttribute);
  std::error_code readFuncMetadata(bool ProfileHasAttribute,
                                   FunctionSamples *FProfile);
  std::error_code readFuncOffsetTable();
  std::error_code readFuncProfiles();
  std::error_code readMD5NameTable();
  std::error_code readNameTableSec(bool IsMD5);
  std::error_code readCSNameTableSec();
  std::error_code readProfileSymbolList();

  std::error_code readHeader() override;
  std::error_code verifySPMagic(uint64_t Magic) override = 0;
  virtual std::error_code readOneSection(const uint8_t *Start, uint64_t Size,
                                         const SecHdrTableEntry &Entry);
  // Placeholder for subclasses to dispatch their own section readers.
  virtual std::error_code readCustomSection(const SecHdrTableEntry &Entry) = 0;
  ErrorOr<StringRef> readStringFromTable() override;
  ErrorOr<SampleContext> readSampleContextFromTable() override;
  ErrorOr<SampleContextFrames> readContextFromTable();

  std::unique_ptr<ProfileSymbolList> ProfSymList;

  /// The table mapping from function context to the offset of its
  /// FunctionSamples relative to the start of the file.
  DenseMap<SampleContext, uint64_t> FuncOffsetTable;

  /// Function offset mapping ordered by contexts.
  std::unique_ptr<std::vector<std::pair<SampleContext, uint64_t>>>
      OrderedFuncOffsets;

  /// The set containing the functions to use when compiling a module.
  DenseSet<StringRef> FuncsToUse;

  /// Use fixed length MD5 instead of ULEB128 encoding so NameTable doesn't
  /// need to be read in up front and can be directly accessed using index.
  bool FixedLengthMD5 = false;
  /// The starting address of NameTable containing fixed length MD5.
  const uint8_t *MD5NameMemStart = nullptr;

  /// If MD5 is used in NameTable section, the section saves uint64_t data.
  /// The uint64_t data has to be converted to a string and then the string
  /// will be used to initialize StringRef in NameTable.
  /// Note that NameTable contains StringRefs, so it needs another buffer to
  /// own the string data. MD5StringBuf serves as the string buffer that is
  /// referenced by NameTable (vector of StringRef). We make sure
  /// the lifetime of MD5StringBuf is not shorter than that of NameTable.
  std::unique_ptr<std::vector<std::string>> MD5StringBuf;

  /// CSNameTable is used to save full context vectors. This serves as an
  /// underlying immutable buffer for all clients.
  std::unique_ptr<const std::vector<SampleContextFrameVector>> CSNameTable;

  /// If SkipFlatProf is true, skip the sections with the SecFlagFlat flag
  /// set.
  bool SkipFlatProf = false;

  bool FuncOffsetsOrdered = false;

public:
  SampleProfileReaderExtBinaryBase(std::unique_ptr<MemoryBuffer> B,
                                   LLVMContext &C, SampleProfileFormat Format)
      : SampleProfileReaderBinary(std::move(B), C, Format) {}

  /// Read sample profiles in extensible format from the associated file.
  std::error_code readImpl() override;

  /// Get the total size of all \p Type sections.
  uint64_t getSectionSize(SecType Type);
  /// Get the total size of header and all sections.
  uint64_t getFileSize();
  bool dumpSectionInfo(raw_ostream &OS = dbgs()) override;

  /// Collect functions with definitions in Module M. Return true if
  /// the reader has been given a module.
  bool collectFuncsFromModule() override;

  /// Return whether names in the profile are all MD5 numbers.
  bool useMD5() override { return MD5StringBuf.get(); }

  std::unique_ptr<ProfileSymbolList> getProfileSymbolList() override {
    return std::move(ProfSymList);
  }

  void setSkipFlatProf(bool Skip) override { SkipFlatProf = Skip; }
};

class SampleProfileReaderExtBinary : public SampleProfileReaderExtBinaryBase {
private:
  std::error_code verifySPMagic(uint64_t Magic) override;
  std::error_code readCustomSection(const SecHdrTableEntry &Entry) override {
    // Update the data reader pointer to the end of the section.
    Data = End;
    return sampleprof_error::success;
  }

public:
  SampleProfileReaderExtBinary(std::unique_ptr<MemoryBuffer> B, LLVMContext &C,
                               SampleProfileFormat Format = SPF_Ext_Binary)
      : SampleProfileReaderExtBinaryBase(std::move(B), C, Format) {}

  /// \brief Return true if \p Buffer is in the format supported by this class.
  static bool hasFormat(const MemoryBuffer &Buffer);
};

class SampleProfileReaderCompactBinary : public SampleProfileReaderBinary {
private:
  /// Function name table.
  std::vector<std::string> NameTable;
  /// The table mapping from function name to the offset of its FunctionSamples
  /// relative to the start of the file.
  DenseMap<StringRef, uint64_t> FuncOffsetTable;
  /// The set containing the functions to use when compiling a module.
  DenseSet<StringRef> FuncsToUse;
  std::error_code verifySPMagic(uint64_t Magic) override;
  std::error_code readNameTable() override;
  /// Read a string indirectly via the name table.
  ErrorOr<StringRef> readStringFromTable() override;
  std::error_code readHeader() override;
  std::error_code readFuncOffsetTable();

public:
  SampleProfileReaderCompactBinary(std::unique_ptr<MemoryBuffer> B,
                                   LLVMContext &C)
      : SampleProfileReaderBinary(std::move(B), C, SPF_Compact_Binary) {}

  /// \brief Return true if \p Buffer is in the format supported by this class.
  static bool hasFormat(const MemoryBuffer &Buffer);

  /// Read samples only for functions to use.
  std::error_code readImpl() override;

  /// Collect functions with definitions in Module M. Return true if
  /// the reader has been given a module.
  bool collectFuncsFromModule() override;

  /// Return whether names in the profile are all MD5 numbers.
  bool useMD5() override { return true; }
};

using InlineCallStack = SmallVector<FunctionSamples *, 10>;

// Supported histogram types in GCC.  Currently, we only need support for
// call target histograms.
enum HistType {
  HIST_TYPE_INTERVAL,
  HIST_TYPE_POW2,
  HIST_TYPE_SINGLE_VALUE,
  HIST_TYPE_CONST_DELTA,
  HIST_TYPE_INDIR_CALL,
  HIST_TYPE_AVERAGE,
  HIST_TYPE_IOR,
  HIST_TYPE_INDIR_CALL_TOPN
};

class SampleProfileReaderGCC : public SampleProfileReader {
public:
  SampleProfileReaderGCC(std::unique_ptr<MemoryBuffer> B, LLVMContext &C)
      : SampleProfileReader(std::move(B), C, SPF_GCC),
        GcovBuffer(Buffer.get()) {}

  /// Read and validate the file header.
  std::error_code readHeader() override;

  /// Read sample profiles from the associated file.
  std::error_code readImpl() override;

  /// Return true if \p Buffer is in the format supported by this class.
  static bool hasFormat(const MemoryBuffer &Buffer);

protected:
  std::error_code readNameTable();
  std::error_code readOneFunctionProfile(const InlineCallStack &InlineStack,
                                         bool Update, uint32_t Offset);
  std::error_code readFunctionProfiles();
  std::error_code skipNextWord();
  template <typename T> ErrorOr<T> readNumber();
  ErrorOr<StringRef> readString();

  /// Read the section tag and check that it's the same as \p Expected.
  std::error_code readSectionTag(uint32_t Expected);

  /// GCOV buffer containing the profile.
  GCOVBuffer GcovBuffer;

  /// Function names in this profile.
  std::vector<std::string> Names;

  /// GCOV tags used to separate sections in the profile file.
  static const uint32_t GCOVTagAFDOFileNames = 0xaa000000;
  static const uint32_t GCOVTagAFDOFunction = 0xac000000;
};

} // end namespace sampleprof

} // end namespace llvm

#endif // LLVM_PROFILEDATA_SAMPLEPROFREADER_H

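The doc comment on SampleProfileReaderExtBinaryBase says that a section table precedes the sections and that each entry records a type, start, size and attributes. The standalone sketch below illustrates why that layout makes new sections backward compatible: a reader can skip any section whose type it does not recognise by using the recorded size. It is not the LLVM implementation. SectionEntry, the KnownNames/KnownProfiles constants and sumKnownSections are hypothetical names invented for this illustration, and the field widths are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one section-table entry (type, attribute flags,
// start offset, size). Names and field widths are assumptions, not LLVM's.
struct SectionEntry {
  uint32_t Type;
  uint64_t Flags;
  uint64_t Offset; // Start of the section within the buffer.
  uint64_t Size;   // Length of the section in bytes.
};

// Hypothetical section types that this toy reader understands.
enum : uint32_t { KnownNames = 1, KnownProfiles = 2 };

// Walk the table and add up the bytes of all sections with a recognised
// type. Unrecognised types are skipped rather than rejected, so an older
// reader tolerates a profile that carries newer sections.
// Returns false if any entry points outside the buffer.
inline bool sumKnownSections(const std::vector<SectionEntry> &Table,
                             uint64_t BufferSize, uint64_t &KnownBytes) {
  KnownBytes = 0;
  for (const SectionEntry &E : Table) {
    // Bounds check written so that Offset + Size cannot overflow.
    if (E.Offset > BufferSize || E.Size > BufferSize - E.Offset)
      return false;
    if (E.Type == KnownNames || E.Type == KnownProfiles)
      KnownBytes += E.Size;
    // Any other type: nothing to parse, the entry's size tells us how far
    // the section extends.
  }
  return true;
}
```

In this sketch the bounds check runs for every entry, including those of unknown type, so a malformed table is rejected even when the offending section would otherwise have been skipped.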