source: part3intro/toolchaintechnotes.xml@ 543c94c

Last change on this file since 543c94c was 543c94c, checked in by Xi Ruoyao <xry111@…>, 6 months ago

libstdc++ "stage 3" is not rebuilt for the same reason as "stage 2"

I'm pretty sure "stage 2" libstdc++ (installed in ch6) is already fully
featured. The reason to rebuild the stage 3 libstdc++ (or the entire
stage 3 gcc) is the same as the reason to rebuild every package in multiple
chapters: to "settle it down".

Merge the content of into the book
as an explanation.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
  <!ENTITY % general-entities SYSTEM "../general.ent">
  %general-entities;
]>

<sect1 id="ch-tools-toolchaintechnotes" xreflabel="Toolchain Technical Notes">
  <?dbhtml filename="toolchaintechnotes.html"?>

  <title>Toolchain Technical Notes</title>

  <para>This section explains some of the rationale and technical details
  behind the overall build method. Don't try to immediately
  understand everything in this section. Most of this information will be
  clearer after performing an actual build. Come back and re-read this chapter
  at any time during the build process.</para>

  <para>The overall goal of <xref linkend="chapter-cross-tools"/> and <xref
  linkend="chapter-temporary-tools"/> is to produce a temporary area
  containing a set of tools that are known to be good, and that are isolated from the host system.
  By using the <command>chroot</command> command, the compilations in the remaining chapters
  will be isolated within that environment, ensuring a clean, trouble-free
  build of the target LFS system. The build process has been designed to
  minimize the risks for new readers, and to provide the most educational value
  at the same time.</para>
  <para>This build process is based on
  <emphasis>cross-compilation</emphasis>. Cross-compilation is normally used
  to build a compiler and its associated toolchain for a machine different from
  the one that is used for the build. This is not strictly necessary for LFS,
  since the machine where the new system will run is the same as the one
  used for the build. But cross-compilation has one great advantage:
  anything that is cross-compiled cannot depend on the host environment.</para>
  <sect2 id="cross-compile" xreflabel="About Cross-Compilation">

    <title>About Cross-Compilation</title>

    <note>
      <para>
        The LFS book is not (and does not contain) a general tutorial for
        building a cross (or native) toolchain. Don't use the commands in the
        book to build a cross toolchain for some purpose other
        than building LFS, unless you really understand what you are doing.
      </para>
    </note>
    <para>Cross-compilation involves some concepts that deserve a section of
    their own. Although this section may be omitted on a first reading,
    coming back to it later will help you gain a fuller understanding of
    the process.</para>

    <para>Let us first define some terms used in this context.</para>
    <variablelist>
      <varlistentry><term>The build</term><listitem>
        <para>is the machine where we build programs. Note that this machine
        is also referred to as the <quote>host</quote>.</para></listitem>
      </varlistentry>

      <varlistentry><term>The host</term><listitem>
        <para>is the machine/system where the built programs will run. Note
        that this use of <quote>host</quote> is not the same as in other
        sections.</para></listitem>
      </varlistentry>

      <varlistentry><term>The target</term><listitem>
        <para>is only used for compilers. It is the machine the compiler
        produces code for. It may be different from both the build and
        the host.</para></listitem>
      </varlistentry>

    </variablelist>
    <para>As an example, let us imagine the following scenario (sometimes
    referred to as <quote>Canadian Cross</quote>): we have a
    compiler on a slow machine only, let's call it machine A, and the compiler
    ccA. We also have a fast machine (B), but no compiler for (B), and we
    want to produce code for a third, slow machine (C). We will build a
    compiler for machine C in three stages.</para>
    <informaltable align="center">
      <tgroup cols="5">
        <colspec colnum="1" align="center"/>
        <colspec colnum="2" align="center"/>
        <colspec colnum="3" align="center"/>
        <colspec colnum="4" align="center"/>
        <colspec colnum="5" align="left"/>
        <thead>
          <row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
          <entry>Target</entry><entry>Action</entry></row>
        </thead>
        <tbody>
          <row>
            <entry>1</entry><entry>A</entry><entry>A</entry><entry>B</entry>
            <entry>Build cross-compiler cc1 using ccA on machine A.</entry>
          </row>
          <row>
            <entry>2</entry><entry>A</entry><entry>B</entry><entry>C</entry>
            <entry>Build cross-compiler cc2 using cc1 on machine A.</entry>
          </row>
          <row>
            <entry>3</entry><entry>B</entry><entry>C</entry><entry>C</entry>
            <entry>Build compiler ccC using cc2 on machine B.</entry>
          </row>
        </tbody>
      </tgroup>
    </informaltable>
    <para>Then, all the programs needed by machine C can be compiled
    using cc2 on the fast machine B. Note that unless B can run programs
    produced for C, there is no way to test the newly built programs until machine
    C itself is running. For example, to run a test suite on ccC, we may want to add a
    fourth stage:</para>
    <informaltable align="center">
      <tgroup cols="5">
        <colspec colnum="1" align="center"/>
        <colspec colnum="2" align="center"/>
        <colspec colnum="3" align="center"/>
        <colspec colnum="4" align="center"/>
        <colspec colnum="5" align="left"/>
        <thead>
          <row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
          <entry>Target</entry><entry>Action</entry></row>
        </thead>
        <tbody>
          <row>
            <entry>4</entry><entry>C</entry><entry>C</entry><entry>C</entry>
            <entry>Rebuild and test ccC using ccC on machine C.</entry>
          </row>
        </tbody>
      </tgroup>
    </informaltable>
    <para>In the example above, only cc1 and cc2 are cross-compilers, that is,
    they produce code for a machine different from the one they are run on.
    The other compilers ccA and ccC produce code for the machine they are run
    on. Such compilers are called <emphasis>native</emphasis> compilers.</para>

  </sect2>
  <sect2 id="lfs-cross">
    <title>Implementation of Cross-Compilation for LFS</title>

    <note>
      <para>All packages involved with cross compilation in the book use an
      autoconf-based building system. The autoconf-based building system
      accepts system types in the form cpu-vendor-kernel-os,
      referred to as the system triplet. Since the vendor field is mostly
      irrelevant, autoconf allows it to be omitted. An astute reader may wonder
      why a <quote>triplet</quote> refers to a four-component name. The
      reason is that the kernel field and the os field originated from one
      <quote>system</quote> field. Such a three-field form is still valid
      today for some systems, for example
      <literal>x86_64-unknown-freebsd</literal>. But for other systems,
      two systems can share the same kernel yet still be too different to
      use the same triplet for both. For example, Android running on a
      mobile phone is completely different from Ubuntu running on an ARM64
      server, even though both run on the same type of CPU (ARM64) and
      use the same kernel (Linux).
      Without an emulation layer, you cannot run an
      executable for the server on the mobile phone or vice versa. So the
      <quote>system</quote> field was separated into kernel and os fields to
      designate these systems unambiguously. In our example, the Android
      system is designated <literal>aarch64-unknown-linux-android</literal>,
      and the Ubuntu system is designated
      <literal>aarch64-unknown-linux-gnu</literal>. The word
      <quote>triplet</quote> remained. A simple way to determine your
      system triplet is to run the <command>config.guess</command>
      script that comes with the source for many packages. Unpack the binutils
      sources and run the script: <userinput>./config.guess</userinput> and note
      the output. For example, for a 32-bit Intel processor the
      output will be <emphasis>i686-pc-linux-gnu</emphasis>. On a 64-bit
      system it will be <emphasis>x86_64-pc-linux-gnu</emphasis>. On most
      Linux systems the even simpler <command>gcc -dumpmachine</command> command
      will give you similar information.</para>
      <para>You should also be aware of the name of the platform's dynamic linker, often
      referred to as the dynamic loader (not to be confused with the standard
      linker <command>ld</command> that is part of binutils). The dynamic linker
      provided by glibc finds and loads the shared libraries needed by a
      program, prepares the program to run, and then runs it. The name of the
      dynamic linker for a 32-bit Intel machine is <filename
      class="libraryfile">ld-linux.so.2</filename>; it's <filename
      class="libraryfile">ld-linux-x86-64.so.2</filename> on 64-bit systems. A
      sure-fire way to determine the name of the dynamic linker is to inspect a
      random binary from the host system by running: <userinput>readelf -l
      &lt;name of binary&gt; | grep interpreter</userinput> and noting the
      output. The authoritative reference covering all platforms is in the
      <filename>shlib-versions</filename> file in the root of the glibc source
      tree.</para>
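      <para>As a quick illustration of the <command>readelf</command> check
      above (a sketch; <filename>/bin/sh</filename> stands in for the
      <quote>random binary</quote>, and the interpreter path printed depends
      on your architecture):</para>

```shell
# Print the ELF program headers of a dynamically linked binary and
# extract the PT_INTERP entry naming the dynamic linker.
readelf -l /bin/sh | grep interpreter
```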
    </note>
    <para>In order to fake a cross compilation in LFS, the name of the host triplet
    is slightly adjusted by changing the &quot;vendor&quot; field in the
    <envar>LFS_TGT</envar> variable so it says &quot;lfs&quot;. We also use the
    <parameter>--with-sysroot</parameter> option when building the cross linker and
    cross compiler, to tell them where to find the needed host files. This
    ensures that none of the other programs built in <xref
    linkend="chapter-temporary-tools"/> can link to libraries on the build
    machine. Only two stages are mandatory, plus one more for tests.</para>
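    <para>As a sketch, the adjusted triplet is derived from the machine type
    reported by <command>uname</command>, with the vendor field forced to
    <quote>lfs</quote> (on an x86_64 host this yields
    <literal>x86_64-lfs-linux-gnu</literal>):</para>

```shell
# Build the LFS target triplet: CPU from uname -m, vendor set to "lfs".
LFS_TGT=$(uname -m)-lfs-linux-gnu
echo "$LFS_TGT"
```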
    <informaltable align="center">
      <tgroup cols="5">
        <colspec colnum="1" align="center"/>
        <colspec colnum="2" align="center"/>
        <colspec colnum="3" align="center"/>
        <colspec colnum="4" align="center"/>
        <colspec colnum="5" align="left"/>
        <thead>
          <row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
          <entry>Target</entry><entry>Action</entry></row>
        </thead>
        <tbody>
          <row>
            <entry>1</entry><entry>pc</entry><entry>pc</entry><entry>lfs</entry>
            <entry>Build cross-compiler cc1 using cc-pc on pc.</entry>
          </row>
          <row>
            <entry>2</entry><entry>pc</entry><entry>lfs</entry><entry>lfs</entry>
            <entry>Build compiler cc-lfs using cc1 on pc.</entry>
          </row>
          <row>
            <entry>3</entry><entry>lfs</entry><entry>lfs</entry><entry>lfs</entry>
            <entry>Rebuild and test cc-lfs using cc-lfs on lfs.</entry>
          </row>
        </tbody>
      </tgroup>
    </informaltable>
    <para>In the preceding table, <quote>on pc</quote> means the commands are run
    on a machine using the already installed distribution. <quote>On
    lfs</quote> means the commands are run in a chrooted environment.</para>
    <para>Now, there is more about cross-compiling: the C language is not
    just a compiler; it also defines a standard library. In this book, the
    GNU C library, named glibc, is used (there is an alternative, &quot;musl&quot;). This library must
    be compiled for the LFS machine; that is, using the cross compiler cc1.
    But the compiler itself uses an internal library implementing complex
    subroutines for functions not available in the assembler instruction set. This
    internal library is named libgcc, and it must be linked to the glibc
    library to be fully functional! Furthermore, the standard library for
    C++ (libstdc++) must also be linked with glibc. The solution to this
    chicken-and-egg problem is first to build a degraded cc1-based libgcc,
    lacking some functionality such as threads and exception handling, then
    to build glibc using this degraded compiler (glibc itself is not
    degraded), and also to build libstdc++. This last library will lack some of the
    functionality of libgcc.</para>
    <para>This is not the end of the story: the upshot of the preceding
    paragraph is that cc1 is unable to build a fully functional libstdc++, but
    this is the only compiler available for building the C/C++ libraries
    during stage 2! Of course, the compiler built during stage 2, cc-lfs,
    would be able to build those libraries, but (1) the build system of
    gcc does not know that it is usable on pc, and (2) using it on pc
    would create a risk of linking to the pc libraries, since cc-lfs is a native
    compiler. So we have to rebuild libstdc++ later as a part of
    gcc stage 2.</para>
    <para>In &ch-final; (or <quote>stage 3</quote>), all packages needed for
    the LFS system are built. Even if a package has already been installed into
    the LFS system in a previous chapter, we still rebuild the package
    unless we are completely sure it's unnecessary. The main reason for
    rebuilding these packages is to settle them down: if we reinstall an LFS
    package on a complete LFS system, the installed content of the package
    should be the same as the content of the same package installed in
    &ch-final;. The temporary packages installed in &ch-tmp-cross; or
    &ch-tmp-chroot; cannot satisfy this expectation, because some of them
    are built without optional dependencies installed, and autoconf cannot
    perform some feature checks in &ch-tmp-cross; because of cross
    compilation, causing the temporary packages to lack optional features
    or use suboptimal code routines. Additionally, a minor reason for
    rebuilding the packages is to allow running the test suites.</para>

  </sect2>
  <sect2 id="other-details">

    <title>Other procedural details</title>

    <para>The cross-compiler will be installed in a separate <filename
    class="directory">$LFS/tools</filename> directory, since it will not
    be part of the final system.</para>
    <para>Binutils is installed first because the <command>configure</command>
    runs of both gcc and glibc perform various feature tests on the assembler
    and linker to determine which software features to enable or disable. This
    is more important than one might realize at first. An incorrectly configured
    gcc or glibc can result in a subtly broken toolchain, where the impact of
    such breakage might not show up until near the end of the build of an
    entire distribution. A test suite failure will usually highlight this error
    before too much additional work is performed.</para>
    <para>Binutils installs its assembler and linker in two locations,
    <filename class="directory">$LFS/tools/bin</filename> and <filename
    class="directory">$LFS/tools/$LFS_TGT/bin</filename>. The tools in one
    location are hard linked to the other. An important facet of the linker is
    its library search order. Detailed information can be obtained from
    <command>ld</command> by passing it the <parameter>--verbose</parameter>
    flag. For example, <command>$LFS_TGT-ld --verbose | grep SEARCH</command>
    will illustrate the current search paths and their order. You can also see
    which files are linked by <command>ld</command> by compiling a dummy program and
    passing the <parameter>--verbose</parameter> switch to the linker. For
    example,
    <command>$LFS_TGT-gcc dummy.c -Wl,--verbose 2&gt;&amp;1 | grep succeeded</command>
    will show all the files successfully opened during the linking.</para>
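    <para>Before the cross tools exist, the same two checks can be tried with
    the host toolchain (a sketch; substitute <command>$LFS_TGT-ld</command>
    and <command>$LFS_TGT-gcc</command> once they are built, and note that the
    paths printed will then differ):</para>

```shell
# Show the linker's built-in library search directories.
ld --verbose | grep SEARCH

# Link a dummy program and list every file the linker successfully opened.
echo 'int main(void){ return 0; }' > dummy.c
gcc dummy.c -Wl,--verbose 2>&1 | grep succeeded
rm -f dummy.c a.out
```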
    <para>The next package installed is gcc. An example of what can be
    seen during its run of <command>configure</command> is:</para>

<screen><computeroutput>checking what assembler to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/as
checking what linker to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/ld</computeroutput></screen>
    <para>This is important for the reasons mentioned above. It also
    demonstrates that gcc's configure script does not search the PATH
    directories to find which tools to use. However, during the actual
    operation of <command>gcc</command> itself, the same search paths are not
    necessarily used. To find out which standard linker <command>gcc</command>
    will use, run: <command>$LFS_TGT-gcc -print-prog-name=ld</command>.</para>
    <para>Detailed information can be obtained from <command>gcc</command> by
    passing it the <parameter>-v</parameter> command line option while compiling
    a dummy program. For example, <command>gcc -v dummy.c</command> will show
    detailed information about the preprocessor, compilation, and assembly
    stages, including <command>gcc</command>'s include search paths and their
    order.</para>
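    <para>Both checks can also be run with the host <command>gcc</command>
    (a sketch; replace it with <command>$LFS_TGT-gcc</command> once the cross
    compiler exists):</para>

```shell
# Which ld binary will gcc actually invoke?
gcc -print-prog-name=ld

# Compile a dummy program verbosely: shows the preprocessor, compiler,
# assembler and linker invocations, plus the include search paths.
echo 'int main(void){ return 0; }' > dummy.c
gcc -v dummy.c -o dummy 2>&1 | tail -n 20
rm -f dummy.c dummy
```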
    <para>Next installed are sanitized Linux API headers. These allow the
    standard C library (glibc) to interface with features that the Linux
    kernel will provide.</para>
    <para>The next package installed is glibc. The most important
    considerations for building glibc are the compiler, binary tools, and
    kernel headers. The compiler is generally not an issue since glibc will
    always use the compiler relating to the <parameter>--host</parameter>
    parameter passed to its configure script; e.g. in our case, the compiler
    will be <command>$LFS_TGT-gcc</command>. The binary tools and kernel
    headers can be a bit more complicated. Therefore, we take no risks and use
    the available configure switches to enforce the correct selections. After
    the run of <command>configure</command>, check the contents of the
    <filename>config.make</filename> file in the <filename
    class="directory">build</filename> directory for all important details.
    Note the use of <parameter>CC="$LFS_TGT-gcc"</parameter> (with
    <envar>$LFS_TGT</envar> expanded) to control which binary tools are used
    and the use of the <parameter>-nostdinc</parameter> and
    <parameter>-isystem</parameter> flags to control the compiler's include
    search path. These items highlight an important aspect of the glibc
    package&mdash;it is very self-sufficient in terms of its build machinery
    and generally does not rely on toolchain defaults.</para>
    <para>As mentioned above, the standard C++ library is compiled next, followed in
    <xref linkend="chapter-temporary-tools"/> by other programs that need
    to be cross compiled to break circular dependencies at build time.
    The install step of all those packages uses the
    <envar>DESTDIR</envar> variable to force installation
    in the LFS filesystem.</para>
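    <para>The effect of <envar>DESTDIR</envar> can be demonstrated with a toy
    Makefile (an illustration only; the paths and the
    <filename>demo</filename> file are made up and are not LFS
    commands):</para>

```shell
# A minimal "install" target staged into $(DESTDIR) instead of the real /.
mkdir -p /tmp/destdir-demo
printf 'install:\n\tmkdir -p $(DESTDIR)/usr/bin\n\ttouch $(DESTDIR)/usr/bin/demo\n' \
    > /tmp/destdir-demo/Makefile

# Without DESTDIR this would write to /usr/bin; with it, everything
# lands under the staging directory.
make -C /tmp/destdir-demo DESTDIR=/tmp/destdir-demo/stage install
ls /tmp/destdir-demo/stage/usr/bin/demo
```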
    <para>At the end of <xref linkend="chapter-temporary-tools"/> the native
    LFS compiler is installed. First binutils-pass2 is built,
    in the same <envar>DESTDIR</envar> directory as the other programs,
    then the second pass of gcc is constructed, omitting some
    non-critical libraries. Due to some weird logic in gcc's
    configure script, <envar>CC_FOR_TARGET</envar> ends up as
    <command>cc</command> when the host is the same as the target, but
    different from the build system. This is why
    <parameter>CC_FOR_TARGET=$LFS_TGT-gcc</parameter> is declared explicitly
    as one of the configuration options.</para>
    <para>Upon entering the chroot environment in <xref
    linkend="chapter-chroot-temporary-tools"/>,
    the temporary installations of programs needed for the proper
    operation of the toolchain are performed. From this point onwards, the
    core toolchain is self-contained and self-hosted. In
    <xref linkend="chapter-building-system"/>, final versions of all the
    packages needed for a fully functional system are built, tested and
    installed.</para>

  </sect2>

</sect1>