source: part3intro/toolchaintechnotes.xml@ e18ba69

Last change on this file since e18ba69 was e18ba69, checked in by Xi Ruoyao <xry111@…>, 19 months ago

toolchain technical note: rewrite the description for triplet

  • Don't say "most building system", refine the discussion for autoconf. Other building systems may use a variant of triplet, or use a completely different system designation.
  • Explain why a triplet may contain 4 fields in detail. "Historical reason" is not really correct because 3-field triplet is still used today for BSD, Fuchsia, IOS, Mac OS X (darwin), Solaris, etc.
  • "machine" triplet to "system" triplet (strictly speaking, only the first field in the triplet is for the machine).

Why we need to say "vendor can be omitted" explicitly: we mention "gcc
-dumpmachine". On some distros (like Ubuntu) the output has no vendor
field. If you think this is too nasty, please remove both.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
  <!ENTITY % general-entities SYSTEM "../general.ent">
  %general-entities;
]>

<sect1 id="ch-tools-toolchaintechnotes" xreflabel="Toolchain Technical Notes">
  <?dbhtml filename="toolchaintechnotes.html"?>

  <title>Toolchain Technical Notes</title>

  <para>This section explains some of the rationale and technical details
  behind the overall build method. Don't try to understand everything in
  this section right away. Most of this information will be clearer after
  performing an actual build. Come back and re-read this section at any
  time during the build process.</para>

  <para>The overall goal of <xref linkend="chapter-cross-tools"/> and <xref
  linkend="chapter-temporary-tools"/> is to produce a temporary area
  containing a set of tools that are known to be good, and that are isolated
  from the host system. By using the <command>chroot</command> command, the
  compilations in the remaining chapters will be isolated within that
  environment, ensuring a clean, trouble-free build of the target LFS system.
  The build process has been designed to minimize the risks for new readers,
  and to provide the most educational value at the same time.</para>

  <para>This build process is based on
  <emphasis>cross-compilation</emphasis>. Cross-compilation is normally used
  to build a compiler and its associated toolchain for a machine different from
  the one that is used for the build. This is not strictly necessary for LFS,
  since the machine where the new system will run is the same as the one
  used for the build. But cross-compilation has one great advantage:
  anything that is cross-compiled cannot depend on the host environment.</para>

  <sect2 id="cross-compile" xreflabel="About Cross-Compilation">

    <title>About Cross-Compilation</title>

    <note>
      <para>
        The LFS book is not (and does not contain) a general tutorial on
        building a cross (or native) toolchain. Don't use the commands in the
        book to build a cross toolchain for any purpose other than building
        LFS, unless you really understand what you are doing.
      </para>
    </note>

    <para>Cross-compilation involves some concepts that deserve a section of
    their own. Although this section may be omitted on a first reading,
    coming back to it later will help you gain a fuller understanding of
    the process.</para>

    <para>Let us first define some terms used in this context.</para>

    <variablelist>
      <varlistentry><term>The build</term><listitem>
        <para>is the machine where we build programs. Note that in other
        sections this machine is referred to as the
        <quote>host</quote>.</para></listitem>
      </varlistentry>

      <varlistentry><term>The host</term><listitem>
        <para>is the machine/system where the built programs will run. Note
        that this use of <quote>host</quote> is not the same as in other
        sections.</para></listitem>
      </varlistentry>

      <varlistentry><term>The target</term><listitem>
        <para>is only used for compilers. It is the machine the compiler
        produces code for. It may be different from both the build and
        the host.</para></listitem>
      </varlistentry>

    </variablelist>

    <para>As an example, let us imagine the following scenario (sometimes
    referred to as <quote>Canadian Cross</quote>): we have a compiler, ccA,
    only on a slow machine, machine A. We also have a fast machine (B), but
    no compiler for (B), and we want to produce code for a third, slow
    machine (C). We will build a compiler for machine C in three
    stages.</para>

    <informaltable align="center">
      <tgroup cols="5">
        <colspec colnum="1" align="center"/>
        <colspec colnum="2" align="center"/>
        <colspec colnum="3" align="center"/>
        <colspec colnum="4" align="center"/>
        <colspec colnum="5" align="left"/>
        <thead>
          <row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
            <entry>Target</entry><entry>Action</entry></row>
        </thead>
        <tbody>
          <row>
            <entry>1</entry><entry>A</entry><entry>A</entry><entry>B</entry>
            <entry>Build cross-compiler cc1 using ccA on machine A.</entry>
          </row>
          <row>
            <entry>2</entry><entry>A</entry><entry>B</entry><entry>C</entry>
            <entry>Build cross-compiler cc2 using cc1 on machine A.</entry>
          </row>
          <row>
            <entry>3</entry><entry>B</entry><entry>C</entry><entry>C</entry>
            <entry>Build compiler ccC using cc2 on machine B.</entry>
          </row>
        </tbody>
      </tgroup>
    </informaltable>

    <para>Then, all the programs needed by machine C can be compiled
    using cc2 on the fast machine B. Note that unless B can run programs
    produced for C, there is no way to test the newly built programs until
    machine C itself is running. For example, to run a test suite on ccC, we
    may want to add a fourth stage:</para>

    <informaltable align="center">
      <tgroup cols="5">
        <colspec colnum="1" align="center"/>
        <colspec colnum="2" align="center"/>
        <colspec colnum="3" align="center"/>
        <colspec colnum="4" align="center"/>
        <colspec colnum="5" align="left"/>
        <thead>
          <row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
            <entry>Target</entry><entry>Action</entry></row>
        </thead>
        <tbody>
          <row>
            <entry>4</entry><entry>C</entry><entry>C</entry><entry>C</entry>
            <entry>Rebuild and test ccC using ccC on machine C.</entry>
          </row>
        </tbody>
      </tgroup>
    </informaltable>

    <para>In the example above, only cc1 and cc2 are cross-compilers, that is,
    they produce code for a machine different from the one they are run on.
    The other compilers ccA and ccC produce code for the machine they are run
    on. Such compilers are called <emphasis>native</emphasis> compilers.</para>
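
    <para>The classification can be made concrete with a short C sketch. It
    is illustrative only: the <type>struct compiler</type> type, the
    <function>is_cross()</function> and <function>runs_on()</function>
    helpers, and the <varname>canadian_cross</varname> table are hypothetical
    names. The example above does not state where ccA was built, so this
    sketch assumes machine A. A compiler counts as a cross-compiler exactly
    when its host differs from its target.</para>

<programlisting><![CDATA[
```c
#include <stdbool.h>
#include <string.h>

/* One compiler from the Canadian Cross example (hypothetical type);
 * machines are named "A", "B" and "C" as in the text. */
struct compiler {
    const char *name;
    const char *build;  /* machine the compiler was built on */
    const char *host;   /* machine the compiler runs on */
    const char *target; /* machine the compiler produces code for */
};

/* The four compilers of the example. The build machine of ccA is not
 * given in the text; "A" is an assumption made only for this sketch. */
const struct compiler canadian_cross[] = {
    { "ccA", "A", "A", "A" },
    { "cc1", "A", "A", "B" },
    { "cc2", "A", "B", "C" },
    { "ccC", "B", "C", "C" },
};

/* A cross-compiler produces code for a machine other than the one it
 * runs on; a native compiler produces code for its own machine. */
bool is_cross(const struct compiler *cc)
{
    return strcmp(cc->host, cc->target) != 0;
}

/* True if CC can be run on MACHINE, i.e. MACHINE is its host. */
bool runs_on(const struct compiler *cc, const char *machine)
{
    return strcmp(cc->host, machine) == 0;
}
```
]]></programlisting>

    <para>With this table, only cc1 and cc2 satisfy
    <function>is_cross()</function>, and <function>runs_on()</function>
    confirms that cc2, the compiler used to build the programs for C, runs on
    the fast machine B. The build field plays no part in the classification:
    whether a compiler is cross or native depends only on where it runs and
    what it produces code for.</para>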

  </sect2>

  <sect2 id="lfs-cross">
    <title>Implementation of Cross-Compilation for LFS</title>

    <note>
      <para>All packages involved with cross compilation in the book use an
      autoconf-based build system. The autoconf-based build system accepts
      system types in the form cpu-vendor-kernel-os, referred to as the
      system triplet. Since the vendor field is mostly irrelevant, autoconf
      lets you omit it.</para>

      <para>An astute reader may wonder why a <quote>triplet</quote> refers to
      a four-component name. The reason is that the kernel field and the os
      field originated from a single <quote>system</quote> field. This
      three-field form is still valid today for some systems, for example
      <literal>x86_64-unknown-freebsd</literal>. In other cases, however, two
      systems can share the same kernel but still be too different to use the
      same triplet. For example, Android running on a mobile phone is
      completely different from Ubuntu running on an ARM64 server: without an
      emulation layer, you cannot run an executable for the server on the
      mobile phone or vice versa. So the <quote>system</quote> field was split
      into kernel and os fields to designate such systems unambiguously. In
      our example, the Android system is designated
      <literal>aarch64-unknown-linux-android</literal>, and the Ubuntu system
      is designated <literal>aarch64-unknown-linux-gnu</literal>. The word
      <quote>triplet</quote> remained.</para>

      <para>A simple way to determine your system triplet is to run the
      <command>config.guess</command> script that comes with the source for
      many packages. Unpack the binutils sources, run the script with
      <userinput>./config.guess</userinput>, and note the output. For example,
      for a 32-bit Intel processor the output will be
      <emphasis>i686-pc-linux-gnu</emphasis>. On a 64-bit system it will be
      <emphasis>x86_64-pc-linux-gnu</emphasis>. On most Linux systems the even
      simpler <command>gcc -dumpmachine</command> command will give you
      similar information.</para>
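
      <para>To make the field layout concrete, here is a small C sketch that
      splits a system name into its fields. It does not reproduce autoconf's
      method: the real <command>config.sub</command> script canonicalizes
      names using long lists of known CPUs, vendors, kernels, and operating
      systems. The hypothetical <function>parse_triplet()</function> below
      assumes that <literal>linux</literal> is the only kernel name it needs to
      recognize. That is just enough to tell a name with the vendor omitted
      (<literal>x86_64-linux-gnu</literal>) from a three-field name
      (<literal>x86_64-unknown-freebsd</literal>).</para>

<programlisting><![CDATA[
```c
#include <string.h>

#define FIELD_MAX 32

/* Fields of a system name; an empty string means the field is absent.
 * For the old three-field form cpu-vendor-system, kernel stays empty and
 * os holds the whole system field. */
struct triplet {
    char cpu[FIELD_MAX];
    char vendor[FIELD_MAX];
    char kernel[FIELD_MAX];
    char os[FIELD_MAX];
};

/* Assumption: "linux" is the only kernel name this sketch recognizes. */
static int is_known_kernel(const char *s)
{
    return strcmp(s, "linux") == 0;
}

/* Split NAME at '-' into T. Returns 0 on success, or -1 if NAME does not
 * have 3 or 4 non-empty fields or a field is too long. */
int parse_triplet(const char *name, struct triplet *t)
{
    char f[4][FIELD_MAX];
    int n = 0;

    memset(t, 0, sizeof *t);
    for (const char *p = name;;) {
        const char *dash = strchr(p, '-');
        size_t len = dash ? (size_t)(dash - p) : strlen(p);

        if (n == 4 || len == 0 || len >= FIELD_MAX)
            return -1;
        memcpy(f[n], p, len);
        f[n][len] = '\0';
        n++;
        if (!dash)
            break;
        p = dash + 1;
    }

    if (n == 4) {                          /* cpu-vendor-kernel-os */
        strcpy(t->cpu, f[0]);
        strcpy(t->vendor, f[1]);
        strcpy(t->kernel, f[2]);
        strcpy(t->os, f[3]);
    } else if (n == 3 && is_known_kernel(f[1])) {
        strcpy(t->cpu, f[0]);              /* cpu-kernel-os, no vendor */
        strcpy(t->kernel, f[1]);
        strcpy(t->os, f[2]);
    } else if (n == 3) {
        strcpy(t->cpu, f[0]);              /* cpu-vendor-system */
        strcpy(t->vendor, f[1]);
        strcpy(t->os, f[2]);
    } else {
        return -1;
    }
    return 0;
}
```
]]></programlisting>

      <para>For <literal>aarch64-unknown-linux-android</literal> the sketch
      yields cpu <literal>aarch64</literal>, vendor <literal>unknown</literal>,
      kernel <literal>linux</literal>, and os <literal>android</literal>. For
      <literal>x86_64-unknown-freebsd</literal> it leaves the kernel empty and
      stores <literal>freebsd</literal> as the single system field.</para>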

      <para>You should also be aware of the name of the platform's dynamic
      linker, often referred to as the dynamic loader (not to be confused with
      the standard linker <command>ld</command> that is part of binutils). The
      dynamic linker provided by the glibc package finds and loads the shared
      libraries needed by a program, prepares the program to run, and then
      runs it. The name of the dynamic linker for a 32-bit Intel machine is
      <filename class="libraryfile">ld-linux.so.2</filename>; it's <filename
      class="libraryfile">ld-linux-x86-64.so.2</filename> on 64-bit x86
      systems. A sure-fire way to determine the name of the dynamic linker is
      to inspect any dynamically linked binary from the host system by
      running: <userinput>readelf -l &lt;name of binary&gt; | grep
      interpreter</userinput> and noting the output. The authoritative
      reference covering all platforms is in the
      <filename>shlib-versions</filename> file in the root of the glibc source
      tree.</para>
    </note>
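
    <para>The <literal>Requesting program interpreter</literal> line printed
    by <command>readelf -l</command> comes from the PT_INTERP program header
    of the ELF file. As an illustration, the C sketch below reads that header
    directly, using the types and constants from glibc's
    <filename>elf.h</filename>. The <function>elf_interpreter()</function>
    helper is hypothetical and deliberately simplified. It only handles
    64-bit ELF files in the host's byte order and performs little
    validation, so <command>readelf</command> remains the tool to use in
    practice.</para>

<programlisting><![CDATA[
```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Copy the PT_INTERP string (the requested dynamic linker) of the 64-bit
 * ELF file at PATH into BUF. Returns 0 on success, or -1 on error or if
 * the file has no PT_INTERP header (e.g. it is statically linked).
 * Assumes the file uses the host's byte order. */
int elf_interpreter(const char *path, char *buf, size_t size)
{
    FILE *f = fopen(path, "rb");
    Elf64_Ehdr eh;
    int ret = -1;

    if (f == NULL)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) == 1
        && memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0
        && eh.e_ident[EI_CLASS] == ELFCLASS64) {
        /* Walk the program header table looking for PT_INTERP. */
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

            if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
                break;
            if (ph.p_type != PT_INTERP)
                continue;
            /* The segment holds the path including its trailing NUL. */
            if (ph.p_filesz > 0 && ph.p_filesz <= size
                && fseek(f, (long)ph.p_offset, SEEK_SET) == 0
                && fread(buf, 1, ph.p_filesz, f) == ph.p_filesz) {
                buf[ph.p_filesz - 1] = '\0';
                ret = 0;
            }
            break;
        }
    }
    fclose(f);
    return ret;
}
```
]]></programlisting>

    <para>On a 64-bit x86 host, passing the path of a dynamically linked
    program such as <filename>/bin/ls</filename> should yield the same path
    that the <command>readelf</command> command reports. A statically linked
    program has no PT_INTERP header, and the function returns -1 for
    it.</para>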

    <para>In order to fake a cross compilation in LFS, the name of the host
    triplet is slightly adjusted by changing the <quote>vendor</quote> field in
    the <envar>LFS_TGT</envar> variable so it says <quote>lfs</quote>. We also
    use the <parameter>--with-sysroot</parameter> option when building the
    cross linker and cross compiler to tell them where to find the needed host
    files (the headers and libraries of the LFS system). This ensures that
    none of the other programs built in <xref
    linkend="chapter-temporary-tools"/> can link to libraries on the build
    machine. Only two stages are mandatory, plus one more for tests.</para>
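
    <para>The vendor adjustment itself is a plain string substitution. The C
    sketch below shows the idea with a hypothetical
    <function>set_vendor()</function> helper. It assumes a four-field name
    such as the output of <command>config.guess</command>. It rejects
    three-field names, because in those it cannot tell whether the second
    field is a vendor.</para>

<programlisting><![CDATA[
```c
#include <stdio.h>
#include <string.h>

/* Copy the four-field system name NAME to OUT with its vendor (second)
 * field replaced by VENDOR. Returns 0 on success, or -1 if NAME does not
 * have exactly four fields or OUT is too small. */
int set_vendor(const char *name, const char *vendor, char *out, size_t size)
{
    const char *d1 = strchr(name, '-');
    const char *d2 = d1 ? strchr(d1 + 1, '-') : NULL;
    const char *d3 = d2 ? strchr(d2 + 1, '-') : NULL;

    /* A three-field name is rejected: the vendor may have been omitted,
     * so the second field cannot be assumed to be the vendor. */
    if (d3 == NULL || strchr(d3 + 1, '-') != NULL)
        return -1;

    /* cpu, then the new vendor, then "-kernel-os" copied unchanged */
    int n = snprintf(out, size, "%.*s-%s%s", (int)(d1 - name), name, vendor, d2);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```
]]></programlisting>

    <para>Turning <literal>x86_64-pc-linux-gnu</literal> into
    <literal>x86_64-lfs-linux-gnu</literal> leaves the cpu, kernel, and os
    fields untouched. The resulting tools therefore still produce code for
    the same kind of system as the host distribution. Only the name differs
    from that of the build machine, and that difference is what makes the
    build systems treat the build as a cross compilation.</para>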

    <informaltable align="center">
      <tgroup cols="5">
        <colspec colnum="1" align="center"/>
        <colspec colnum="2" align="center"/>
        <colspec colnum="3" align="center"/>
        <colspec colnum="4" align="center"/>
        <colspec colnum="5" align="left"/>
        <thead>
          <row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
            <entry>Target</entry><entry>Action</entry></row>
        </thead>
        <tbody>
          <row>
            <entry>1</entry><entry>pc</entry><entry>pc</entry><entry>lfs</entry>
            <entry>Build cross-compiler cc1 using cc-pc on pc.</entry>
          </row>
          <row>
            <entry>2</entry><entry>pc</entry><entry>lfs</entry><entry>lfs</entry>
            <entry>Build compiler cc-lfs using cc1 on pc.</entry>
          </row>
          <row>
            <entry>3</entry><entry>lfs</entry><entry>lfs</entry><entry>lfs</entry>
            <entry>Rebuild and test cc-lfs using cc-lfs on lfs.</entry>
          </row>
        </tbody>
      </tgroup>
    </informaltable>

    <para>In the preceding table, <quote>on pc</quote> means the commands are
    run on a machine using the already installed distribution. <quote>On
    lfs</quote> means the commands are run in a chrooted environment.</para>

    <para>Now, there is more to cross-compiling: a C implementation is not
    just a compiler; it also includes a standard library. In this book, the
    GNU C library, named glibc, is used (an alternative is musl). This
    library must be compiled for the LFS machine; that is, using the cross
    compiler cc1. But the compiler itself uses an internal library that
    implements complex subroutines for operations not available in the
    processor's instruction set. This internal library is named libgcc, and
    it must be linked to the glibc library to be fully functional!
    Furthermore, the standard library for C++ (libstdc++) must also be linked
    with glibc. The solution to this chicken-and-egg problem is first to
    build a degraded cc1-based libgcc, lacking some functionalities such as
    threads and exception handling, then to build glibc using this degraded
    compiler (glibc itself is not degraded), and also to build libstdc++.
    This last library will lack some of the functionality of libgcc.</para>
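
    <para>For an example of the kind of subroutine libgcc supplies, consider
    the C function below. It assumes a 64-bit GCC target such as x86_64,
    where GCC provides the <type>unsigned __int128</type> extension type, and
    <function>udiv128()</function> is a hypothetical name. Because x86_64 has
    no instruction that divides one 128-bit integer by another, GCC typically
    compiles this division into a call to the libgcc routine
    <function>__udivti3</function>. One way to check is to compile the file
    with <command>gcc -O2 -S</command> and search the generated assembly for
    that name.</para>

<programlisting><![CDATA[
```c
/* Assumption: a 64-bit GCC target such as x86_64, where the GCC extension
 * type unsigned __int128 exists. x86_64 has no instruction that divides
 * two 128-bit integers, so GCC typically emits a call to the libgcc
 * helper __udivti3 here instead of inline instructions. */
unsigned __int128 udiv128(unsigned __int128 a, unsigned __int128 b)
{
    return a / b;
}
```
]]></programlisting>

    <para>Linking code like this therefore requires libgcc, even though the
    source never mentions it.</para>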

    <para>This is not the end of the story: the upshot of the preceding
    paragraph is that cc1 is unable to build a fully functional libstdc++, but
    this is the only compiler available for building the C/C++ libraries
    during stage 2! Of course, the compiler built during stage 2, cc-lfs,
    would be able to build those libraries, but (1) the build system of
    gcc does not know that it is usable on pc, and (2) using it on pc
    would create a risk of linking to the pc libraries, since cc-lfs is a
    native compiler. So we have to rebuild libstdc++ twice later on: as a part
    of gcc stage 2, and then again in the chroot environment (gcc stage
    3).</para>

  </sect2>

  <sect2 id="other-details">

    <title>Other procedural details</title>

    <para>The cross-compiler will be installed in a separate <filename
    class="directory">$LFS/tools</filename> directory, since it will not
    be part of the final system.</para>

    <para>Binutils is installed first because the <command>configure</command>
    runs of both gcc and glibc perform various feature tests on the assembler
    and linker to determine which software features to enable or disable. This
    is more important than one might realize at first. An incorrectly
    configured gcc or glibc can result in a subtly broken toolchain, where the
    impact of such breakage might not show up until near the end of the build
    of an entire distribution. A test suite failure will usually highlight
    this error before too much additional work is performed.</para>

    <para>Binutils installs its assembler and linker in two locations,
    <filename class="directory">$LFS/tools/bin</filename> and <filename
    class="directory">$LFS/tools/$LFS_TGT/bin</filename>. The tools in one
    location are hard linked to the other. An important facet of the linker is
    its library search order. Detailed information can be obtained from
    <command>ld</command> by passing it the <parameter>--verbose</parameter>
    flag. For example, <command>$LFS_TGT-ld --verbose | grep SEARCH</command>
    will illustrate the current search paths and their order. To see which
    files are actually linked by <command>ld</command>, compile a dummy
    program and pass the <parameter>--verbose</parameter> switch to the
    linker. For example,
    <command>$LFS_TGT-gcc dummy.c -Wl,--verbose 2&gt;&amp;1 | grep succeeded</command>
    will show all the files successfully opened during the linking.</para>
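
    <para>In the output of <command>ld --verbose</command>, the SEARCH line
    is a sequence of <literal>SEARCH_DIR("...")</literal> entries listed in
    search order. A leading <literal>=</literal> in a directory name means
    that ld interprets it relative to the sysroot. As a sketch of how to read
    that line, the C function below extracts the directories in order.
    <function>parse_search_dirs()</function> and <type>struct
    search_dir</type> are hypothetical names, and the sketch assumes the line
    has already been captured as a single string.</para>

<programlisting><![CDATA[
```c
#include <string.h>

#define DIR_MAX 256

struct search_dir {
    char path[DIR_MAX]; /* directory name without any leading '=' */
    int in_sysroot;     /* 1 if the name began with '=' */
};

/* Extract, in order, up to MAX directories from the SEARCH_DIR("...")
 * entries in LINE. Returns the number of directories stored in DIRS. */
int parse_search_dirs(const char *line, struct search_dir *dirs, int max)
{
    static const char key[] = "SEARCH_DIR(\"";
    int n = 0;
    const char *p = line;

    while (n < max && (p = strstr(p, key)) != NULL) {
        const char *start = p + strlen(key);
        const char *end = strchr(start, '"');
        size_t len;

        if (end == NULL)
            break;
        dirs[n].in_sysroot = (*start == '=');
        start += dirs[n].in_sysroot;     /* drop the '=' marker */
        len = (size_t)(end - start);
        if (len >= DIR_MAX)
            len = DIR_MAX - 1;           /* truncate overlong names */
        memcpy(dirs[n].path, start, len);
        dirs[n].path[len] = '\0';
        n++;
        p = end + 1;
    }
    return n;
}
```
]]></programlisting>

    <para>Assuming the cross linker was configured with a sysroot of
    <filename class="directory">$LFS</filename>, an entry such as
    <literal>=/usr/lib</literal> would refer to <filename
    class="directory">$LFS/usr/lib</filename> rather than to the host's
    <filename class="directory">/usr/lib</filename>.</para>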

    <para>The next package installed is gcc. An example of what can be
    seen during its run of <command>configure</command> is:</para>

<screen><computeroutput>checking what assembler to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/as
checking what linker to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/ld</computeroutput></screen>

    <para>This is important for the reasons mentioned above. It also
    demonstrates that gcc's configure script does not search the PATH
    directories to find which tools to use. However, during the actual
    operation of <command>gcc</command> itself, the same search paths are not
    necessarily used. To find out which standard linker <command>gcc</command>
    will use, run: <command>$LFS_TGT-gcc -print-prog-name=ld</command>.</para>

    <para>Detailed information can be obtained from <command>gcc</command> by
    passing it the <parameter>-v</parameter> command line option while
    compiling a dummy program. For example, <command>gcc -v dummy.c</command>
    will show detailed information about the preprocessor, compilation, and
    assembly stages, including <command>gcc</command>'s include search paths
    and their order.</para>
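
    <para><command>gcc -v</command> writes this report to standard error, so
    capture it with <userinput>2&gt;&amp;1</userinput> if you want to process
    it. The include directories appear one per line, each indented by a
    space, between a <literal>#include &lt;...&gt; search starts here:</literal>
    line and an <literal>End of search list.</literal> line. The C sketch
    below collects them in order. <function>parse_include_dirs()</function>
    is a hypothetical name, and the sketch assumes the output has already been
    split into lines without their trailing newlines.</para>

<programlisting><![CDATA[
```c
#include <string.h>

/* Collect, in order, the directories that "gcc -v" lists between the
 * "#include <...> search starts here:" and "End of search list." lines.
 * LINES holds the captured output, one line per element without the
 * trailing newline. Returns the number of entries stored in DIRS; each
 * entry points into LINES just past the leading space. */
int parse_include_dirs(const char *const lines[], int nlines,
                       const char *dirs[], int max)
{
    int in_list = 0;
    int n = 0;

    for (int i = 0; i < nlines && n < max; i++) {
        if (strcmp(lines[i], "#include <...> search starts here:") == 0)
            in_list = 1;
        else if (strcmp(lines[i], "End of search list.") == 0)
            in_list = 0;
        else if (in_list && lines[i][0] == ' ')
            dirs[n++] = lines[i] + 1;
    }
    return n;
}
```
]]></programlisting>

    <para>The order of the returned entries is the order in which
    <command>gcc</command> searches for headers included with angle
    brackets.</para>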

    <para>Next installed are sanitized Linux API headers. These allow the
    standard C library (glibc) to interface with features that the Linux
    kernel will provide.</para>

    <para>The next package installed is glibc. The most important
    considerations for building glibc are the compiler, binary tools, and
    kernel headers. The compiler is generally not an issue since glibc will
    always use the compiler relating to the <parameter>--host</parameter>
    parameter passed to its configure script; e.g. in our case, the compiler
    will be <command>$LFS_TGT-gcc</command>. The binary tools and kernel
    headers can be a bit more complicated. Therefore, we take no risks and use
    the available configure switches to enforce the correct selections. After
    the run of <command>configure</command>, check the contents of the
    <filename>config.make</filename> file in the <filename
    class="directory">build</filename> directory for all important details.
    Note the use of <parameter>CC="$LFS_TGT-gcc"</parameter> (with
    <envar>$LFS_TGT</envar> expanded) to control which binary tools are used
    and the use of the <parameter>-nostdinc</parameter> and
    <parameter>-isystem</parameter> flags to control the compiler's include
    search path. These items highlight an important aspect of the glibc
    package: it is very self-sufficient in terms of its build machinery
    and generally does not rely on toolchain defaults.</para>

    <para>As mentioned above, the standard C++ library is compiled next,
    followed in <xref linkend="chapter-temporary-tools"/> by other programs
    that must be cross-compiled to break circular dependencies at build time.
    The install step of all those packages uses the
    <envar>DESTDIR</envar> variable to force installation
    into the LFS filesystem.</para>

    <para>At the end of <xref linkend="chapter-temporary-tools"/> the native
    LFS compiler is installed. First binutils-pass2 is built,
    in the same <envar>DESTDIR</envar> directory as the other programs,
    then the second pass of gcc is constructed, omitting some
    non-critical libraries. Due to some unusual logic in gcc's
    configure script, <envar>CC_FOR_TARGET</envar> ends up as
    <command>cc</command> when the host is the same as the target, but
    different from the build system. This is why
    <parameter>CC_FOR_TARGET=$LFS_TGT-gcc</parameter> is declared explicitly
    as one of the configuration options.</para>

    <para>Upon entering the chroot environment in <xref
    linkend="chapter-chroot-temporary-tools"/>,
    the temporary installations of programs needed for the proper
    operation of the toolchain are performed. From this point onwards, the
    core toolchain is self-contained and self-hosted. In
    <xref linkend="chapter-building-system"/>, final versions of all the
    packages needed for a fully functional system are built, tested and
    installed.</para>

  </sect2>

</sect1>