source: part3intro/toolchaintechnotes.xml@ d48d1c2

Last change on this file since d48d1c2 was d48d1c2, checked in by Xi Ruoyao <xry111@…>, 9 months ago

toolchain note: add a disclaimer for the purpose of the book

There are some discussion on gcc-help from someone (mis)using LFS to
build a "general" toolchain. Let's stop it before off-topic message got
into lfs-support.

1<?xml version="1.0" encoding="ISO-8859-1"?>
2<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
3 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
4 <!ENTITY % general-entities SYSTEM "../general.ent">
5 %general-entities;
6]>
7
8<sect1 id="ch-tools-toolchaintechnotes" xreflabel="Toolchain Technical Notes">
9 <?dbhtml filename="toolchaintechnotes.html"?>
10
11 <title>Toolchain Technical Notes</title>
12
13 <para>This section explains some of the rationale and technical details
14 behind the overall build method. It is not essential to immediately
15 understand everything in this section. Most of this information will be
16 clearer after performing an actual build. This section can be referred
17 to at any time during the process.</para>
18
19 <para>The overall goal of <xref linkend="chapter-cross-tools"/> and <xref
20 linkend="chapter-temporary-tools"/> is to produce a temporary area that
21 contains a known-good set of tools that can be isolated from the host system.
22 By using <command>chroot</command>, the commands in the remaining chapters
23 will be contained within that environment, ensuring a clean, trouble-free
24 build of the target LFS system. The build process has been designed to
25 minimize the risks for new readers and to provide the most educational value
26 at the same time.</para>
27
28 <para>The build process is based on the process of
29 <emphasis>cross-compilation</emphasis>. Cross-compilation is normally used
30 for building a compiler and its toolchain for a machine different from
31 the one that is used for the build. This is not strictly needed for LFS,
32 since the machine where the new system will run is the same as the one
33 used for the build. But cross-compilation has the great advantage that
34 anything that is cross-compiled cannot depend on the host environment.</para>
35
36 <sect2 id="cross-compile" xreflabel="About Cross-Compilation">
37
38 <title>About Cross-Compilation</title>
39
40 <note>
      <para>
        The LFS book is not, and does not contain, a general tutorial on
        building a cross (or native) toolchain. Do not use the commands in
        this book for a cross toolchain that will be used for any purpose
        other than building LFS, unless you really understand what you are
        doing.
      </para>
47 </note>
48
    <para>Cross-compilation involves some concepts that deserve a section of
    their own. Although this section may be skipped on a first reading,
    coming back to it later will help you understand the process
    fully.</para>
53
54 <para>Let us first define some terms used in this context:</para>
55
56 <variablelist>
57 <varlistentry><term>build</term><listitem>
58 <para>is the machine where we build programs. Note that this machine
59 is referred to as the <quote>host</quote> in other
60 sections.</para></listitem>
61 </varlistentry>
62
63 <varlistentry><term>host</term><listitem>
64 <para>is the machine/system where the built programs will run. Note
65 that this use of <quote>host</quote> is not the same as in other
66 sections.</para></listitem>
67 </varlistentry>
68
69 <varlistentry><term>target</term><listitem>
70 <para>is only used for compilers. It is the machine the compiler
71 produces code for. It may be different from both build and
72 host.</para></listitem>
73 </varlistentry>
74
75 </variablelist>
76
    <para>As an example, let us imagine the following scenario (sometimes
    referred to as <quote>Canadian Cross</quote>): we have a compiler only
    on a slow machine, which we call machine A, and we call that compiler
    ccA. We also have a fast machine (B) with no compiler, and we want to
    produce code for another slow machine (C). To build a
    compiler for machine C, we would have three stages:</para>
83
84 <informaltable align="center">
85 <tgroup cols="5">
86 <colspec colnum="1" align="center"/>
87 <colspec colnum="2" align="center"/>
88 <colspec colnum="3" align="center"/>
89 <colspec colnum="4" align="center"/>
90 <colspec colnum="5" align="left"/>
91 <thead>
92 <row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
93 <entry>Target</entry><entry>Action</entry></row>
94 </thead>
95 <tbody>
96 <row>
97 <entry>1</entry><entry>A</entry><entry>A</entry><entry>B</entry>
98 <entry>build cross-compiler cc1 using ccA on machine A</entry>
99 </row>
100 <row>
101 <entry>2</entry><entry>A</entry><entry>B</entry><entry>C</entry>
102 <entry>build cross-compiler cc2 using cc1 on machine A</entry>
103 </row>
104 <row>
105 <entry>3</entry><entry>B</entry><entry>C</entry><entry>C</entry>
106 <entry>build compiler ccC using cc2 on machine B</entry>
107 </row>
108 </tbody>
109 </tgroup>
110 </informaltable>
111
112 <para>Then, all the other programs needed by machine C can be compiled
113 using cc2 on the fast machine B. Note that unless B can run programs
114 produced for C, there is no way to test the built programs until machine
115 C itself is running. For example, for testing ccC, we may want to add a
116 fourth stage:</para>
117
118 <informaltable align="center">
119 <tgroup cols="5">
120 <colspec colnum="1" align="center"/>
121 <colspec colnum="2" align="center"/>
122 <colspec colnum="3" align="center"/>
123 <colspec colnum="4" align="center"/>
124 <colspec colnum="5" align="left"/>
125 <thead>
126 <row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
127 <entry>Target</entry><entry>Action</entry></row>
128 </thead>
129 <tbody>
130 <row>
131 <entry>4</entry><entry>C</entry><entry>C</entry><entry>C</entry>
132 <entry>rebuild and test ccC using itself on machine C</entry>
133 </row>
134 </tbody>
135 </tgroup>
136 </informaltable>
137
138 <para>In the example above, only cc1 and cc2 are cross-compilers, that is,
139 they produce code for a machine different from the one they are run on.
140 The other compilers ccA and ccC produce code for the machine they are run
141 on. Such compilers are called <emphasis>native</emphasis> compilers.</para>
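
    <para>As a rough sketch only, the three stages of the example can be
    mapped onto the <parameter>--build</parameter>,
    <parameter>--host</parameter>, and <parameter>--target</parameter>
    options understood by GNU-style <command>configure</command> scripts.
    The names <replaceable>&lt;A&gt;</replaceable>,
    <replaceable>&lt;B&gt;</replaceable>, and
    <replaceable>&lt;C&gt;</replaceable> are placeholders for the triplets of
    the three machines, not real values, and an actual build would need
    further options:</para>

<screen role="nodump"><userinput># Stage 1, run on A with ccA: cc1 runs on A and produces code for B
../configure --build=<replaceable>&lt;A&gt;</replaceable> --host=<replaceable>&lt;A&gt;</replaceable> --target=<replaceable>&lt;B&gt;</replaceable>

# Stage 2, run on A with cc1: cc2 runs on B and produces code for C
../configure --build=<replaceable>&lt;A&gt;</replaceable> --host=<replaceable>&lt;B&gt;</replaceable> --target=<replaceable>&lt;C&gt;</replaceable>

# Stage 3, run on B with cc2: ccC runs on C and produces code for C
../configure --build=<replaceable>&lt;B&gt;</replaceable> --host=<replaceable>&lt;C&gt;</replaceable> --target=<replaceable>&lt;C&gt;</replaceable></userinput></screen>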
142
143 </sect2>
144
145 <sect2 id="lfs-cross">
146 <title>Implementation of Cross-Compilation for LFS</title>
147
148 <note>
      <para>Almost all build systems use names of the form
      cpu-vendor-kernel-os, referred to as the machine triplet. An astute
      reader may wonder why a <quote>triplet</quote> refers to a
      four-component name. The reason is historical: initially,
      three-component names were enough to designate a machine
      unambiguously, but as new machines and systems appeared, that proved
      insufficient. The word <quote>triplet</quote> remained. A simple way to
      determine your machine triplet is to run the
      <command>config.guess</command> script that comes with the source for
      many packages. Unpack the binutils sources, run the script with
      <userinput>./config.guess</userinput>, and note the output. For
      example, for a 32-bit Intel processor the output will be
      <emphasis>i686-pc-linux-gnu</emphasis>. On a 64-bit system it will be
      <emphasis>x86_64-pc-linux-gnu</emphasis>.</para>
162
      <para>Also be aware of the name of the platform's dynamic linker, often
      referred to as the dynamic loader (not to be confused with the standard
      linker <command>ld</command> that is part of binutils). The dynamic
      linker provided by Glibc finds and loads the shared libraries needed by
      a program, prepares the program to run, and then runs it. The name of
      the dynamic linker for a 32-bit Intel machine is <filename
      class="libraryfile">ld-linux.so.2</filename>, and for 64-bit systems it
      is <filename class="libraryfile">ld-linux-x86-64.so.2</filename>. A
      sure-fire way to determine the name of the dynamic linker is to inspect
      any binary from the host system by running: <userinput>readelf -l
      &lt;name of binary&gt; | grep interpreter</userinput> and noting the
      output. The authoritative reference covering all platforms is the
      <filename>shlib-versions</filename> file in the root of the Glibc
      source tree.</para>
177 </note>
178
    <para>To fake a cross-compilation in LFS, the name of the host triplet
    is slightly adjusted by changing the <quote>vendor</quote> field in the
    <envar>LFS_TGT</envar> variable. We also use the
    <parameter>--with-sysroot</parameter> option when building the cross
    linker and cross-compiler to tell them where to find the needed host
    files. This ensures that none of the other programs built in <xref
    linkend="chapter-temporary-tools"/> can link to libraries on the build
    machine. Only two stages are mandatory, plus one more for tests:</para>
187
188 <informaltable align="center">
189 <tgroup cols="5">
190 <colspec colnum="1" align="center"/>
191 <colspec colnum="2" align="center"/>
192 <colspec colnum="3" align="center"/>
193 <colspec colnum="4" align="center"/>
194 <colspec colnum="5" align="left"/>
195 <thead>
196 <row><entry>Stage</entry><entry>Build</entry><entry>Host</entry>
197 <entry>Target</entry><entry>Action</entry></row>
198 </thead>
199 <tbody>
200 <row>
201 <entry>1</entry><entry>pc</entry><entry>pc</entry><entry>lfs</entry>
202 <entry>build cross-compiler cc1 using cc-pc on pc</entry>
203 </row>
204 <row>
205 <entry>2</entry><entry>pc</entry><entry>lfs</entry><entry>lfs</entry>
206 <entry>build compiler cc-lfs using cc1 on pc</entry>
207 </row>
208 <row>
209 <entry>3</entry><entry>lfs</entry><entry>lfs</entry><entry>lfs</entry>
210 <entry>rebuild and test cc-lfs using itself on lfs</entry>
211 </row>
212 </tbody>
213 </tgroup>
214 </informaltable>
215
216 <para>In the above table, <quote>on pc</quote> means the commands are run
217 on a machine using the already installed distribution. <quote>On
218 lfs</quote> means the commands are run in a chrooted environment.</para>
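
    <para>As a minimal sketch of the triplet adjustment and the
    <parameter>--with-sysroot</parameter> option described before the table
    above, stage 1 could begin as follows. This assumes that the
    <envar>LFS</envar> variable holds the mount point of the new system and
    that the commands are run from a build directory inside the unpacked
    binutils sources. The option list is deliberately incomplete and is
    shown for illustration only:</para>

<screen role="nodump"><userinput># Same CPU as the build machine, but with "lfs" in the vendor field
export LFS_TGT=$(uname -m)-lfs-linux-gnu

# Configure a cross linker that looks for the needed files under $LFS
../configure --prefix=$LFS/tools \
             --with-sysroot=$LFS \
             --target=$LFS_TGT</userinput></screen>

    <para>Because the vendor field differs from that of the build machine,
    the build systems treat the two triplets as different machines and
    therefore cross-compile.</para>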
219
    <para>There is more to cross-compiling: the C language is defined not
    only by a compiler, but also by a standard library. In this book, the
    GNU C library, named glibc, is used. This library must
    be compiled for the lfs machine, that is, using the cross-compiler cc1.
    But the compiler itself uses an internal library implementing complex
    instructions not available in the assembler instruction set. This
    internal library is named libgcc, and it must be linked to the glibc
    library to be fully functional. Furthermore, the standard library for
    C++ (libstdc++) also needs to be linked to glibc. The solution to this
    chicken-and-egg problem is to first build a degraded cc1-based libgcc,
    lacking some functionality such as threads and exception handling, then
    build glibc using this degraded compiler (glibc itself is not
    degraded), then build libstdc++. But this last library will lack the
    same functionality as libgcc.</para>
234
    <para>This is not the end of the story: the conclusion of the preceding
    paragraph is that cc1 is unable to build a fully functional libstdc++,
    yet it is the only compiler available for building the C/C++ libraries
    during stage 2. Of course, the compiler built during stage 2, cc-lfs,
    would be able to build those libraries, but (1) the build system of
    GCC does not know that it is usable on pc, and (2) using it on pc
    would risk linking to the pc libraries, since cc-lfs is a native
    compiler. So we have to build libstdc++ later, in chroot.</para>
243
244 </sect2>
245
246 <sect2 id="other-details">
247
248 <title>Other procedural details</title>
249
250 <para>The cross-compiler will be installed in a separate <filename
251 class="directory">$LFS/tools</filename> directory, since it will not
252 be part of the final system.</para>
253
254 <para>Binutils is installed first because the <command>configure</command>
255 runs of both GCC and Glibc perform various feature tests on the assembler
256 and linker to determine which software features to enable or disable. This
257 is more important than one might first realize. An incorrectly configured
258 GCC or Glibc can result in a subtly broken toolchain, where the impact of
259 such breakage might not show up until near the end of the build of an
260 entire distribution. A test suite failure will usually highlight this error
261 before too much additional work is performed.</para>
262
    <para>Binutils installs its assembler and linker in two locations,
    <filename class="directory">$LFS/tools/bin</filename> and <filename
    class="directory">$LFS/tools/$LFS_TGT/bin</filename>. The tools in one
    location are hard linked to the other. An important facet of the linker
    is its library search order. Detailed information can be obtained from
    <command>ld</command> by passing it the <parameter>--verbose</parameter>
    flag. For example, <command>$LFS_TGT-ld --verbose | grep SEARCH</command>
    will illustrate the current search paths and their order. To see which
    files are linked by <command>ld</command>, compile a dummy program and
    pass the <parameter>--verbose</parameter> switch to the linker. For
    example,
    <command>$LFS_TGT-gcc dummy.c -Wl,--verbose 2&gt;&amp;1 | grep succeeded</command>
    will show all the files successfully opened during the linking.</para>
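
    <para>Put together, such a check could be sketched as follows. This
    assumes that the cross-compiler and Glibc have already been installed,
    that <envar>LFS_TGT</envar> is set, and that
    <filename>dummy.c</filename> is a throwaway file created only for this
    test:</para>

<screen role="nodump"><userinput># Create a trivial program to link
echo 'int main(){}' &gt; dummy.c

# Show the linker's library search paths and their order
$LFS_TGT-ld --verbose | grep SEARCH

# Show every file the linker opened successfully
$LFS_TGT-gcc dummy.c -Wl,--verbose 2&gt;&amp;1 | grep succeeded

# Clean up
rm -f dummy.c a.out</userinput></screen>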
276
277 <para>The next package installed is GCC. An example of what can be
278 seen during its run of <command>configure</command> is:</para>
279
280<screen><computeroutput>checking what assembler to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/as
281checking what linker to use... /mnt/lfs/tools/i686-lfs-linux-gnu/bin/ld</computeroutput></screen>
282
283 <para>This is important for the reasons mentioned above. It also
284 demonstrates that GCC's configure script does not search the PATH
285 directories to find which tools to use. However, during the actual
286 operation of <command>gcc</command> itself, the same search paths are not
287 necessarily used. To find out which standard linker <command>gcc</command>
288 will use, run: <command>$LFS_TGT-gcc -print-prog-name=ld</command>.</para>
289
    <para>Detailed information can be obtained from <command>gcc</command> by
    passing it the <parameter>-v</parameter> command line option while
    compiling a dummy program. For example, <command>gcc -v dummy.c</command>
    will show detailed information about the preprocessor, compilation, and
    assembly stages, including <command>gcc</command>'s include search paths
    and their order.</para>
296
297 <para>Next installed are sanitized Linux API headers. These allow the
298 standard C library (Glibc) to interface with features that the Linux
299 kernel will provide.</para>
300
301 <para>The next package installed is Glibc. The most important
302 considerations for building Glibc are the compiler, binary tools, and
303 kernel headers. The compiler is generally not an issue since Glibc will
304 always use the compiler relating to the <parameter>--host</parameter>
305 parameter passed to its configure script; e.g. in our case, the compiler
306 will be <command>$LFS_TGT-gcc</command>. The binary tools and kernel
307 headers can be a bit more complicated. Therefore, we take no risks and use
308 the available configure switches to enforce the correct selections. After
309 the run of <command>configure</command>, check the contents of the
310 <filename>config.make</filename> file in the <filename
311 class="directory">build</filename> directory for all important details.
312 Note the use of <parameter>CC="$LFS_TGT-gcc"</parameter> (with
313 <envar>$LFS_TGT</envar> expanded) to control which binary tools are used
314 and the use of the <parameter>-nostdinc</parameter> and
315 <parameter>-isystem</parameter> flags to control the compiler's include
316 search path. These items highlight an important aspect of the Glibc
317 package&mdash;it is very self-sufficient in terms of its build machinery
318 and generally does not rely on toolchain defaults.</para>
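
    <para>As a small, hedged illustration, the compiler selection can be
    inspected with <command>grep</command>. This assumes that the current
    directory is the Glibc build directory, that
    <command>configure</command> has already completed, and that the
    compiler is recorded in a variable whose name begins with
    <envar>CC</envar>; other variable names may differ between Glibc
    versions:</para>

<screen role="nodump"><userinput># List the compiler-related settings recorded by configure
grep '^CC' config.make</userinput></screen>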
319
    <para>As mentioned above, the standard C++ library is compiled next,
    followed in <xref linkend="chapter-temporary-tools"/> by all the programs
    that require themselves in order to be built. The install step of all
    those packages uses the <envar>DESTDIR</envar> variable so that the
    programs are installed into the LFS filesystem.</para>
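
    <para>For a package with a conventional makefile, such a staged install
    could be sketched as follows, assuming that the <envar>LFS</envar>
    variable holds the mount point of the new system and that the package
    was configured with a prefix of <filename
    class="directory">/usr</filename>:</para>

<screen role="nodump"><userinput># Files are written below $LFS/usr, not /usr on the build machine
make DESTDIR=$LFS install</userinput></screen>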
325
    <para>At the end of <xref linkend="chapter-temporary-tools"/> the native
    lfs compiler is installed. First binutils-pass2 is built,
    with the same <envar>DESTDIR</envar> install as the other programs,
    then the second pass of GCC is built, omitting libstdc++
    and other unimportant libraries. Due to a quirk in GCC's
    configure script, <envar>CC_FOR_TARGET</envar> ends up as
    <command>cc</command> when the host is the same as the target but
    different from the build system. This is why
    <parameter>CC_FOR_TARGET=$LFS_TGT-gcc</parameter> is put explicitly into
    the configure options.</para>
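
    <para>A reduced sketch of such a configure invocation is shown here. It
    assumes a build directory inside the unpacked GCC sources and omits most
    of the options a real build needs; only the parts relevant to the
    build, host, and <envar>CC_FOR_TARGET</envar> settings are kept:</para>

<screen role="nodump"><userinput># The host differs from the build system, so this is a cross build;
# CC_FOR_TARGET is set explicitly instead of defaulting to "cc"
../configure --build=$(../config.guess) \
             --host=$LFS_TGT \
             CC_FOR_TARGET=$LFS_TGT-gcc</userinput></screen>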
336
337 <para>Upon entering the chroot environment in <xref
338 linkend="chapter-chroot-temporary-tools"/>, the first task is to install
339 libstdc++. Then temporary installations of programs needed for the proper
340 operation of the toolchain are performed. From this point onwards, the
341 core toolchain is self-contained and self-hosted. In
342 <xref linkend="chapter-building-system"/>, final versions of all the
343 packages needed for a fully functional system are built, tested and
344 installed.</para>
345
346 </sect2>
347
348</sect1>