lxml.etree Tutorial 5: Using XPath to find text - 白红宇



Published: 2019-06-28


Another way to extract the text content of a tree is XPath, which can also collect the separate text chunks into a list:

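>>> from lxml import etree
>>> html = etree.fromstring("<html><body>TEXT<br/>TAIL</body></html>")  # same tree as built earlier in the tutorial
>>> print(html.xpath("string()")) # lxml.etree only!
TEXTTAIL
>>> print(html.xpath("//text()")) # lxml.etree only!
['TEXT', 'TAIL']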
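The two expressions are closely related. Here is a minimal sketch, using a small hypothetical document with text and tail text at several levels. string() on the root element returns the text of all descendants concatenated in document order. //text() returns the same chunks as separate list items, so joining the list should reproduce the string() result.

```python
from lxml import etree

# Hypothetical sample document: element text and tail text at several levels.
root = etree.fromstring(
    "<html><body>Hello <b>bold</b> world<br/>after break</body></html>"
)

# string() concatenates every descendant text node into a single string.
joined = root.xpath("string()")

# //text() returns each text node (including tail text) as a separate item.
chunks = root.xpath("//text()")

print(repr(joined))               # 'Hello bold worldafter break'
print(chunks)                     # ['Hello ', 'bold', ' world', 'after break']
print("".join(chunks) == joined)  # True
```

The list form is the one to use when you need to know where each piece came from, which is what the "smart" string results below rely on.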

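If you use this often, you can wrap the expression in a function. etree.XPath compiles the XPath expression once and returns a callable object that you can apply to any element or tree: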

>>> build_text_list = etree.XPath("//text()") # lxml.etree only!
>>> print(build_text_list(html))
['TEXT', 'TAIL']
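Compiling pays off when the same expression is reused. The following minimal sketch applies one compiled expression to two hypothetical documents. lxml's XPath objects also accept XPath variables as keyword arguments at call time. The expression and the `tag` variable name are illustrative choices, not part of the tutorial.

```python
from lxml import etree

# Compile once; $tag is an XPath variable supplied when the object is called.
texts_under = etree.XPath("//*[local-name() = $tag]//text()")

# Two hypothetical documents that reuse the same compiled expression.
doc1 = etree.fromstring(
    "<html><body><p>one</p><p>two <i>three</i></p></body></html>"
)
doc2 = etree.fromstring("<root><p>alpha</p><q>beta</q></root>")

print(texts_under(doc1, tag="p"))  # ['one', 'two ', 'three']
print(texts_under(doc2, tag="q"))  # ['beta']
```

Passing the tag as a variable keeps the expression text fixed, so it is compiled only once no matter how many tag names you query.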

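Note that a string result returned by XPath is a special "smart" object that knows where it came from. You can ask it for its origin through its getparent() method, just as you would with an Element: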

>>> texts = build_text_list(html)
>>> print(texts[0])
TEXT
>>> parent = texts[0].getparent()
>>> print(parent.tag)
body
>>> print(texts[1])
TAIL
>>> print(texts[1].getparent().tag)
br
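The tutorial's tree has only two text chunks. This minimal sketch uses a slightly larger hypothetical document and shows the same getparent() behaviour for every chunk. For tail text, getparent() returns the element the text follows (here `b` and `br`), not the element that encloses it.

```python
from lxml import etree

root = etree.fromstring(
    "<html><body>Hello <b>bold</b> world<br/>after break</body></html>"
)

# Each //text() result is a smart string; getparent() gives its origin element.
pairs = [(str(chunk), chunk.getparent().tag) for chunk in root.xpath("//text()")]

for text, tag in pairs:
    print(repr(text), "->", tag)
# 'Hello ' -> body
# 'bold' -> b
# ' world' -> b
# 'after break' -> br
```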

You can also find out whether it is normal text content or tail text:

>>> print(texts[0].is_text)
True
>>> print(texts[1].is_text)
False
>>> print(texts[1].is_tail)
True
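One practical use of is_tail is finding the element that actually contains a piece of text. The helper `enclosing_element` below is hypothetical, a sketch rather than an lxml API. It steps up one extra level for tail text, because getparent() on tail text returns the element the text follows.

```python
from lxml import etree


def enclosing_element(chunk):
    """Hypothetical helper: return the element whose content holds this chunk.

    For normal text, getparent() is already the enclosing element. For tail
    text, getparent() is the preceding sibling element, so step up once more
    (this yields None if the chunk were the tail of the root element).
    """
    parent = chunk.getparent()
    if chunk.is_tail:
        return parent.getparent()
    return parent


root = etree.fromstring(
    "<html><body>Hello <b>bold</b> world<br/>after break</body></html>"
)

result = [(str(c), enclosing_element(c).tag) for c in root.xpath("//text()")]
print(result)
# [('Hello ', 'body'), ('bold', 'b'), (' world', 'body'), ('after break', 'body')]
```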

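This works for the results of the text() function. lxml cannot tell you the origin of a string value constructed by XPath functions such as string() or concat(), so getparent() returns None:

>>> stringify = etree.XPath("string()")
>>> print(stringify(html))
TEXTTAIL
>>> print(stringify(html).getparent())
None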
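A smart string keeps a reference back to its tree, and the lxml documentation notes that this can keep the whole tree alive in memory. When you do not need origin information, the XPath class accepts `smart_strings=False`. The following minimal sketch compares both settings, rebuilding the tutorial's tree with etree.fromstring.

```python
from lxml import etree

root = etree.fromstring("<html><body>TEXT<br/>TAIL</body></html>")

# Default: results are smart strings that link back into the tree.
smart = etree.XPath("//text()")(root)

# smart_strings=False: plain Python strings without origin information.
plain = etree.XPath("//text()", smart_strings=False)(root)

print([hasattr(s, "getparent") for s in smart])  # [True, True]
print([hasattr(s, "getparent") for s in plain])  # [False, False]
print(smart == plain)                            # True: same text either way
```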

Reposted from: http://pcbxl.baihongyu.com/
