
Pythonic way to conditionally iterate over items in a list

I'm new to programming in general, so I'm probably going about this the wrong way. I'm writing an lxml parser, and I want to omit HTML table rows that have no content from the parser output. This is what I've got:

for row in doc.cssselect('tr'):
    for cell in row.cssselect('td'):
        sys.stdout.write(cell.text_content() + '\t')
    sys.stdout.write('\n')

The write() stuff is temporary. What I want is for the loop to only return rows where tr.text_content() != ''. So I guess I'm asking how to write what my brain thinks should be 'for a in b if a != x', but that doesn't work.

Thanks!

for row in doc.cssselect('tr'):
    cells = [ cell.text_content() for cell in row.cssselect('td') ]
    if any(cells):
        sys.stdout.write('\t'.join(cells) + '\n')

This prints the row only if at least one cell has text content.
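One caveat: any() treats a whitespace-only string as truthy, so a row whose cells hold only spaces or newlines would still be printed. Below is a minimal sketch of a stricter check. It uses plain lists of strings as stand-ins for the text_content() values, and both the sample data and the helper name non_empty_rows are made up for illustration:

```python
def non_empty_rows(rows):
    """Yield tab-joined rows that have at least one cell with visible text."""
    for cells in rows:
        # strip() turns whitespace-only cells into '', which is falsy
        if any(cell.strip() for cell in cells):
            yield '\t'.join(cells)


# Hypothetical stand-in for
# [cell.text_content() for cell in row.cssselect('td')] on each row
sample_rows = [
    ['a', 'b'],
    ['', ''],
    [' ', '\n'],
    ['', 'c'],
]

kept = list(non_empty_rows(sample_rows))
for line in kept:
    print(line)
```

Whether whitespace-only cells should count as empty depends on the table being parsed, so treat the strip() call as optional.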

ReEdit:

You know, I really don't like my answer at all. I upvoted the other answer, but I preferred his original version: it was not only clean but also self-explanatory, without getting "fancy", which is the trap I fell into:

for row in doc.cssselect('tr'):
    for cell in row.cssselect('td'):
        if cell.text_content() != '':
            pass  # do stuff here

There isn't a much more elegant solution than that.

Original-ish:

You can transform the second for loop as follows:

[cell for cell in row.cssselect('td') if cell.text_content() != '']

and turn it into a list comprehension. That way you've got a prescreened list. You can take that even further, as the following example shows:

a = [[1, 2], [2, 3], [3, 4]]
newList = [y for x in a for y in x]

which transforms it into [1, 2, 2, 3, 3, 4]. Then you can add an if clause at the end to screen out values. Hence, you'd reduce the whole thing to a single line.
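As a rough sketch of that last step, here is the flattening comprehension with a trailing if clause. The second half uses plain strings as a stand-in for cell text; the names rows and texts are illustrative and not part of the original code:

```python
a = [[1, 2], [2, 3], [3, 4]]

# The trailing `if` filters the flattened values; here it drops every 2
filtered = [y for x in a for y in x if y != 2]
print(filtered)  # [1, 3, 3, 4]

# The same shape with strings standing in for cell text
rows = [['a', ''], ['', ''], ['b', 'c']]
texts = [text for row in rows for text in row if text != '']
print(texts)  # ['a', 'b', 'c']
```

Note that flattening discards the row boundaries, so this form suits cases where you only need the non-empty values and not which row they came from.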

Then again, if you were to look at itertools:

from itertools import ifilter  # Python 2 only

ifilter(lambda x: x.text_content() != '', row.cssselect('td'))

produces an iterator that you can loop over, skipping all the items you don't want.

Edit:

And before I get more downvotes: if you're using Python 3, the built-in filter works the same way (it returns an iterator). No need to import ifilter.
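For illustration, here is a small Python 3 sketch of the filter approach. The Cell class is a hypothetical stand-in that only mimics text_content(), so the example runs without lxml. The generator expression at the end is probably the closest working spelling of the 'for a in b if a != x' idea from the question:

```python
class Cell:
    """Hypothetical stand-in for an lxml element; only text_content() is mimicked."""

    def __init__(self, text):
        self._text = text

    def text_content(self):
        return self._text


cells = [Cell('a'), Cell(''), Cell('b')]

# In Python 3, filter() returns a lazy iterator rather than a list
non_empty = filter(lambda c: c.text_content() != '', cells)
result = [c.text_content() for c in non_empty]
print(result)  # ['a', 'b']

# Equivalent generator expression used directly in a for loop
seen = []
for cell in (c for c in cells if c.text_content() != ''):
    seen.append(cell.text_content())
print(seen)  # ['a', 'b']
```

Both forms are lazy, so nothing is filtered until the loop actually consumes the items.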
