Remove whitespace using strip()

Question

How can I remove the surrounding newlines and spaces from [u'\n\n\n result here \n\n\n'] and get only [u'result here'] as the result? I am using Scrapy.

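from scrapy.selector import HtmlXPathSelector  # old Scrapy API used throughout this thread

def parse_items(self, response):
    hxs = HtmlXPathSelector(response)
    items = []

    for id in hxs.select('//tr'):  # placeholder XPath; adjust to the page being scraped
        item = CraigslistSampleItem()  # item class defined in the project's items.py
        item["job_id"] = id.select('text()').extract()  # ok, but the text keeps its whitespace
        items.append(item)
    return items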

Can anyone help me?

Answer 1

id.select('text()').extract()

returns a list of strings containing your text. Either iterate over that list and strip each item, or index into it (e.g. your_list[0].strip()) to strip the whitespace from a single string. The strip() method belongs to string types, not to lists.
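A minimal sketch of both options in plain Python, assuming a hard-coded list stands in for what extract() returns (so it runs without Scrapy). It also shows why calling strip() on the list itself fails:

```python
# Stand-in for the list returned by id.select('text()').extract() (assumed sample data).
extracted = [u"\n\n\n result here \n\n\n"]

# Option 1: strip every item and keep a list.
stripped_list = [text.strip() for text in extracted]

# Option 2: index the first item and strip it; the result is a single string.
first_stripped = extracted[0].strip()

# Calling .strip() on the list itself fails: lists have no strip() method.
try:
    extracted.strip()
except AttributeError as exc:
    print("AttributeError:", exc)

print(stripped_list)   # ['result here']
print(first_stripped)  # result here
```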

def parse_items(self, response):
    hxs = HtmlXPathSelector(response)
    items = []

    for id in hxs.select('//tr'):  # placeholder XPath; adjust to the page being scraped
        item = CraigslistSampleItem()
        # Works if the node has text; otherwise extract() returns [] and [0] raises IndexError.
        item["job_id"] = id.select('text()').extract()[0].strip()
        items.append(item)
    return items
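Because of that IndexError, a guarded variant can be useful. This is a minimal sketch using a hypothetical helper, first_stripped(), that accepts any list of strings such as the output of extract() and falls back to a default when the list is empty:

```python
def first_stripped(strings, default=u""):
    """Return the first string with surrounding whitespace removed,
    or `default` if the list is empty (instead of raising IndexError)."""
    if not strings:
        return default
    return strings[0].strip()


# Usage with sample data standing in for extract() results.
print(repr(first_stripped([u"\n\n\n result here \n\n\n"])))  # 'result here'
print(repr(first_stripped([])))                              # ''
```

Inside parse_items this would read item["job_id"] = first_stripped(id.select('text()').extract()).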

Answer 2

Alternative to Python's .strip()

You can wrap the XPath expression that selects "job_id" in the XPath function normalize-space():

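def parse_items(self, response):
    hxs = HtmlXPathSelector(response)
    items = []

    for title in hxs.select('//tr'):  # placeholder XPath: one row per job listing
        item = CraigslistSampleItem()
        # normalize-space() already trims the text and returns a single string, so no .strip() is needed
        item["job_id"] = title.select('normalize-space(.//td[@scope="row"])').extract()[0]
        items.append(item)
    return items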
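normalize-space() does more than strip(): in XPath 1.0 it removes leading and trailing whitespace and also collapses every internal run of whitespace (space, tab, carriage return, line feed) into a single space. A minimal pure-Python sketch of that behaviour, using a hypothetical helper name, for comparison with str.strip():

```python
import re

# XPath 1.0 whitespace characters: space, tab, carriage return, line feed.
_XPATH_WS = re.compile(u"[ \t\r\n]+")


def normalize_space(text):
    """Emulate XPath 1.0 normalize-space() on a plain string."""
    return _XPATH_WS.sub(u" ", text).strip(u" ")


raw = u"\n\n\n result   here \n\n\n"
print(repr(raw.strip()))           # 'result   here'  (inner spaces kept)
print(repr(normalize_space(raw)))  # 'result here'    (inner run collapsed)
```

Note that a non-breaking space (\xa0) is not XPath whitespace, so normalize-space() leaves it alone, while Python's argument-less str.strip() removes it.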

Note 1: the XPath expression I use is based on https://careers-cooperhealth.icims.com/jobs/search?ss=1&searchLocation=&searchCategory=&hashed=0

Note 2, on the answer using .strip(): with id.select('text()').extract()[0].strip() you get u'result here', a single string, not a list.

That may well be what you need, but if you want to keep the list (you asked to turn [u'\n\n\n result here \n\n\n'] into [u'result here']), you can use Python's map():

item ["job_id"] = map(unicode.strip, id.select('text()').extract())
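map(unicode.strip, ...) is Python 2 code: Python 3 has no unicode type, and its map() returns a lazy iterator rather than a list. Assuming Python 3, a list comprehension gives the same list of stripped strings (sample data stands in for extract()):

```python
extracted = [u"\n\n\n result here \n\n\n", u"  second match  "]

# Python 3: str replaces unicode, and a list comprehension returns a real list.
job_ids = [text.strip() for text in extracted]

# Equivalent with map(), materialised into a list explicitly.
job_ids_via_map = list(map(str.strip, extracted))

print(job_ids)  # ['result here', 'second match']
```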


Question

2 answers

Answer 1 (accepted), score 3, 2013-08-28 06:14:46

Answer 2, score 3, 2013-08-28 07:42:58
