
Remove white space using strip()

How can I remove the white space from [u'\n\n\n result here \n\n\n'] and get just [u'result here'] as the result? I am using Scrapy.

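def parse_items(self, response):
    hxs = HtmlXPathSelector(response)
    items = []

    # `titles` and `id` are defined in code not shown in the original post.
    for title in titles:
        item = CraigslistSampleItem()
        # Currently returns [u'\n\n\n result here \n\n\n']
        item["job_id"] = id.select('text()').extract()
        items.append(item)
    return items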

Can anyone help me?

id.select('text()').extract() 

returns a list of strings containing your text. You should either iterate over that list and strip each item, or index into it, e.g. your_list[0].strip(), to strip the white space. The strip method belongs to string types, not to lists.
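As a quick illustration of the two options, here is a minimal sketch that uses a plain Python list in place of the Scrapy selector output. The variable names and the sample data are made up for the example.

```python
# Stand-in for what extract() returns: a list of strings (hypothetical data).
extracted = ['\n\n\n result here \n\n\n']

# Option 1: strip every item and keep a list.
stripped_list = [text.strip() for text in extracted]

# Option 2: take the first item and strip it, which gives a single string.
stripped_first = extracted[0].strip()

print(stripped_list)   # ['result here']
print(stripped_first)  # result here
```

Option 1 keeps the list shape from the question, while option 2 is what the code in this answer does.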

def parse_items(self, response):
    hxs = HtmlXPathSelector(response)
    items = []

    # `titles` and `id` are defined in code not shown in the original post.
    for title in titles:
        item = CraigslistSampleItem()
        # Works if some string data is available; otherwise extract() returns
        # an empty list and [0] raises an "index out of range" error.
        item["job_id"] = id.select('text()').extract()[0].strip()
        items.append(item)
    return items
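To avoid the index error mentioned in the comment, one option is a small helper that falls back to a default value when nothing was extracted. This is only a sketch: `first_stripped` is a hypothetical name for illustration, not part of Scrapy.

```python
def first_stripped(values, default=None):
    """Return the first string in `values` with surrounding whitespace removed,
    or `default` when the list is empty."""
    if not values:
        return default
    return values[0].strip()


print(first_stripped(['\n\n\n result here \n\n\n']))  # result here
print(first_stripped([]))                             # None
```

With such a helper, the assignment could be written as item["job_id"] = first_stripped(id.select('text()').extract()).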

Alternative to using Python's .strip()

You can wrap the XPath expression that selects "job_id" in the XPath function normalize-space():

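def parse_items(self, response):
    hxs = HtmlXPathSelector(response)
    items = []

    # `titles` (the selected rows) is defined in code not shown in the original post.
    for title in titles:
        item = CraigslistSampleItem()
        # normalize-space() already trims and collapses whitespace,
        # so the trailing .strip() is only a safeguard.
        item["job_id"] = title.select('normalize-space(.//td[@scope="row"])').extract()[0].strip()
        items.append(item)
    return items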
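To see what normalize-space() does to a string, here is a rough pure-Python approximation. It is a sketch for illustration only: the function name `normalize_space` is made up, and Scrapy evaluates the real XPath function itself rather than anything like this.

```python
import re


def normalize_space(text):
    """Approximate XPath 1.0 normalize-space() for a Python string.

    XPath 1.0 treats only space, tab, carriage return and line feed as
    whitespace: runs of them collapse to one space, and leading and
    trailing whitespace is removed.
    """
    collapsed = re.sub(r'[ \t\r\n]+', ' ', text)
    return collapsed.strip(' ')


sample = '\n\n\n result   here \n\n\n'
print(repr(normalize_space(sample)))  # 'result here'
print(repr(sample.strip()))           # 'result   here'
```

Unlike .strip(), which only touches the ends of the string, normalize-space() also collapses whitespace runs inside the text.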

Note 1: the XPath expression I use is based on https://careers-cooperhealth.icims.com/jobs/search?ss=1&searchLocation=&searchCategory=&hashed=0

Note 2, on the answer using .strip(): with id.select('text()').extract()[0].strip() you get u'result here', not a list.

That may very well be what you need. But if you want to keep the list, since you asked to turn [u'\n\n\n result here \n\n\n'] into [u'result here'], you can use something like this, using Python's map():

item ["job_id"] = map(unicode.strip, id.select('text()').extract())
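Note that `unicode` exists only in Python 2. On Python 3, the same idea could be written as below. This is a sketch with a plain list standing in for the extract() result, and the sample data is made up.

```python
# Stand-in for the list returned by extract() (hypothetical data).
extracted = ['\n\n\n result here \n\n\n', '  another  ']

# On Python 3, map() returns a lazy iterator, so wrap it in list()
# to get an actual list of stripped strings.
stripped = list(map(str.strip, extracted))

print(stripped)  # ['result here', 'another']
```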
