Python conditional list joins

I have a list that looks like this:

[
  'A',
  'must',
  'see',
  'is',
  'the',
  'Willaurie',
  ',',
  'which',
  'sank',
  'after', 
  'genoegfuuu',
  'damaged',
  'in',
  'a',
  'storm',
  'in',
  '1989',
  '.'
]

As you can see, some of the items are punctuation. I want to join the list with a single space, except that a punctuation item should attach to the preceding word with no separator.

What's the best way to do this?
I've been trying for a while and my solutions are getting way too complicated for what seems like a simple task.

Thanks

The string module has a constant, string.punctuation, that contains all ASCII punctuation characters.

import string
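# wordlist is the list from the question; the result gets its own name so it does not shadow the string module
sentence = ''.join([('' if c in string.punctuation else ' ') + c for c in wordlist]).strip()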
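
The same idea can be written as a small function, which may be easier to adjust later. This is only a sketch: join_tokens is a made-up name, and it assumes a token counts as punctuation when every one of its characters is in string.punctuation. That differs slightly from the one-liner above, whose `in` check is a substring test and so would not recognise a token such as '...'.

```python
import string

def join_tokens(tokens):
    """Join tokens with spaces, attaching punctuation tokens to the word before them."""
    parts = []
    for tok in tokens:
        # Assumption: a token is punctuation only if all of its characters are.
        is_punct = bool(tok) and all(ch in string.punctuation for ch in tok)
        # Add a separator before every token except the first and punctuation.
        if parts and not is_punct:
            parts.append(' ')
        parts.append(tok)
    return ''.join(parts)

words = ['A', 'must', 'see', 'is', 'the', 'Willaurie', ',', 'which', 'sank',
         'after', 'genoegfuuu', 'damaged', 'in', 'a', 'storm', 'in', '1989', '.']
print(join_tokens(words))
# A must see is the Willaurie, which sank after genoegfuuu damaged in a storm in 1989.
```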

You already have your answer, but I'd like to add that not all punctuation should stick to the left-hand side. If you want to handle more general sentences, which may contain parentheses or apostrophes for example, you don't want to end up with something like:

It' sa great movie( best I' ve seen)

I'd say it's pointless to write some nasty one-liner just to do this in the most Pythonic way. If you don't need a super fast solution, you could solve it step by step, for example:

import re
s = ['It', "'", 's', 'a', 'great', 'movie', 
     '(', 'best', 'I', "'", 've', 'seen', ')']

s = " ".join(s) # join normally
s = re.sub(r" ([,.;\)])", lambda m: m.group(1), s) # stick to left
s = re.sub(r"([\(]) ", lambda m: m.group(1), s)    # stick to right
s = re.sub(r" ([']) ", lambda m: m.group(1), s)    # join both sides

print(s) # It's a great movie (best I've seen)

It's pretty flexible, and you can specify which punctuation each rule handles. It takes 4 lines, though, so you might dislike it. Whichever method you choose, there will probably be some sentences that don't work correctly and need a special case, so a one-liner may be a bad choice anyway.
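
For reuse, the three rules can be wrapped in a function. This is a rough sketch in Python 3: detokenize is a hypothetical name, and the punctuation sets are simply the ones from the snippet above, so extend them as needed.

```python
import re

def detokenize(tokens):
    """Join tokens with spaces, then tighten spacing around punctuation rule by rule."""
    s = ' '.join(tokens)
    s = re.sub(r' ([,.;)])', r'\1', s)  # closing punctuation sticks to the left
    s = re.sub(r'(\() ', r'\1', s)      # opening parenthesis sticks to the right
    s = re.sub(r" (') ", r'\1', s)      # apostrophe joins both neighbours
    return s

tokens = ['It', "'", 's', 'a', 'great', 'movie',
          '(', 'best', 'I', "'", 've', 'seen', ')']
print(detokenize(tokens))  # It's a great movie (best I've seen)
```

Keeping one substitution per rule makes it easy to add a special case without touching the others.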

EDIT: Actually, you can contract the above solution to one line, but as I said before, I'm pretty sure there are more cases to consider:

# here s is the original list of tokens, not the joined string
print(re.sub(r"( [,.;\)]|[\(] | ['] )", lambda m: m.group(1).strip(), " ".join(s)))

>>> ''.join([('' if i in set(",.!?") else ' ') + i for i in words]).strip()
'A must see is the Willaurie, which sank after genoegfuuu damaged in a storm in 1989.'

Like this:

import re
re.sub(r'\s+(?=\W)', '', ' '.join(['A', 'must', 'see', 'is', 'the', 'Willaurie', ',', 'which', 'sank', 'after', 'genoegfuuu', 'damaged', 'in', 'a', 'storm', 'in', '1989', '.']))
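
To see what that pattern does, here is a small illustration (join_lookahead is just a name chosen for this sketch). The lookahead (?=\W) only checks that the next character is a non-word character and consumes nothing, so the whitespace is removed while the punctuation stays. Because \W matches any non-word character, the space before an opening parenthesis is removed as well, which may not be what you want.

```python
import re

def join_lookahead(tokens):
    # Delete whitespace only where the following character is a non-word character.
    return re.sub(r'\s+(?=\W)', '', ' '.join(tokens))

print(join_lookahead(['in', '1989', '.']))          # in 1989.
print(join_lookahead(['movie', '(', 'best', ')']))  # movie( best)
```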

How about using filter?

import string

words = ['A', 'must', 'see', 'is', 'the', 'Willaurie', ',', 'which', 'sank', 'after', 'genoegfuuu', 'damaged', 'in', 'a', 'storm', 'in', '1989', '.']
# note: this drops the punctuation items entirely rather than attaching them
' '.join(filter(lambda x: x not in string.punctuation, words))
