
Converting strings to tuples in Python 3.5

But now I wish to convert it to the following tuple format:

((1231, 123), (2341, 1210), (342,12), (5462, 565))

I really need to find a way to convert this data to the format directly above. I would greatly appreciate any help!

How can I convert a string into pairs of tuples? I have already tried this:

with open("data.txt") as f:
    list = [line.rstrip('\n') for line in f] 
    mylist = [mylist[x:x+1] for x in range(0, len(mylist), 3)]
    print(mylist)


data = ['I went to work but got delayed at other work and got stuck in a traffic and I went to drink some coffee but got no money and asked for money']

I want my output to be in this format

[('i', 'went'),('to', 'work'),('but', 'got').........]

I have tried this, but it is not working:


import itertools
import nltk
import collections
f=open('readme.txt','r')
data=f.read()
print(data)
d1 = data[0].split() 
output = list(itertools.zip_longest(d1[::2],d1[1::2],fillvalue = None)) 
print(output)

Edited from comment - File content:

['भिन्केन NNP डच NNP प्रकाशन NN समूह NN एल्सेभियर NNP एन.भी. FB को PKO अध्यक्ष NN हुनुहुन्छ VBF । YF कन्सोलिडेटिड NNP गोल्ड NN फिल्ड्स NN पीएलसी NNP का PKO पूर्व JJ सभापति NN ५५ CD वर्षीय JJ रूडोल्फ NNP अग्न्यु NNP लाई PLAI यस DUM ब्रिटिस NNP औद्योगिक JJ समूह NN को PKO सल्लाहकार NN को PKO रूप NN मा POP मनोनयन NN गरिएको VBKO थियो VBX । YF एकताका RBO केन्ट NNP चुरोट NN को PKO फिल्टर NN बनाउन VBI प्रयोग NN भएको VBKO एक CD प्रकार NN को PKO अस्बेस्टोस NNP '] 

You can use itertools.zip_longest, which also works for zipping lists of unequal length by supplying a fill value (None unless otherwise specified) for the shorter list:

Split the data at whitespace and feed two sublists to zip_longest: one starting at index 0 and one starting at index 1, each taking every second element:

data = ['I went to work but got delayed at other work and got stuck in a traffic and I went to drink some coffee but got no money and asked for money']

import itertools
d1 = data[0].split() 

# use 2 partial lists, each taking every 2nd word, one starting at 0, one at 1
# you can change fillvalue=None to some other value or remove it - None is the default.
output = list(itertools.zip_longest(d1[::2],d1[1::2], fillvalue = None)) 

print(output)

Output:

[('I', 'went'), ('to', 'work'), ('but', 'got'), ('delayed', 'at'), ('other', 'work'), 
 ('and', 'got'), ('stuck', 'in'), ('a', 'traffic'), ('and', 'I'), ('went', 'to'), 
 ('drink', 'some'), ('coffee', 'but'), ('got', 'no'), ('money', 'and'), 
 ('asked', 'for'), ('money', None)]

The sublists fed to zip_longest look like:

print(d1[::2])

['I', 'to', 'but', 'delayed', 'other', 'and', 'stuck', 'a', 'and', 'went', 'drink', 
 'coffee', 'got', 'money', 'asked', 'money']

and

print(d1[1::2])

['went', 'work', 'got', 'at', 'work', 'got', 'in', 'traffic', 'I', 'to', 'some', 
 'but', 'no', 'and', 'for']
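
To make the role of the fill value concrete, here is a minimal sketch comparing the built-in zip with itertools.zip_longest on an odd number of words. The five-word sample string and the names pairs_zip and pairs_longest are made up for this illustration and are not part of the original answer.

```python
from itertools import zip_longest

words = "I went to work but".split()  # five words, so one is left over

# zip stops at the shorter slice and silently drops the unpaired word
pairs_zip = list(zip(words[::2], words[1::2]))

# zip_longest keeps it and pads the missing partner with fillvalue
pairs_longest = list(zip_longest(words[::2], words[1::2], fillvalue=None))

print(pairs_zip)
print(pairs_longest)
```

If the trailing unpaired word should simply be discarded, plain zip is probably enough; zip_longest is the safer choice when every word has to appear in the result.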

The following part is adapted from "Convert string representation of list to list":

# -*- coding: utf-8 -*-

import ast

# create your file as utf8
with open("myfile.txt","w", encoding="utf8") as f:
    f.write("['भिन्केन NNP डच NNP प्रकाशन NN समूह NN एल्सेभियर NNP एन.भी. FB को PKO अध्यक्ष NN हुनुहुन्छ VBF । YF कन्सोलिडेटिड NNP गोल्ड NN फिल्ड्स NN पीएलसी NNP का PKO पूर्व JJ सभापति NN ५५ CD वर्षीय JJ रूडोल्फ NNP अग्न्यु NNP लाई PLAI यस DUM ब्रिटिस NNP औद्योगिक JJ समूह NN को PKO सल्लाहकार NN को PKO रूप NN मा POP मनोनयन NN गरिएको VBKO थियो VBX । YF एकताका RBO केन्ट NNP चुरोट NN को PKO फिल्टर NN बनाउन VBI प्रयोग NN भएको VBKO एक CD प्रकार NN को PKO अस्बेस्टोस NNP ']")

# load your file, using utf8
with open("myfile.txt","r",encoding="utf8") as f:
    data = f.read()
# convert the loaded string literal into a python list    
dataAsList = ast.literal_eval(data)

print(dataAsList)
print(type(dataAsList))

import itertools
d1 = dataAsList[0].split() 

# use 2 partial lists, each taking every 2nd word, one starting at 0, one at 1
# you can change fillvalue=None to some other value or remove it - None is the default.
output = list(itertools.zip_longest(d1[::2], d1[1::2], fillvalue=None))

print(output)

Output:

['भिन्केन NNP डच NNP प्रकाशन NN समूह NN एल्सेभियर NNP एन.भी. FB को PKO अध्यक्ष NN हुनुहुन्छ VBF । YF कन्सोलिडेटिड NNP गोल्ड NN फिल्ड्स NN पीएलसी NNP का PKO पूर्व JJ सभापति NN ५५ CD वर्षीय JJ रूडोल्फ NNP अग्न्यु NNP लाई PLAI यस DUM ब्रिटिस NNP औद्योगिक JJ समूह NN को PKO सल्लाहकार NN को PKO रूप NN मा POP मनोनयन NN गरिएको VBKO थियो VBX । YF एकताका RBO केन्ट NNP चुरोट NN को PKO फिल्टर NN बनाउन VBI प्रयोग NN भएको VBKO एक CD प्रकार NN को PKO अस्बेस्टोस NNP ']

<class 'list'>

[('भिन्केन', 'NNP'), ('डच', 'NNP'), ('प्रकाशन', 'NN'), ('समूह', 'NN'), 
 ('एल्सेभियर', 'NNP'), ('एन.भी.', 'FB'), ('को', 'PKO'), ('अध्यक्ष', 'NN'), 
 ('हुनुहुन्छ', 'VBF'), ('।', 'YF'), ('कन्सोलिडेटिड', 'NNP'), ('गोल्ड', 'NN'), 
 ('फिल्ड्स', 'NN'), ('पीएलसी', 'NNP'), ('का', 'PKO'), ('पूर्व', 'JJ'), 
 ('सभापति', 'NN'), ('५५', 'CD'), ('वर्षीय', 'JJ'), ('रूडोल्फ', 'NNP'), 
 ('अग्न्यु', 'NNP'), ('लाई', 'PLAI'), ('यस', 'DUM'), ('ब्रिटिस', 'NNP'), 
 ('औद्योगिक', 'JJ'), ('समूह', 'NN'), ('को', 'PKO'), ('सल्लाहकार', 'NN'), 
 ('को', 'PKO'), ('रूप', 'NN'), ('मा', 'POP'), ('मनोनयन', 'NN'), 
 ('गरिएको', 'VBKO'), ('थियो', 'VBX'), ('।', 'YF'), ('एकताका', 'RBO'), 
 ('केन्ट', 'NNP'), ('चुरोट', 'NN'), ('को', 'PKO'), ('फिल्टर', 'NN'), 
 ('बनाउन', 'VBI'), ('प्रयोग', 'NN'), ('भएको', 'VBKO'), ('एक', 'CD'), 
 ('प्रकार', 'NN'), ('को', 'PKO'), ('अस्बेस्टोस', 'NNP')] 
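
The file-loading step above relies on ast.literal_eval, which only accepts Python literals and raises ValueError or SyntaxError on anything else. The sketch below wraps that step in a small helper; load_list_literal is a hypothetical name chosen for this example, and the short sample string stands in for the real file content.

```python
import ast


def load_list_literal(text):
    """Parse text that holds a Python list literal (hypothetical helper)."""
    try:
        value = ast.literal_eval(text.strip())
    except (ValueError, SyntaxError) as exc:
        # literal_eval refuses function calls, names and malformed syntax
        raise ValueError("text is not a valid Python literal") from exc
    if not isinstance(value, list):
        raise ValueError("expected a list literal")
    return value


sample = "['भिन्केन NNP डच NNP']"  # shortened stand-in for the file content
parsed = load_list_literal(sample)
tokens = parsed[0].split()
pairs = list(zip(tokens[::2], tokens[1::2]))
print(pairs)
```

Checking the type after parsing is a design choice for this sketch: it fails early if the file contains, say, a bare string instead of a one-element list.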
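
# Alternative: build the pairs with a plain loop over every second index
splittedData = data[0].split()
output_list = []
for x in range(0, len(splittedData), 2):
    first = splittedData[x]
    # with an odd word count the last word has no partner, so pad with None
    second = splittedData[x + 1] if x + 1 < len(splittedData) else None
    output_list.append((first, second))
print(output_list)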
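
Both approaches can be folded into one reusable function. The following is only a sketch: pair_words is a hypothetical name, and it uses the common idiom of passing the same iterator to zip_longest twice instead of slicing, which is a different technique from the slicing shown in the answers above.

```python
from itertools import zip_longest


def pair_words(text, fillvalue=None):
    """Split text on whitespace and group consecutive words into 2-tuples."""
    it = iter(text.split())
    # the same iterator is consumed twice per tuple, so each tuple holds
    # two consecutive words; a leftover word is padded with fillvalue
    return list(zip_longest(it, it, fillvalue=fillvalue))


result = pair_words("I went to work but")
print(result)
```

Because the iterator is consumed lazily, this variant avoids building the two intermediate slices, which may matter for very long inputs.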
