My data is a list of tuples:
def find_ngrams(verbatims, n):
    return zip(*[verbatims[i:] for i in range(n)])
bigrams = find_ngrams(verbatims, 4)
print bigrams
[((u'a', u'grossir', u'et', u'a'), 74), ((u'un', u'avis', u'de', u'passage'), 68), ((u'le', u'facteur', u'est', u'pass\xe9'), 67), ((u'V\xeatements', u'+', u'ou', u'-'), 63), ((u'+', u'ou', u'-', u'similaires'), 62), ((u'vous', u'ne', u'pouvez', u'pas'), 54), ((u'sinon', u'une', u'petite', u'recherche'), 53)]
The n-grams are counted and ordered using the Counter().most_common() method:
ngrams = Counter(bigrams).most_common()
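For readers who want to reproduce the counting step end to end, here is a minimal Python 3 sketch. The token list is made up purely for illustration (it is not the asker's data), and the variable name counted is hypothetical; find_ngrams mirrors the function above, wrapped in list() because zip is lazy in Python 3.

```python
from collections import Counter


def find_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    # Zipping n shifted copies of the list yields the sliding windows.
    return list(zip(*[tokens[i:] for i in range(n)]))


# Hypothetical token list, used only for illustration.
tokens = "le facteur est passé et le facteur est passé".split()

# most_common() returns (ngram_tuple, count) pairs, most frequent first.
counted = Counter(find_ngrams(tokens, 4)).most_common()
print(counted[0])
```

Each element of the result is a pair whose first item is itself a tuple of words, which is why pandas later displays that column with parentheses.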
FYI, I am doing an n-gram analysis on a large text dataset. For background on n-grams, see https://en.wikipedia.org/wiki/N-gram. I then load the result into a pandas DataFrame:
DF = pandas.DataFrame(ngrams)
DF.columns = ['ngram','occurence']
print DF
ngram occurence
0 (a, grossir, et, a) 74
1 (un, avis, de, passage) 68
2 (le, facteur, est, passé) 67
Except that my n-grams are enclosed in parentheses, and I don't want that. I know I could use a basic search/replace on the string representation, but I would rather solve it in a cleaner, more principled way. Besides, a search/replace could also strip parentheses that are part of my text.
I'm not sure what exactly the problem is, but I guess it has to do with the tuples nested inside my list. So how do I turn a list of tuples into a DataFrame without the parentheses?
Edit: as requested, here is my expected output:
ngram occurence
0 a, grossir, et, a 74
1 un, avis, de, passage 68
2 le, facteur, est, passé 67
Thanks,
Aren't you simply looking for this? (Here bigrams is the counted list of (tuple, count) pairs printed in the question.)
In [309]: pd.DataFrame([(','.join(el[0]), el[1]) for el in bigrams])
Out[309]:
0 1
0 a,grossir,et,a 74
1 un,avis,de,passage 68
2 le,facteur,est,passé 67
3 Vêtements,+,ou,- 63
4 +,ou,-,similaires 62
5 vous,ne,pouvez,pas 54
6 sinon,une,petite,recherche 53
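The output above joins the words with a bare comma and leaves the columns unnamed, whereas the expected output in the question uses a comma followed by a space and the column names ngram and occurence. A minimal Python 3 sketch of that variant follows; it assumes the counted list has the same shape as in the question, and the sample rows are copied from the data shown there.

```python
import pandas as pd

# Counted n-grams in the shape produced by Counter(...).most_common():
# a list of (tuple_of_words, count) pairs (first rows from the question).
ngrams = [
    (('a', 'grossir', 'et', 'a'), 74),
    (('un', 'avis', 'de', 'passage'), 68),
    (('le', 'facteur', 'est', 'passé'), 67),
]

# Turn each tuple of words into a single string before building the frame,
# so pandas stores plain text instead of tuple objects.
rows = [(', '.join(words), count) for words, count in ngrams]

df = pd.DataFrame(rows, columns=['ngram', 'occurence'])
print(df)
```

Because the join acts on the tuple itself rather than on its printed form, any parentheses that occur inside the words are left untouched, which was the concern with search/replace.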