
Python: write lines to a file that combine different types

I have a JSON file whose records contain the following keys:

[u'domain', u'_timestamp', u'meta_tags', u'author', u'title', u'url', u'tags', u'flow', u'link_tags', u'content', u'post_id', u'flags', u'polling', u'published', u'hubs', u'_id']

I need to write lines built from these records to a .vw file. Some of the values are numeric and some are strings, and I need to preserve these types.

I also have a file with the target values:

url     target
vk.com    0.934250

I use the following code:

import json
from datetime import datetime

from tqdm import tqdm_notebook

# train_target is assumed to be a pandas DataFrame read from the target file
targets = train_target.target.values.tolist()
with open('train.json') as inp_json, \
        open('habr_train.vw', 'w') as out_vw:
    for i, line in enumerate(tqdm_notebook(inp_json)):
        data_json = json.loads(line)

        # Missing values are written as the literal text 'None'
        flow = data_json['flow'] if data_json['flow'] is not None else 'None'
        nickname = data_json['author']['nickname']
        author = nickname if nickname is not None else 'None'
        published = datetime.fromtimestamp(data_json['_timestamp'])

        res_line = str(targets[i]) + ' |title ' + data_json['title'] \
            + ' |tags ' + ' '.join(data_json['tags']) \
            + ' |domain ' + data_json['domain'] \
            + ' |flow ' + flow \
            + ' |author ' + author \
            + ' |hubs ' + data_json['hubs'][0]['title'] \
            + ' |num content_len:' + str(round(len(data_json['content']) / 1000000, 1)) \
            + ' month:' + str(published.month) \
            + ' hour:' + str(published.hour) + '\n'

        out_vw.write(res_line.encode('utf-8'))

It works, but the library I need to use next returns an error saying that str(targets[i]) should be a float.

Is there any way to preserve the types of the values? How can I fix that?

Instead of the concatenation operator, you can use format to avoid this kind of error.

Example:

res_line = ('{0} |title {1} |tags {2} |domain {3} |flow None |author None '
            '|hubs {4} |num content_len:{5} month:{6} hour:{7}\n').format(
    targets[i],                       # format converts the number itself, no str() needed
    data_json['title'],
    ' '.join(data_json['tags']),
    data_json['domain'],
    data_json['hubs'][0]['title'],
    round(len(data_json['content']) / 1000000, 1),
    datetime.fromtimestamp(data_json['_timestamp']).month,
    datetime.fromtimestamp(data_json['_timestamp']).hour)
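
The format approach can also absorb the None handling, so a single template covers all four branches from the question. The following is only a sketch under stated assumptions: `vw_line` is a hypothetical helper, the sample record is made-up data, and UTC is used for the timestamp so the result does not depend on the machine's time zone (the question uses local time). A text file stores only characters, so the target is always written as text; if a later step needs a number, one option is to parse the first token back with `float()`.

```python
from datetime import datetime, timezone


def vw_line(target, record):
    """Build one .vw-style line from a decoded JSON record (hypothetical helper)."""
    # Missing values become the literal text 'None', as in the question.
    flow = record['flow'] if record['flow'] is not None else 'None'
    nickname = record['author']['nickname']
    author = nickname if nickname is not None else 'None'
    # Assumption: UTC instead of local time, to keep the output reproducible.
    published = datetime.fromtimestamp(record['_timestamp'], tz=timezone.utc)
    # Dividing by a float keeps the fraction on Python 2 as well.
    content_len = round(len(record['content']) / 1000000.0, 1)

    template = (
        '{target} |title {title} |tags {tags} |domain {domain} '
        '|flow {flow} |author {author} |hubs {hub} '
        '|num content_len:{content_len} month:{month} hour:{hour}\n'
    )
    # format() converts numbers to text itself, so no str() calls are needed.
    return template.format(
        target=target,
        title=record['title'],
        tags=' '.join(record['tags']),
        domain=record['domain'],
        flow=flow,
        author=author,
        hub=record['hubs'][0]['title'],
        content_len=content_len,
        month=published.month,
        hour=published.hour,
    )


# Made-up sample record, for illustration only.
record = {
    'title': 'Hello',
    'tags': ['python', 'json'],
    'domain': 'example.com',
    'flow': None,
    'author': {'nickname': None},
    'hubs': [{'title': 'Python'}],
    'content': 'x' * 300000,
    '_timestamp': 0,
}

line = vw_line(0.93425, record)

# Reading the value back: the label is text in the file, so parse it explicitly.
label = float(line.split(' ', 1)[0])
```

Which library raised the error is not stated in the question, so whether it wants the parsed float or the whole text line has to be checked against that library's documentation.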
