How can I use the split function in Python to split out parts of a text and save them to another file?

Question

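Hello, I am having trouble using the split function in Python. I collected some tweets with a crawler, and I need to split certain parts of each tweet out into a separate .json file, specifically the ID and the # (the #hashtag). I have been using the split function without success. What am I doing wrong? I want to save what comes after "id" and "text" into another .json file.
The text looks like this: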

{"created_at":"Fri Oct 20 16:35:36 +0000 2017","id":921414607302025216,"id_str":"921414607302025216","text":"@IdrisAhmed16 loooooool, who said I was guiding you???

def on_data(self, data):
    try:
        #print data
        with open('Bologna_streams.json', 'r') as f:
            for line in f:

                tweet = data.spit(',"text":"')[1].split('",""source"')[0]
                print (tweet)

                saveThis = str(time.time()) + '::' +tweet

                saveFile = open('Bologna_text_preprocessing.json', 'w')
                json.dump(data)
                saveFile.write(saveThis)
                saveFile.write(tweet)
                saveFile.write('\n')
                saveFile.close()
                f.close()
        return True
    except BaseException as e:
        print("Error on_data: %s" % str(e))
        time.sleep(5)

def on_error(self, status):
    print (status)

Answer 1

I think you should experiment with Python interactively, or with a small script run from the command line.

Consider this:

text="""
{"created_at":"Fri Oct 20 16:35:36 +0000 2017","id":921414607302025216,"id_str":"921414607302025216","text":"@IdrisAhmed16 learn #python"}
""".strip()

print(text.split(":"))

This prints the following to the console:

['{"created_at"', '"Fri Oct 20 16', '35', '36 +0000 2017","id"', '921414607302025216,"id_str"', '"921414607302025216","text"', '"@IdrisAhmed16 learn #python"}']

Or, to print each piece of the split on its own line:

print("splits:\n")
for item in text.split(":"):
  print(item)
print("\n---")

This prints:

splits:

{"created_at"
"Fri Oct 20 16
35
36 +0000 2017","id"
921414607302025216,"id_str"
"921414607302025216","text"
"@IdrisAhmed16 learn #python"}

---

In other words, split did exactly what it is supposed to do: it found every ":" and split the string at each of those characters.
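
To illustrate why splitting on a literal delimiter is fragile for JSON, here is a minimal sketch. The shortened record is made up for the example, and the delimiter is the one used in the question's code. The same data serialized with different whitespace no longer contains that delimiter at all.

```python
import json

# Made-up, shortened record for illustration.
record = {"id": 921414607302025216, "text": "@IdrisAhmed16 learn #python"}

# Compact serialization with no spaces, like the sample text above.
compact = json.dumps(record, separators=(",", ":"))
# Default json.dumps output puts a space after each "," and ":".
spaced = json.dumps(record)

delimiter = ',"text":"'  # the delimiter used in the question's code

compact_parts = compact.split(delimiter)
spaced_parts = spaced.split(delimiter)

print(len(compact_parts))  # 2: the delimiter occurs once
print(len(spaced_parts))   # 1: the delimiter never occurs, so indexing [1] would raise IndexError
```

Both strings describe the same tweet, but only one of them survives the split-based approach.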

What you actually want to do is parse the JSON:

import json

parsed = json.loads(text)
print("parsed:", parsed)

The parsed variable is an ordinary Python object (here, a dict). Result, with line breaks added for readability:

parsed: {
  'created_at': 'Fri Oct 20 16:35:36 +0000 2017',
  'id': 921414607302025216,
  'id_str': '921414607302025216',
  'text': '@IdrisAhmed16 learn #python'
}

Now you can work with the data, including retrieving the text item and splitting it.
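
As a minimal sketch of that step, the example below pulls the "id" and "text" fields out of the parsed object and splits the text on whitespace. The variable names tweet_id, tweet_text, words and hashtags are chosen here for illustration.

```python
import json

text = (
    '{"created_at":"Fri Oct 20 16:35:36 +0000 2017",'
    '"id":921414607302025216,"id_str":"921414607302025216",'
    '"text":"@IdrisAhmed16 learn #python"}'
)

parsed = json.loads(text)

tweet_id = parsed["id"]      # an int
tweet_text = parsed["text"]  # a str

# str.split() with no argument splits on runs of whitespace.
words = tweet_text.split()
# Keep only the words that start with "#".
hashtags = [word for word in words if word.startswith("#")]

print(tweet_id)  # 921414607302025216
print(words)     # ['@IdrisAhmed16', 'learn', '#python']
print(hashtags)  # ['#python']
```

Splitting on whitespace keeps any punctuation attached to a word, so a token such as "#stackoverflow!" would come back with its trailing "!".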

However, if the goal is to find all the hashtags, you are better off using a regular expression:

import re
hashtag_pattern = re.compile(r'#(\w+)')  # raw string, so \w is not treated as a string escape
matches = hashtag_pattern.findall(parsed['text'])
print("All hashtags in tweet:", matches)

print("Another example:", hashtag_pattern.findall("ok #learn #python #stackoverflow!"))

Result:

All hashtags in tweet: ['python']
Another example: ['learn', 'python', 'stackoverflow']
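
Putting the pieces together, here is a hedged sketch of the original goal: saving each tweet's ID, text and hashtags to another file. It assumes the input file holds one JSON object per line, which the question does not confirm. The helper name extract_ids_and_hashtags is hypothetical, the file names are borrowed from the question, and the demo writes its own sample input into a temporary directory so it can run on its own.

```python
import json
import os
import re
import tempfile

HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_ids_and_hashtags(in_path, out_path):
    """Hypothetical helper: copy id, text and hashtags of each tweet to out_path.

    Assumes one JSON object per input line; writes one JSON object per output line.
    Returns the number of records written.
    """
    written = 0
    with open(in_path, "r", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                tweet = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip lines that are not valid JSON
            if not isinstance(tweet, dict) or "id" not in tweet or "text" not in tweet:
                continue  # skip records without the fields we need
            record = {
                "id": tweet["id"],
                "text": tweet["text"],
                "hashtags": HASHTAG_PATTERN.findall(tweet["text"]),
            }
            dst.write(json.dumps(record) + "\n")
            written += 1
    return written


# Demo: build a tiny sample input so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "Bologna_streams.json")
    out_path = os.path.join(tmp, "Bologna_text_preprocessing.json")
    with open(in_path, "w", encoding="utf-8") as f:
        f.write('{"id": 1, "text": "ok #learn #python"}\n')
        f.write("not json\n")
        f.write('{"id": 2, "text": "no tags here"}\n')
    count = extract_ids_and_hashtags(in_path, out_path)
    with open(out_path, encoding="utf-8") as f:
        results = [json.loads(line) for line in f]

print(count)
for result in results:
    print(result)
```

Opening the output file once, outside the loop, avoids overwriting it on every iteration, which is what opening it with mode 'w' inside the loop would do.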

Answer 1 (accepted, 2017-11-30 16:40:42)