
Is it normal that sampling tweets with TwitterStream, as in the Twitter4J code example, gives mostly question marks as user names and statuses?

I used the code from the "code example" section of Twitter4J:

public static void main(String[] args) throws TwitterException, IOException{
    StatusListener listener = new StatusListener(){
        public void onStatus(Status status) {
            System.out.println(status.getUser().getName() + " : " + status.getText());
        }
        public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) {}
        public void onTrackLimitationNotice(int numberOfLimitedStatuses) {}
        public void onException(Exception ex) {
            ex.printStackTrace();
        }
    };
    TwitterStream twitterStream = new TwitterStreamFactory().getInstance();
    twitterStream.addListener(listener);
    // sample() method internally creates a thread which manipulates TwitterStream and calls these adequate listener methods continuously.
    twitterStream.sample();
}

As you can see, there is a println in the code above, inside the onStatus method. The following screenshot shows what I mostly get from that code. Is it normal?

[Image: question marks...question marks everywhere]

Indeed, if I keep only the statuses whose user has no question mark in the user name, I get almost nothing. Moreover, I also need to filter users whose location is public. On that point, what is the difference between:

user.isGeoEnabled()

and

user.getLocation() != ""

The responses you get back are UTF-8 encoded: https://dev.twitter.com/tags/utf-8

If you look at some of the accounts in the output, they include non-Western-European characters, for example https://twitter.com/tomokichi_koyo. These are what break the output.
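To illustrate how such characters can turn into question marks, here is a minimal sketch that does not use Twitter4J at all. The class name QuestionMarkDemo and the helper roundTrip are hypothetical, and US-ASCII merely stands in for whatever limited encoding your console happens to use. When Java encodes text into a charset that cannot represent a character, it substitutes that charset's replacement byte, which is '?' for US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Twitter4J.
class QuestionMarkDemo {

    // Encode the text with the given charset and decode it again,
    // imitating output that passes through an encoding too small for it.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes so the source file
        // encoding does not matter.
        String name = "\u65e5\u672c\u8a9e";

        // US-ASCII cannot represent them, so each becomes '?'.
        System.out.println(roundTrip(name, StandardCharsets.US_ASCII)); // prints "???"

        // UTF-8 can represent them, so the text survives unchanged.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name)); // prints "true"
    }
}
```

If this is what is happening, the tweets themselves are intact and only the printing step loses information.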

Try writing to a file instead and opening it with a UTF-8 aware editor. There are various answers about setting up Java and your OS to default to UTF-8, but you will need to look for your specific combination: https://stackoverflow.com/search?q=windows+console+java+utf-8
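Writing to a file could look roughly like the sketch below. It uses only the standard library; the class name Utf8FileOutput, the helper writeLines and the file name tweets.txt are made up for illustration, and the sample strings stand in for real tweets.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class; not part of Twitter4J.
class Utf8FileOutput {

    // Write each line to the file, explicitly encoded as UTF-8
    // instead of relying on the platform default encoding.
    static void writeLines(Path file, String... lines) throws IOException {
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // "tweets.txt" is an assumed file name; the strings are placeholders.
        Path file = Paths.get("tweets.txt");
        writeLines(file, "\u65e5\u672c\u8a9e : \u3053\u3093\u306b\u3061\u306f", "plain : hello");
        System.out.println("Wrote " + Files.readAllLines(file, StandardCharsets.UTF_8).size() + " lines");
    }
}
```

In the listener from the question, the same idea would mean keeping one such PrintWriter open and calling its println inside onStatus in place of System.out.println. Wrapping System.out in a new PrintStream(System.out, true, "UTF-8") is another option, but it only helps if the console itself can render UTF-8.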
