How do I use Twitter4J to retrieve images in tweets?
I want to issue a query for a keyword or hashtag and retrieve all the images from all the tweets that contain that keyword. I can use Twitter4J with Java to easily issue a query and retrieve the resulting tweets. I know that I can visit the http://t.co/xxxx links in my browser and see the associated image, and that the image itself is hosted at https://pbs.twimg.com/xxxxx. So it seems like all I have to do is reproduce that process in my code!

I can parse the http://t.co/xxxx link out of each tweet easily enough. However, when I retrieve all the HTML from that link, I don't see any https://pbs.twimg.com/xxxx images :(. I think what's happening is that Twitter is loading those images through JavaScript.

Is there any way I can easily retrieve the images in each tweet?

This is what I have so far:

    package com.company;

    import twitter4j.*;
    import twitter4j.conf.ConfigurationBuilder;

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) throws Exception {
            ConfigurationBuilder cb = new ConfigurationBuilder();
            cb.setDebugEnabled(true)
                    .setOAuthConsumerKey("xxxxxxxxxx")
                    .setOAuthConsumerSecret("xxxxxxxxxxxx")
                    .setOAuthAccessToken("xxxxxxxxx-xxx-xxxxxxxx")
                    .setOAuthAccessTokenSecret("xxxxxxxxxxxxxxxxxxx");
            TwitterFactory tf = new TwitterFactory(cb.build());
            Twitter twitter = tf.getInstance();

            Query query = new Query("#hashtag");
            QueryResult result = twitter.search(query);

            // Matches shortened t.co links (http or https) in the tweet text
            Pattern pattern = Pattern.compile("https?://t\\.co/\\w+");
            // No spaces inside the alternation: a space in a regex is a literal character
            Pattern imagePattern = Pattern.compile("https://pbs\\.twimg\\.com/media/\\w+\\.(png|jpg|gif)(:large)?");

            for (Status status : result.getTweets()) {
                if (status.isRetweet())
                    continue;
                System.out.println("@" + status.getUser().getScreenName() + ":" + status.getText());
                Matcher matcher = pattern.matcher(status.getText());
                if (matcher.find()) {
                    System.out.println("found a t.co url");
                    URL tcoUrl = new URL(matcher.group());
                    // Read the linked page line by line and look for image URLs
                    try (BufferedReader in = new BufferedReader(
                            new InputStreamReader(tcoUrl.openStream()))) {
                        String inputLine;
                        while ((inputLine = in.readLine()) != null) {
                            Matcher imageMatcher = imagePattern.matcher(inputLine);
                            if (imageMatcher.find())
                                System.out.println("found an image: " + imageMatcher.group());
                        }
                    }
                }
            }
        }
    }

There is a simpler way to retrieve images in tweets.

If a tweet has an image attached, you can use getMediaEntities() to get the media data, and then retrieve the URL with getMediaURL().

You should do something like this:

    MediaEntity[] media = status.getMediaEntities(); // get the media entities from the status
    for (MediaEntity m : media) { // iterate through the entities
        System.out.println(m.getMediaURL()); // get the URL
    }

To download all the media in a Twitter4J Status:

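    // Assumes `status` is the twitter4j.Status being processed and `file` is a
    // java.io.File pointing at the target directory (neither is defined in this snippet).
    // Needs java.io.* and java.net.URL.
    MediaEntity[] medias = status.getMediaEntities();
    for (MediaEntity m : medias) {
        File target = new File(file, m.getId() + "." + getExtension(m.getType()));
        // try-with-resources closes the streams even if the download fails
        try (InputStream in = new BufferedInputStream(new URL(m.getMediaURL()).openStream())) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n); // buffer the whole file in memory
            }
            try (FileOutputStream fos = new FileOutputStream(target)) {
                fos.write(out.toByteArray());
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
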
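The download loop above holds each file fully in memory before writing it out. As a possible alternative, the sketch below streams straight to disk with `Files.copy`. The class name `MediaSaver`, the method `save`, and the file name `example.jpg` are hypothetical names chosen for illustration, and the `main` method uses an in-memory stream as a stand-in for the network stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Twitter4J.
final class MediaSaver {

    /** Copies the stream into targetDir/fileName and returns the path that was written. */
    static Path save(InputStream in, Path targetDir, String fileName) throws IOException {
        Files.createDirectories(targetDir);          // make sure the directory exists
        Path target = targetDir.resolve(fileName);   // platform-independent path joining
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("media");
        // Stand-in for a real download stream
        try (InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3})) {
            Path saved = save(in, dir, "example.jpg");
            System.out.println(saved + " (" + Files.size(saved) + " bytes)");
        }
    }
}
```

With a real tweet, the stream would presumably come from `new URL(m.getMediaURL()).openStream()`, opened in a try-with-resources block so it is closed after the copy.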
To get the file extension:

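    // Maps the media type reported by MediaEntity.getType() to a file extension.
    // Note: for "video" and "animated_gif" entities, getMediaURL() typically points
    // to a preview image rather than the video file itself.
    private String getExtension(String type) {
        if ("photo".equals(type)) {
            return "jpg";
        } else if ("video".equals(type)) {
            return "mp4";
        } else if ("animated_gif".equals(type)) {
            return "gif";
        } else {
            return "err"; // unknown media type
        }
    }
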
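Mapping the media type to an extension assumes every photo is a JPEG, which may not hold (the question itself expects `.png` and `.gif` URLs too). One possible alternative is to take the extension from the media URL. The sketch below is only an illustration: `MediaFileNames` and `extensionFromUrl` are hypothetical names, and the example URLs are made up.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper, not part of Twitter4J.
final class MediaFileNames {

    /** Returns the lower-cased extension found in the URL's path, or fallback if there is none. */
    static String extensionFromUrl(String mediaUrl, String fallback) {
        String path = URI.create(mediaUrl).getPath();
        if (path == null) {
            return fallback;
        }
        String name = path.substring(path.lastIndexOf('/') + 1);
        // Drop a size suffix such as ":large" if one is present
        int colon = name.indexOf(':');
        if (colon >= 0) {
            name = name.substring(0, colon);
        }
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return fallback; // no extension in the file name
        }
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Made-up URLs for illustration
        System.out.println(extensionFromUrl("https://pbs.twimg.com/media/abc123.jpg", "bin"));
        System.out.println(extensionFromUrl("https://pbs.twimg.com/media/abc123.PNG:large", "bin"));
    }
}
```

In the download loop, `extensionFromUrl(m.getMediaURL(), "bin")` could then stand in for `getExtension(m.getType())`.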