
Graphing data that is read using readHTMLTable

I want to read the following table from a webpage and then create a bar graph.

Language          Jobs
PHP               12,664
Java              12,558
Objective C       8,925
SQL               5,165
Android (Java)    4,981
Ruby              3,859
JavaScript        3,742
C#                3,549
C++               1,908
ActionScript      1,821
Python            1,649
C                 1,087
ASP.NET           818

My questions:

1. My bars get messed up, and each bar does not correspond to the correct language. The following is my code:

library(XML)
tables2 <-(readHTMLTable("http://www.sitepoint.com/best-programming-language-of-2013/",which=1))
barplot(as.numeric(tables2$Job),names.arg=tables2$Language)
2. Since I am a beginner at R, I would like to know in what format readHTMLTable saves the data. Is it a matrix, a data frame, or some other format?

The main problem here is that Jobs is being read as a factor. Calling as.numeric() on a factor returns its internal integer level codes rather than the displayed values, which is why the bars come out wrong. Because of the commas in that field, you can't do a direct numeric conversion either. You can find out what 'format' your object is in R by calling str(). Here str(tables2) gives:

'data.frame':   13 obs. of  2 variables:
 $ Language: Factor w/ 13 levels "ActionScript",..: 10 7 9 13 2 12 8 5 6 1 ...
 $ Jobs    : Factor w/ 13 levels "1,087","1,649",..: 6 5 12 11 10 9 8 7 4 3 ...

So you can see that Jobs is a factor and that tables2 is a data.frame. To convert Jobs to numeric, you need to remove the commas. You can do that with gsub().

tables2$Jobs <- as.numeric(gsub(",","",tables2$Jobs))

Now str(tables2) gives:

'data.frame':   13 obs. of  2 variables:
 $ Language: Factor w/ 13 levels "ActionScript",..: 10 7 9 13 2 12 8 5 6 1 ...
 $ Jobs    : num  12664 12558 8925 5165 4981 ...

When you redo the plot, the bars should line up with the right languages:

barplot(tables2$Jobs,names.arg=tables2$Language)
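If you want to try this without hitting the website, here is a minimal sketch that mimics the situation. The data frame jobs_df and the variable codes are made-up names for illustration, and only the first four rows of the table are used. It assumes the columns arrive as factors, as the str() output above shows.

```r
# Hypothetical stand-in for the scraped table (first four rows only),
# built with factor columns to mimic what str(tables2) reported.
jobs_df <- data.frame(
  Language = c("PHP", "Java", "Objective C", "SQL"),
  Jobs     = c("12,664", "12,558", "8,925", "5,165"),
  stringsAsFactors = TRUE
)

# as.numeric() on a factor returns the internal level codes,
# not the numbers shown in the table.
codes <- as.numeric(jobs_df$Jobs)
print(codes)

# Removing the commas first yields the real values; gsub() coerces
# the factor to character before substituting.
jobs_df$Jobs <- as.numeric(gsub(",", "", jobs_df$Jobs))
print(jobs_df$Jobs)

# las = 2 rotates the axis labels so longer language names fit.
barplot(jobs_df$Jobs, names.arg = as.character(jobs_df$Language), las = 2)
```

Comparing the two printed vectors shows the difference between the level codes and the actual job counts. The las = 2 argument is optional and only there so the labels don't overlap.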

