How to use matplotlib to plot pyspark sql results using shell
How to plot using pyspark?
I need to plot two separate columns: the first holds the data and the second holds the time:
All_packets= df.select("ip_adr_src","asn_val","timestamp")
EB_packets=All_packets.filter("asn_val is not NULL")
EB_packets.show()
plotdf=EB_packets.select("asn_val","timestamp")
I want to plot asn_val over time, grouped by ip_adr_src. If I have 6 distinct ip_adr_src values, I expect 6 curves.
+--------------------+---------------------------------+-------------+
| ip_adr_src |asn_val | timestamp|
+--------------------+---------------------------------+-------------+
|14:15:92:cc:00:01...| 707|1539071748441|
|14:15:92:cc:00:02...| 1212|1539071752314|
|14:15:92:cc:00:00...| 1616|1539071755578|
|14:15:92:cc:00:04...| 1818|1539071757167|
|14:15:92:cc:00:03...| 2020|1539071759297|
|14:15:92:cc:00:00...| 2121|1539071760408|
|14:15:92:cc:00:09...| 2323|1539071764035|
|14:15:92:cc:00:07...| 2424|1539071765775|
|14:15:92:cc:00:00...| 2525|1539071768560|
|14:15:92:cc:00:06...| 5858|1539071845370|
|14:15:92:cc:00:00...| 6060|1539071850129|
|14:15:92:cc:00:05...| 6262|1539071855046|
|14:15:92:cc:00:00...| 6969|1539071872523|
|14:15:92:cc:00:07...| 6969|1539071872528|
|14:15:92:cc:00:08...| 7171|1539071877609|
However, all of my attempts have failed, and I get this error:
Dataframe doesn't have an object `'plot'`
I would appreciate any help.
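I'm not sure I understand which columns you want to plot, but I suspect you need help with how to plot in general. This is how I would plot the asn_val column against timestamp: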
import matplotlib.pyplot as plt
# Collect both columns in one pass so each x value stays paired with its y value
rows = df.select('timestamp', 'asn_val').orderBy('timestamp').collect()
x_ts = [row.timestamp for row in rows]
y_asn_val = [row.asn_val for row in rows]
plt.plot(x_ts, y_asn_val)
plt.ylabel('asn_val')
plt.xlabel('timestamp')
plt.title('ASN values over time')
plt.legend(['asn_val'], loc='upper left')
plt.show()
If you need to plot other columns, call plt.plot(x, y) once for each of them and add each name to the list passed to plt.legend(your_cols, loc='upper left').
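To get one curve per ip_adr_src, as the question asks, the collected rows can be grouped by address before plotting. The following is a minimal sketch, not a definitive implementation. The helper names group_series and plot_series are hypothetical, the sample rows are copied from the table in the question (with the addresses truncated as shown there), and it assumes the filtered data is small enough to collect onto the driver.

```python
from collections import defaultdict


def group_series(rows):
    """Group (ip_adr_src, asn_val, timestamp) rows into per-address series.

    Returns a dict mapping each address to a list of (timestamp, asn_val)
    pairs sorted by timestamp. Rows whose asn_val is None are skipped.
    """
    series = defaultdict(list)
    for ip, asn_val, ts in rows:
        if asn_val is None:
            continue
        series[ip].append((ts, asn_val))
    return {ip: sorted(points) for ip, points in series.items()}


def plot_series(series):
    """Draw one line per address; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    for ip, points in sorted(series.items()):
        xs = [ts for ts, _ in points]
        ys = [val for _, val in points]
        plt.plot(xs, ys, marker='o', label=ip)
    plt.xlabel('timestamp')
    plt.ylabel('asn_val')
    plt.title('ASN values over time, per source address')
    plt.legend(loc='upper left')
    plt.show()


# A few rows taken from the table in the question, deliberately out of order.
sample_rows = [
    ("14:15:92:cc:00:01...", 707, 1539071748441),
    ("14:15:92:cc:00:00...", 2121, 1539071760408),
    ("14:15:92:cc:00:00...", 1616, 1539071755578),
]
grouped = group_series(sample_rows)
```

With Spark, the rows would presumably come from EB_packets.select("ip_adr_src", "asn_val", "timestamp").collect(), since each Row unpacks like a tuple. Calling plot_series(group_series(rows)) then draws one labelled line per address.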