How can I programmatically retrieve my SO rep and number of badges from the command line?
My initial attempt was to run curl https://stackoverflow.com/users/5825294/enlico and pipe the result into sed / awk. However, as I've frequently read, sed and awk are not the best tools to parse HTML code. Furthermore, the above URL changes if I change my user name.
Oh, this is my quick attempt with sed, written on multiple lines for readability:
curl https://stackoverflow.com/users/5825294/enlico 2> /dev/null | sed -nE '
/title="reputation"/,/bronze badges/{
/"reputation"/{
N
N
s!.*>(.*)</.*!\1!p
}
/badges/s/.*[^1-9]([1-9]+[0-9]*,*[0-9]* (gold|silver|bronze) badges).*/\1/p
}'
which prints
10,968
5 gold badges
27 silver badges
56 bronze badges
Obviously this script relies heavily on the peculiar structure of the specific HTML page, the most notable example being that I run N twice because I've verified that the reputation is two lines below the first line in the file containing "reputation".
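A minimal sketch of why the two N commands matter, run against a made-up three-line fragment rather than the real profile page (the markup below is an assumption for illustration only). Each N appends the next input line to sed's pattern space, so the substitution sees all three lines at once:

```shell
#!/bin/sh
# Hypothetical fragment; the real page's markup may differ.
sample_html() {
  cat <<'EOF'
<div title="reputation">
  <div class="label">
    <div class="score">10,968</div>
EOF
}

# On the line containing title="reputation", pull in the next two lines
# with N, then keep only the text between the last '>' and the last '</'.
extract_rep() {
  sed -nE '/title="reputation"/{
    N
    N
    s!.*>(.*)</.*!\1!p
  }'
}

sample_html | extract_rep   # prints 10,968
```

If the number moves to a different line, the same script would capture something else or nothing at all, which is the fragility described above.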
Léa Gris' answer almost answers my question. The missing bit is that I have 5 gold, 27 silver, and 56 bronze badges, not 5, 18, 7.
In this respect, I've noticed that 18 is the number of silver badges I have if I don't count those awarded multiple times. I therefore played around with jq and discovered that I can query for the award_count beside the rank, and I thought that I could use that to take multiply awarded badges into account. This kind of works, in the sense that running the following (fetch_user_badges is from Léa Gris' answer) generates the correct number of silver badges but the wrong number of bronze badges:
$ fetch_user_badges stackoverflow 5825294 | jq -r '
.items
| map({rank: .rank, count: .award_count})
| group_by(.rank)
| map([[.[0].rank],map(.count) | add])'
[
[
"bronze",
22
],
[
"gold",
5
],
[
"silver",
27
]
]
Is anybody aware of why that is?
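To make the jq filter above easier to follow, here is a minimal sketch of the same per-rank sum run on invented data. The badge names and counts are made up, the function names are hypothetical, and the sketch does not explain the bronze mismatch; it only shows what group_by and add do with award_count:

```shell
#!/bin/sh
# Invented sample in the shape of the API's "items" array.
sample_badges() {
  cat <<'EOF'
{"items":[
  {"name":"a","rank":"bronze","award_count":3},
  {"name":"b","rank":"silver","award_count":2},
  {"name":"c","rank":"bronze","award_count":1},
  {"name":"d","rank":"gold","award_count":1}
]}
EOF
}

# Group the items by rank, then print "<sum of award_count> <rank>" per group.
sum_by_rank() {
  jq -r '
    .items
    | group_by(.rank)
    | map("\(map(.award_count) | add) \(.[0].rank)")
    | .[]'
}

if command -v jq >/dev/null 2>&1; then
  sample_badges | sum_by_rank
else
  echo 'jq is not installed; skipping the demo' >&2
fi
```

group_by sorts the groups by key, so the ranks come out in alphabetical order (bronze, gold, silver), as in the output above.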
Full example using the StackExchange API and jq for parsing the response.
#!/usr/bin/env bash
# This script fetches and prints some user info
# from a stack-site using the stackexchange's API
# Change this to the stackoverflow's numerical user ID
STACK_UID=5825294
STACK_SITE='stackoverflow'
STACK_API='https://api.stackexchange.com/2.2'
API_CACHE=~/.cache/stack_api
mkdir -p "$API_CACHE"
# Get a stack-site user using the stackexchange API and caches the result
# @Params:
# $1: the website (example stackoverflow)
# $2: the numerical user ID
# @Output:
# &1: API Json reply
stack_api::user() {
  stack_site=$1
  stack_uid=$2
  cache_file="${API_CACHE}/${stack_site}-users-${stack_uid}.json"
  yesterday_ref="${API_CACHE}/yesterday.ref"
  touch -d yesterday "$yesterday_ref"
  # Expire cache
  [ "$cache_file" -ot "$yesterday_ref" ] && rm -f -- "$cache_file"
  # Call stack API only if no cached answer
  [ -f "$cache_file" ] || curl \
    --silent \
    --output "$cache_file" \
    --request GET \
    --url "${STACK_API}/users/${stack_uid}?site=${stack_site}"
  # Return cached answer
  zcat --force -- "$cache_file" 2>/dev/null
}
IFS=$'\n' read -r -d '' username reputation bronze silver gold < <(
  # Fetch user from a stack site
  stack_api::user "$STACK_SITE" "$STACK_UID" |
    # Parse the stack_api user data from the JSON response
    jq -r '
      .items[0] |
        .display_name,
        .reputation,
        ( .badge_counts |
          .bronze,
          .silver,
          .gold
        )
    '
)
printf 'Badges from UserID %d %s on the %s website:\n\n' \
$STACK_UID "$username" "$STACK_SITE"
printf 'Reputation: %6d\n' "$reputation"
printf 'Bronze: %6d\n' "$bronze"
printf 'Silver: %6d\n' "$silver"
printf 'Gold: %6d\n' "$gold"
Example output:
Badges from UserID 5825294 Enlico on the stackoverflow website:
Reputation: 11144
Bronze: 56
Silver: 27
Gold: 5
"as I've frequently read, sed and awk are not the best tools to parse HTML code."
That's right. Instead of repeating what others have already said, I'd say: have a look at:
Too bad that last website is rather outdated, because to parse an HTML source I would pick the Swiss Army knife tool xidel anytime!
The information you wish to extract sits within the first <span class="profile-communities--rep-badges"> node:
$ xidel -s "https://stackoverflow.com/users/5825294" -e '
(//span[@class="profile-communities--rep-badges"])[1]
' --printed-node-format=html
<span class="profile-communities--rep-badges">
<strong class="ml6 fc-medium" title="11,144 reputation">11.1k</strong>
<span title="5 gold badges" aria-hidden="true"><span class="badge1"></span><span class="badgecount">5</span></span><span class="v-visible-sr">5 gold badges</span><span title="27 silver badges" aria-hidden="true"><span class="badge2"></span><span class="badgecount">27</span></span><span class="v-visible-sr">27 silver badges</span><span title="56 bronze badges" aria-hidden="true"><span class="badge3"></span><span class="badgecount">56</span></span><span class="v-visible-sr">56 bronze badges</span>
</span>
Earlier, Jack Fleeting made a good point about positional selectors possibly being less reliable than node names or their attributes (and attribute values).
In that case, searching for a uniquely identifiable parent node is another way to get the same information:
$ xidel -s "https://stackoverflow.com/users/5825294" -e '
//a[@title="Stack Overflow"]/span[@class="profile-communities--rep-badges"]
' --printed-node-format=html
(//a[@href="https://stackoverflow.com/users/5825294/"] would also work)
Then the easiest way to grab the specific information you want is to select the value of all the descendant title attributes (//@title):
$ xidel -s "https://stackoverflow.com/users/5825294" -e '
//a[@title="Stack Overflow"]/span[@class="profile-communities--rep-badges"]//@title
'
12,809 reputation
5 gold badges
28 silver badges
60 bronze badges
"Furthermore, the above URL changes if I change my user name."
As you can see, "https://stackoverflow.com/users/5825294" works too.
The same Swiss Army knife tool is also a JSON parser:
$ xidel -s "https://api.stackexchange.com/2.2/users/5825294?site=stackoverflow" -e '
$json/(items)()/(
reputation||" reputation",
for $x in reverse((badge_counts)()) return
join(((badge_counts)($x),$x,"badges"))
)
'
12809 reputation
5 gold badges
28 silver badges
60 bronze badges
Also see this Xidel online tester for (alternative) intermediate steps.
There are a few ways of doing that; I personally prefer using xpath with a tool like xidel (although you can also use xmlstarlet, etc.).
You can get your reputation score using
xidel https://stackoverflow.com/users/5825294/enlico -e "//div[@title='reputation']/div/div[@class='grid--cell fs-title fc-dark']/text()"
Similarly, the number of gold badges is obtained using:
xidel https://stackoverflow.com/users/5825294/enlico -e "//div[@class='grid ai-center s-badge s-badge__gold']//span[@class='grid grid__center fl1']/text()"
Changing the string gold to silver or bronze in that second xpath expression will get you the other two categories.
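The substitution described above can be sketched as a small loop. The function names badge_xpath and print_badges are hypothetical, and the class names are copied from the expression above, so this only works as long as the page still uses that markup:

```shell
#!/bin/sh
# Build the badge XPath for one rank (gold, silver or bronze).
badge_xpath() {
  printf "//div[@class='grid ai-center s-badge s-badge__%s']//span[@class='grid grid__center fl1']/text()" "$1"
}

# Query the given profile URL once per rank; requires xidel and network access.
print_badges() {
  for rank in gold silver bronze; do
    printf '%s: ' "$rank"
    xidel -s "$1" -e "$(badge_xpath "$rank")"
  done
}

# Example call (not run here):
# print_badges https://stackoverflow.com/users/5825294/enlico
```

Each iteration fetches the page again; downloading it once with curl and feeding the saved file to xidel would avoid the repeated requests.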
The age-old wisdom is: do not parse HTML with regex. How about
curl https://stackoverflow.com/users/5825294/enlico -s | php -r '$d=new DOMDocument();@$d->loadHTML(stream_get_contents(STDIN));$xp=new DOMXPath($d);foreach($xp->query("//*[@id=\"user-card\"]//*[contains(@title,\"badges\")]") as $foo){echo $foo->getAttribute("title"),PHP_EOL;}echo preg_replace("/\\s+/"," ",$xp->query("//*[@title=\"reputation\"]")->item(0)->textContent);'
5 gold badges
27 silver badges
56 bronze badges
11,144 reputation