How can I programmatically retrieve my SO rep and badge counts from the command line?

Question

Original question

My initial attempt was to run curl https://stackoverflow.com/users/5825294/enlico and pipe the result into sed / awk. However, as I've frequently read, sed and awk are not the best tools to parse HTML code. Furthermore, the above URL changes if I change my user name.

Oh, this is my quick attempt with sed, written on multiple lines for readability:

curl https://stackoverflow.com/users/5825294/enlico 2> /dev/null | sed -nE '
/title="reputation"/,/bronze badges/{
    /"reputation"/{
        N
        N
        s!.*>(.*)</.*!\1!p
    }
/badges/s/.*[^1-9]([1-9]+[0-9]*,*[0-9]* (gold|silver|bronze) badges).*/\1/p
}'

which prints

10,968
5 gold badges
27 silver badges
56 bronze badges

Obviously this script heavily relies on the peculiar structure of the specific HTML page, the most notable example being that I run N twice because I've verified that the reputation is two lines below the first line in the file containing "reputation".

Update based on the answers

Léa Gris' answer almost answers my question. The missing bit is that I have 5 gold, 27 silver, and 56 bronze badges, not 5, 18, 7.

In this respect, I've noticed that 18 is the number of silver badges I have if I don't count those awarded multiple times. So I've played around with jq and discovered that I can query for the award_count alongside the rank, and I thought I could use that to take multiply awarded badges into account. This kind of works, in the sense that running the following (fetch_user_badges is from Léa Gris' answer) generates the correct number of silver badges but the wrong number of bronze badges:

$ fetch_user_badges stackoverflow 5825294 | jq -r '
.items
| map({rank: .rank, count: .award_count})
| group_by(.rank)
| map([[.[0].rank],map(.count) | add])'

[
  [
    "bronze",
    22
  ],
  [
    "gold",
    5
  ],
  [
    "silver",
    27
  ]
]

Is anybody aware of why that is?
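
One plausible cause, offered as an assumption rather than a confirmed diagnosis: the counts from the first attempt (5 + 18 + 7) add up to exactly 30 entries, which as far as I know is the Stack Exchange API's default page size, so the badges response may simply be cut off after its first page. If so, any bronze badges beyond that page never reach jq. The sketch below sums award_count per rank and also prints the response's has_more flag. The sample JSON and the function name sum_badges_by_rank are made up for illustration, and fetch_user_badges itself is not shown on this page.

```shell
#!/usr/bin/env bash

# Sum award_count per badge rank for one API response read from stdin.
# Prints a compact JSON object such as {"bronze":4,"gold":1,"silver":5}.
sum_badges_by_rank() {
  jq -c '
    .items
    | group_by(.rank)
    | map({key: .[0].rank, value: (map(.award_count) | add)})
    | from_entries
  '
}

# Hypothetical, hand-written sample standing in for a real API response.
sample='{"items":[
  {"rank":"gold","award_count":1},
  {"rank":"silver","award_count":3},
  {"rank":"silver","award_count":2},
  {"rank":"bronze","award_count":4}
],"has_more":true}'

# Totals per rank for this page only.
printf '%s\n' "$sample" | sum_badges_by_rank

# true here would mean further pages exist and the totals are incomplete.
printf '%s\n' "$sample" | jq '.has_more'
```

If has_more turns out to be true for the real response, requesting more items per page (the API accepts a pagesize parameter, up to 100 I believe) or looping over the page parameter should bring in the missing badges.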

Answer 1

Full example using the StackExchange API and jq for parsing the response.

#!/usr/bin/env bash

# This script fetches and prints some user info
# from a stack-site using the stackexchange's API

# Change this to the stackoverflow's numerical user ID

STACK_UID=5825294
STACK_SITE='stackoverflow'
STACK_API='https://api.stackexchange.com/2.2'

API_CACHE=~/.cache/stack_api

mkdir -p "$API_CACHE"

# Get a stack-site user using the stackexchange API and caches the result
# @Params:
# $1: the website (example stackoverflow)
# $2: the numerical user ID
# @Output:
# &1: API Json reply
stack_api::user() {
  stack_site=$1
  stack_uid=$2

  cache_file="${API_CACHE}/${stack_site}-users-${stack_uid}.json"

  yesterday_ref="${API_CACHE}/yesterday.ref"
  touch -d yesterday "$yesterday_ref"

  # Expire cache
  [ "$cache_file" -ot "$yesterday_ref" ] && rm -f -- "$cache_file"

  # Call stack API only if no cached answer
  [ -f "$cache_file" ] || curl \
    --silent \
    --output "$cache_file" \
    --request GET \
    --url "${STACK_API}/users/${stack_uid}?site=${stack_site}"

  # Return cached answer
  zcat --force -- "$cache_file" 2>/dev/null
}

IFS=$'\n' read -r -d '' username reputation bronze silver gold < <(
  # Fetch user from a stack site
  stack_api::user "$STACK_SITE" "$STACK_UID" |

  # Parse the stack_api user data from the JSON response
  jq -r '
.items[0] |
  .display_name,
  .reputation,
  ( .badge_counts |
    .bronze,
    .silver,
    .gold
  )
  '
)

printf 'Badges from UserID %d %s on the %s website:\n\n' \
  $STACK_UID "$username" "$STACK_SITE"
printf 'Reputation: %6d\n' "$reputation"
printf 'Bronze:     %6d\n' "$bronze"
printf 'Silver:     %6d\n' "$silver"
printf 'Gold:       %6d\n' "$gold"

Example output:

Badges from UserID 5825294 Enlico on the stackoverflow website:

Reputation:  11144
Bronze:         56
Silver:         27
Gold:            5

Answer 2

as I've frequently read, sed and awk are not the best tools to parse HTML code.

That's right. Instead of repeating what others have already said, I'd suggest having a look at:

Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
RegEx match open tags except XHTML self-contained tags
How do I extract data from an HTML or XML file?

Too bad that last website is rather outdated, because to parse an HTML source I would pick the Swiss Army knife tool xidel anytime!

HTML-source

The information you wish to extract sits within the first <span class="profile-communities--rep-badges"> node:

$ xidel -s "https://stackoverflow.com/users/5825294" -e '
  (//span[@class="profile-communities--rep-badges"])[1]
' --printed-node-format=html
<span class="profile-communities--rep-badges">
                                <strong class="ml6 fc-medium" title="11,144 reputation">11.1k</strong>
                                <span title="5 gold badges" aria-hidden="true"><span class="badge1"></span><span class="badgecount">5</span></span><span class="v-visible-sr">5 gold badges</span><span title="27 silver badges" aria-hidden="true"><span class="badge2"></span><span class="badgecount">27</span></span><span class="v-visible-sr">27 silver badges</span><span title="56 bronze badges" aria-hidden="true"><span class="badge3"></span><span class="badgecount">56</span></span><span class="v-visible-sr">56 bronze badges</span>
                            </span>

Earlier, Jack Fleeting made a good point that positional selectors can be less reliable than node names or their attributes (and attribute values).

In that case, searching for a uniquely identifiable parent node is another way to get the same information:

$ xidel -s "https://stackoverflow.com/users/5825294" -e '
  //a[@title="Stack Overflow"]/span[@class="profile-communities--rep-badges"]
' --printed-node-format=html

( //a[@href="https://stackoverflow.com/users/5825294/"] would also work)

Then the easiest way to grab the specific information you want is to select the value of all the descendant title attributes ( //@title ):

$ xidel -s "https://stackoverflow.com/users/5825294" -e '
  //a[@title="Stack Overflow"]/span[@class="profile-communities--rep-badges"]//@title
'
12,809 reputation
5 gold badges
28 silver badges
60 bronze badges

Furthermore, the above URL changes if I change my user name.

As you can see, "https://stackoverflow.com/users/5825294" works too.

StackExchange API

The same Swiss Army knife tool is also a JSON parser:

$ xidel -s "https://api.stackexchange.com/2.2/users/5825294?site=stackoverflow" -e '
  $json/(items)()/(
    reputation||" reputation",
    for $x in reverse((badge_counts)()) return
    join(((badge_counts)($x),$x,"badges"))
  )
'
12809 reputation
5 gold badges
28 silver badges
60 bronze badges

Also see this Xidel online tester for (alternative) intermediate steps.

Answer 3

There are a few ways of doing that; I personally prefer using XPath with a tool like xidel (although you can also use xmlstarlet, etc.).

You can get your reputation score using

xidel https://stackoverflow.com/users/5825294/enlico  -e "//div[@title='reputation']/div/div[@class='grid--cell fs-title fc-dark']/text()"

Similarly, the number of gold badges is obtained using:

xidel https://stackoverflow.com/users/5825294/enlico  -e "//div[@class='grid ai-center s-badge s-badge__gold']//span[@class='grid grid__center fl1']/text()"

Changing the string gold to silver or bronze in that second XPath expression will get you the other two categories.
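
The substitution described above can be sketched as a small loop. This is only an illustration: the helper name badge_xpath is hypothetical, and the class names are copied from the expression above, so they will stop matching whenever the page markup changes. The loop prints the three expressions rather than querying the site; the commented-out line shows where xidel would come in.

```shell
#!/usr/bin/env bash

# Build the XPath expression for one badge rank (gold, silver or bronze).
# badge_xpath is a hypothetical helper; the class names come from the answer.
badge_xpath() {
  local rank=$1
  printf "//div[@class='grid ai-center s-badge s-badge__%s']//span[@class='grid grid__center fl1']/text()" "$rank"
}

for rank in gold silver bronze; do
  # To query the live page instead of printing the expression, one could run:
  #   xidel -s https://stackoverflow.com/users/5825294/enlico -e "$(badge_xpath "$rank")"
  printf '%s\n' "$(badge_xpath "$rank")"
done
```

Keeping the expression in one function means a markup change only has to be fixed in one place.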

Answer 4

The age-old wisdom is: do not parse HTML with regex. How about

curl https://stackoverflow.com/users/5825294/enlico -s | php -r '$d=new DOMDocument();@$d->loadHTML(stream_get_contents(STDIN));$xp=new DOMXPath($d);foreach($xp->query("//*[@id=\"user-card\"]//*[contains(@title,\"badges\")]") as $foo){echo $foo->getAttribute("title"),PHP_EOL;}echo preg_replace("/\\s+/"," ",$xp->query("//*[@title=\"reputation\"]")->item(0)->textContent);'

5 gold badges
27 silver badges
56 bronze badges
 11,144 reputation

4 answers

Answer 1: 2, accepted, 2021-04-07 17:44:02
Answer 2: 2, 2021-04-12 21:55:58
Answer 3: 1, 2021-04-07 11:19:10
Answer 4: 0, 2021-04-12 23:36:50
