How can I retrieve programmatically from command line my SO rep and number of badges?

Original question

My initial attempt was to run curl https://stackoverflow.com/users/5825294/enlico and pipe the result into sed / awk. However, as I've frequently read, sed and awk are not the best tools to parse HTML code. Furthermore, the above URL changes if I change my user name.

Oh, this is my quick attempt with sed, written on multiple lines for readability:

curl https://stackoverflow.com/users/5825294/enlico 2> /dev/null | sed -nE '
/title="reputation"/,/bronze badges/{
    /"reputation"/{
        N
        N
        s!.*>(.*)</.*!\1!p
    }
/badges/s/.*[^1-9]([1-9]+[0-9]*,*[0-9]* (gold|silver|bronze) badges).*/\1/p
}'

which prints

10,968
5 gold badges
27 silver badges
56 bronze badges

Obviously this script relies heavily on the peculiar structure of the specific HTML page, the most notable example being that I run N twice because I've verified that the reputation is two lines below the first line in the file containing "reputation".

Update based on the answers

Léa Gris' answer almost answers my question. The missing bit is that I have 5 gold, 27 silver, and 56 bronze badges, not 5, 18, 7.

In this respect, I've noticed that 18 is the number of silver badges I have if I don't count those awarded multiple times. I've therefore played around with jq and discovered that I can query for the award_count besides the rank, and I thought I could use that to take multiply awarded badges into account. This kind of works, in the sense that running the following (fetch_user_badges is from Léa Gris' answer) generates the correct number of silver badges but the wrong number of bronze badges:

$ fetch_user_badges stackoverflow 5825294 | jq -r '
.items
| map({rank: .rank, count: .award_count})
| group_by(.rank)
| map([[.[0].rank],map(.count) | add])'
[
  [
    "bronze",
    22
  ],
  [
    "gold",
    5
  ],
  [
    "silver",
    27
  ]
]

Is anybody aware of why that is?
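One possible explanation, offered as an assumption rather than something confirmed in this thread: the Stack Exchange API pages its results, and a single request returns only one page of badges (30 items by default, as far as I recall). The counts 5, 18 and 7 mentioned above add up to exactly 30, which would fit that picture, and any bronze badge beyond the first page would then be missing from the sum. The Python sketch below shows the aggregation once all pages have been collected. The pages sample data and the count_badges_by_rank helper are hypothetical; only the items, rank and award_count fields come from the responses discussed above.

```python
from collections import Counter


def count_badges_by_rank(pages):
    """Sum award_count per rank over every page of a badges response."""
    totals = Counter()
    for page in pages:
        for badge in page.get("items", []):
            # A badge awarded several times carries award_count > 1.
            totals[badge["rank"]] += badge.get("award_count", 1)
    return dict(totals)


# Hypothetical sample: two pages as successive requests might return them.
pages = [
    {
        "items": [
            {"rank": "gold", "award_count": 1},
            {"rank": "silver", "award_count": 3},
            {"rank": "bronze", "award_count": 2},
        ],
        "has_more": True,
    },
    {
        "items": [
            {"rank": "bronze", "award_count": 4},
        ],
        "has_more": False,
    },
]

print(count_badges_by_rank(pages[:1]))  # first page only: bronze is undercounted
print(count_badges_by_rank(pages))      # every page: bronze total is complete
```

Against the real API one would presumably keep requesting further pages while the response reports has_more, or ask for a larger page size; both should be checked against the API documentation before relying on them.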

Full example using the StackExchange API and jq for parsing the response.

#!/usr/bin/env bash

# This script fetches and prints some user info
# from a stack-site using the stackexchange's API

# Change this to the stackoverflow's numerical user ID

STACK_UID=5825294
STACK_SITE='stackoverflow'
STACK_API='https://api.stackexchange.com/2.2'

API_CACHE=~/.cache/stack_api

mkdir -p "$API_CACHE"

# Gets a stack-site user using the stackexchange API and caches the result
# @Params:
# $1: the website (example stackoverflow)
# $2: the numerical user ID
# @Output:
# &1: API Json reply
stack_api::user() {
  stack_site=$1
  stack_uid=$2

  cache_file="${API_CACHE}/${stack_site}-users-${stack_uid}.json"

  yesterday_ref="${API_CACHE}/yesterday.ref"
  touch -d yesterday "$yesterday_ref"

  # Expire cache
  [ "$cache_file" -ot "$yesterday_ref" ] && rm -f -- "$cache_file"

  # Call stack API only if no cached answer
  [ -f "$cache_file" ] || curl \
    --silent \
    --output "$cache_file" \
    --request GET \
    --url "${STACK_API}/users/${stack_uid}?site=${stack_site}"

  # Return cached answer
  zcat --force -- "$cache_file" 2>/dev/null
}

IFS=$'\n' read -r -d '' username reputation bronze silver gold < <(
  # Fetch user from a stack site
  stack_api::user "$STACK_SITE" "$STACK_UID" |

  # Parse the stack_api user data from the JSON response
  jq -r '
.items[0] |
  .display_name,
  .reputation,
  ( .badge_counts |
    .bronze,
    .silver,
    .gold
  )
  '
)

printf 'Badges from UserID %d %s on the %s website:\n\n' \
  $STACK_UID "$username" "$STACK_SITE"
printf 'Reputation: %6d\n' "$reputation"
printf 'Bronze:     %6d\n' "$bronze"
printf 'Silver:     %6d\n' "$silver"
printf 'Gold:       %6d\n' "$gold"

Example output:

Badges from UserID 5825294 Enlico on the stackoverflow website:

Reputation:  11144
Bronze:         56
Silver:         27
Gold:            5
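For comparison, the same extraction can be sketched in Python using only the standard library. The sample response below is hypothetical and trimmed to the fields the jq filter reads (display_name, reputation, badge_counts), with values copied from the example output; summarize_user is an illustrative name, not part of any API.

```python
import json

# Hypothetical, trimmed sample of the users response;
# the values are copied from the example output above.
SAMPLE_RESPONSE = """
{
  "items": [
    {
      "display_name": "Enlico",
      "reputation": 11144,
      "badge_counts": {"bronze": 56, "silver": 27, "gold": 5}
    }
  ]
}
"""


def summarize_user(response_text):
    """Return display name, reputation and badge counts of the first user."""
    user = json.loads(response_text)["items"][0]
    counts = user["badge_counts"]
    return {
        "username": user["display_name"],
        "reputation": user["reputation"],
        "bronze": counts["bronze"],
        "silver": counts["silver"],
        "gold": counts["gold"],
    }


summary = summarize_user(SAMPLE_RESPONSE)
print(f"Reputation: {summary['reputation']:6d}")
for rank in ("bronze", "silver", "gold"):
    label = rank.capitalize() + ":"
    print(f"{label:<11} {summary[rank]:6d}")
```

Reading the fields by name, as the jq filter does, keeps the result independent of the order in which the API happens to list them.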

as I've frequently read, sed and awk are not the best tools to parse HTML code.

That's right. Instead of repeating what others have already said, I'd say, have a look at:

Too bad that last website is rather outdated, because to parse an HTML-source I would pick the Swiss knife tool anytime!

HTML-source

The information you wish to extract sits within the first <span class="profile-communities--rep-badges">-node:

$ xidel -s "https://stackoverflow.com/users/5825294" -e '
  (//span[@class="profile-communities--rep-badges"])[1]
' --printed-node-format=html
<span class="profile-communities--rep-badges">
                                <strong class="ml6 fc-medium" title="11,144 reputation">11.1k</strong>
                                <span title="5 gold badges" aria-hidden="true"><span class="badge1"></span><span class="badgecount">5</span></span><span class="v-visible-sr">5 gold badges</span><span title="27 silver badges" aria-hidden="true"><span class="badge2"></span><span class="badgecount">27</span></span><span class="v-visible-sr">27 silver badges</span><span title="56 bronze badges" aria-hidden="true"><span class="badge3"></span><span class="badgecount">56</span></span><span class="v-visible-sr">56 bronze badges</span>
                            </span>

Earlier Jack Fleeting made a good point about positional selectors possibly being less reliable than node-names or their attribute(-values).

In that case, searching for a uniquely identifiable parent-node is another way to get the same information:

$ xidel -s "https://stackoverflow.com/users/5825294" -e '
  //a[@title="Stack Overflow"]/span[@class="profile-communities--rep-badges"]
' --printed-node-format=html

(//a[@href="https://stackoverflow.com/users/5825294/"] would also work)

Then the easiest way to grab the specific information you want is to select the value of all the descendant title-attributes (//@title):

$ xidel -s "https://stackoverflow.com/users/5825294" -e '
  //a[@title="Stack Overflow"]/span[@class="profile-communities--rep-badges"]//@title
'
12,809 reputation
5 gold badges
28 silver badges
60 bronze badges
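The same idea, collecting the title attributes below the profile-communities--rep-badges span, can also be sketched with Python's standard html.parser module. This is a minimal illustration on a trimmed copy of the markup shown earlier, not a replacement for a real XPath engine; the TitleCollector class is a hypothetical helper, and its depth tracking assumes every element is closed explicitly.

```python
from html.parser import HTMLParser

# Trimmed copy of the markup shown earlier, plus one element outside the block.
SNIPPET = """
<span class="profile-communities--rep-badges">
  <strong class="ml6 fc-medium" title="11,144 reputation">11.1k</strong>
  <span title="5 gold badges" aria-hidden="true"><span class="badgecount">5</span></span>
  <span title="27 silver badges" aria-hidden="true"><span class="badgecount">27</span></span>
  <span title="56 bronze badges" aria-hidden="true"><span class="badgecount">56</span></span>
</span>
<span title="outside the block">ignored</span>
"""


class TitleCollector(HTMLParser):
    """Collect title attributes of elements nested inside the target span."""

    TARGET_CLASS = "profile-communities--rep-badges"

    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth inside the target span, 0 when outside
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if self.depth:
            # Already inside the target span: go one level deeper.
            self.depth += 1
            if "title" in attributes:
                self.titles.append(attributes["title"])
        elif tag == "span" and attributes.get("class") == self.TARGET_CLASS:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1


collector = TitleCollector()
collector.feed(SNIPPET)
print("\n".join(collector.titles))
```

Selecting by class name and attribute, rather than by line position, is what makes this less sensitive to layout changes than the sed attempt in the question.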

Furthermore, the above URL changes if I change my user name.

As you can see, "https://stackoverflow.com/users/5825294" works too.

StackExchange API

The same Swiss knife tool is also a JSON parser:

$ xidel -s "https://api.stackexchange.com/2.2/users/5825294?site=stackoverflow" -e '
  $json/(items)()/(
    reputation||" reputation",
    for $x in reverse((badge_counts)()) return
    join(((badge_counts)($x),$x,"badges"))
  )
'
12809 reputation
5 gold badges
28 silver badges
60 bronze badges

Also see this Xidel online tester for (alternative) intermediate steps.

There are a few ways of doing that; I personally prefer using XPath with a tool like xidel (although you can also use xmlstarlet, etc.).

You can get your reputation score using

xidel https://stackoverflow.com/users/5825294/enlico  -e "//div[@title='reputation']/div/div[@class='grid--cell fs-title fc-dark']/text()"

Similarly, the number of gold badges is obtained using:

xidel https://stackoverflow.com/users/5825294/enlico  -e "//div[@class='grid ai-center s-badge s-badge__gold']//span[@class='grid grid__center fl1']/text()"

Changing the string gold to silver or bronze in that second XPath expression will get you the other two categories.

The age-old wisdom is: do not parse HTML with regex. How about

curl https://stackoverflow.com/users/5825294/enlico -s | php -r '$d=new DOMDocument();@$d->loadHTML(stream_get_contents(STDIN));$xp=new DOMXPath($d);foreach($xp->query("//*[@id=\"user-card\"]//*[contains(@title,\"badges\")]") as $foo){echo $foo->getAttribute("title"),PHP_EOL;}echo preg_replace("/\\s+/"," ",$xp->query("//*[@title=\"reputation\"]")->item(0)->textContent);'

5 gold badges
27 silver badges
56 bronze badges
 11,144 reputation
