How can I extract URLs from a one line JSON text file using regex?

I've been struggling with this problem, but I can only seem to export one URL to the output file.

The code I'm currently using is...

import glob, re

with open('urls.txt', 'a') as output:
    for file in glob.glob('json.txt'):
        with open(file, 'r') as f:
            for line in f.readlines():
                pattern = r"(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])"
                find = re.findall(pattern, line)
                if find:
                    try:
                        output.write(str(find[0]))
                    except UnicodeEncodeError:
                        pass

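I have tested the regex and it matches all of the URLs; the script just doesn't write all of them to the file.

The file I've been trying to extract the URLs from contains the following (indented for readability):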

{
    "items": [{
        "schema": "Event",
        "source_id": "99558834",
        "event_id": "7103414757044987314",
        "start_time": "2022-05-30T06:37:10Z",
        "end_time": "2022-05-30T06:37:24Z",
        "event_type": "motion",
        "source_type": null,
        "duration_ms": 14400,
        "session_duration": 14000,
        "state": "timed_out",
        "had_subscription": true,
        "is_favorite": false,
        "recording_status": "ready",
        "cv": {
            "person_detected": true,
            "stream_broken": false,
            "detection_type": "human",
            "cv_triggers": null,
            "detection_types": [{
                "detection_type": "human",
                "verified_timestamps": [1653892632153]
            }]
        },
        "properties": {
            "is_alexa": false,
            "is_sidewalk": false,
            "is_autoreply": false
        },
        "origin": null,
        "error_message": null,
        "updated_at": "2022-05-30T06:37:28.958Z",
        "visualizations": {
            "cloud_media_visualization": {
                "schema": "CloudMediaVisualization",
                "media": [{
                    "schema": "Media",
                    "url": "https://filestore-086356611853-us-west-2-prod-data.s3.us-west-2.amazonaws.com/8cbfaccd-9b1a-458b-88b9-5d12976f4293.mp4?X-Amz-Security-Token=IQoJb3JpZ2luX2VjECcaCXVzLWVhc3QtMSJIMEYCIQCd%2FiqSm%2BFneYZ1sRxM1yNyc3Cr8bVV92jQRo6k%2B4A7pwIhAO4ufSc2Ol8wevIQBjAUZz%2B7%2B%2FZrSgGpNtDhBH6hWlikKtIECB8QABoMNzM0NDEwMjU5OTMxIgyxJGK4nrZlY0QIGNQqrwTjz9YEN9G7vRk%2Bu9qUDpVIrwzd2jNXuCJ92K%2BHVCpSQb8wFqg6%2Bh521Ukotxvl9HXThrBDfgK4madk3%2FJ1Gynn3M%2BZ7MJnpLu0uA9tUperBazYvaNzPgFWBS2kWSUObSO5Jfwn6L9VoB4D%2F%2FHvOJa5pmDVXFc2s4hSkyxrXfw7W5OoBxdjKPU5TcdamZy7uJgLElZec%2F7PO99okNwIYQDS0RKKpcdZs3VbBiceXeb8ApDIcDWonMrnmz18Gz9wG%2B6ERrM6Av31UXID875c6DqfbqxCxpGpVXBlSy6jQENn%2Bl%2Bc5xewwhY4mTq90CcCZXnebCyoqkr2mt0S3lkZSBxdOI8qnoojCmg7yy%2BFII63h4NKQbEbhm2u1u%2Fb1Ar5UfD4wHzsalhZp83Xej5Lsg0uXvpRCaYoR6mQgvnmVmS1bIFe0StzTHhJHViwEb4XbSK3u5Z%2FniVcBbVKsidNN9%2FA33okRPz7FMjpEaOB3lsbeTpmBcC86GlnwFxarYEvWY6eN7uxE0pzuK2asYgat5JqaNj%2FbRMaW1hi7ivGAj9uFZjMteTdrsNAq6lbLaiL1POhB98D0eJumvA1xu%2FbxoE7VrW%2BikA2LOGwni5EAZ9LIzywxOHx9a5iiC%2BAFjwUGEzswdmzo0mAq0llNp1twfG5Bn47DHrUfF3NubD3aCA01mQ%2FSbKKBv%2BnMD6FK2yo9f8y2Ol%2F12%2FRLQMZbkA6i7TpaE7HNvj3ElWgwUp8OddeMPaD1ZQGOqgB4vDMx4xDOedv0RjNjZikdYtR2dHU3V4K9Ls2qUqF6NJ%2FrbvgwL1s4%2Bm3ZMeOUmLfJDMazkWg8jSNRfKBWFParp2R0%2Fg8TDUEOecwrbmN7cKG3vtnOpZIcFCD46bWvKm9czEun5zbNg6Q1rCLob5RTkEG6H0A729wvomQRldlb6QBtwAC0B7mfnRGgNZrEN3z0SSauZJS3mabSGhxwc0Oem6mFKK6s9Qh&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220530T225422Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIA2V7SDHXNTJG3BKDZ%2F20220530%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=b42e734ab24ce1c8057038a171c995326de1a8cf219810c33c0d0883e2ea38b2",
                    "custom_metadata": null,
                    "is_e2ee": false,
                    "manifest_id": null,
                    "file_type": "VIDEO",
                    "file_family": "VIDEO",
                    "preroll_duration_ms": 0,
                    "playback_duration": 14000,
                    "source": "Apsara"
                }, {
                    "schema": "Media",
                    "url": "https://filestore-086356611853-us-west-2-prod-data.s3.us-west-2.amazonaws.com/b22b3c85-5de3-4e91-92b5-d91db479df55.mp4?X-Amz-Security-Token=IQoJb3JpZ2luX2VjECcaCXVzLWVhc3QtMSJIMEYCIQCd%2FiqSm%2BFneYZ1sRxM1yNyc3Cr8bVV92jQRo6k%2B4A7pwIhAO4ufSc2Ol8wevIQBjAUZz%2B7%2B%2FZrSgGpNtDhBH6hWlikKtIECB8QABoMNzM0NDEwMjU5OTMxIgyxJGK4nrZlY0QIGNQqrwTjz9YEN9G7vRk%2Bu9qUDpVIrwzd2jNXuCJ92K%2BHVCpSQb8wFqg6%2Bh521Ukotxvl9HXThrBDfgK4madk3%2FJ1Gynn3M%2BZ7MJnpLu0uA9tUperBazYvaNzPgFWBS2kWSUObSO5Jfwn6L9VoB4D%2F%2FHvOJa5pmDVXFc2s4hSkyxrXfw7W5OoBxdjKPU5TcdamZy7uJgLElZec%2F7PO99okNwIYQDS0RKKpcdZs3VbBiceXeb8ApDIcDWonMrnmz18Gz9wG%2B6ERrM6Av31UXID875c6DqfbqxCxpGpVXBlSy6jQENn%2Bl%2Bc5xewwhY4mTq90CcCZXnebCyoqkr2mt0S3lkZSBxdOI8qnoojCmg7yy%2BFII63h4NKQbEbhm2u1u%2Fb1Ar5UfD4wHzsalhZp83Xej5Lsg0uXvpRCaYoR6mQgvnmVmS1bIFe0StzTHhJHViwEb4XbSK3u5Z%2FniVcBbVKsidNN9%2FA33okRPz7FMjpEaOB3lsbeTpmBcC86GlnwFxarYEvWY6eN7uxE0pzuK2asYgat5JqaNj%2FbRMaW1hi7ivGAj9uFZjMteTdrsNAq6lbLaiL1POhB98D0eJumvA1xu%2FbxoE7VrW%2BikA2LOGwni5EAZ9LIzywxOHx9a5iiC%2BAFjwUGEzswdmzo0mAq0llNp1twfG5Bn47DHrUfF3NubD3aCA01mQ%2FSbKKBv%2BnMD6FK2yo9f8y2Ol%2F12%2FRLQMZbkA6i7TpaE7HNvj3ElWgwUp8OddeMPaD1ZQGOqgB4vDMx4xDOedv0RjNjZikdYtR2dHU3V4K9Ls2qUqF6NJ%2FrbvgwL1s4%2Bm3ZMeOUmLfJDMazkWg8jSNRfKBWFParp2R0%2Fg8TDUEOecwrbmN7cKG3vtnOpZIcFCD46bWvKm9czEun5zbNg6Q1rCLob5RTkEG6H0A729wvomQRldlb6QBtwAC0B7mfnRGgNZrEN3z0SSauZJS3mabSGhxwc0Oem6mFKK6s9Qh&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220530T225422Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIA2V7SDHXNTJG3BKDZ%2F20220530%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=7724f211de257f1f13fb585158f0f241e47daa1f1f67a3e48527e45883889a8b",
                    "custom_metadata": null,
                    "is_e2ee": false,
                    "manifest_id": null,
                    "file_type": "LQ_VIDEO",
                    "file_family": "LQ_VIDEO",
                    "preroll_duration_ms": 0,
                    "playback_duration": 14400,
                    "source": "Apsara"
                }, {
                    "schema": "Media",
                    "url": "https://filestore-086356611853-us-west-2-prod-data.s3.us-west-2.amazonaws.com/564fb900-0d78-4521-8a3d-b760fff7ee8d.iframe?X-Amz-Security-Token=IQoJb3JpZ2luX2VjECcaCXVzLWVhc3QtMSJIMEYCIQCd%2FiqSm%2BFneYZ1sRxM1yNyc3Cr8bVV92jQRo6k%2B4A7pwIhAO4ufSc2Ol8wevIQBjAUZz%2B7%2B%2FZrSgGpNtDhBH6hWlikKtIECB8QABoMNzM0NDEwMjU5OTMxIgyxJGK4nrZlY0QIGNQqrwTjz9YEN9G7vRk%2Bu9qUDpVIrwzd2jNXuCJ92K%2BHVCpSQb8wFqg6%2Bh521Ukotxvl9HXThrBDfgK4madk3%2FJ1Gynn3M%2BZ7MJnpLu0uA9tUperBazYvaNzPgFWBS2kWSUObSO5Jfwn6L9VoB4D%2F%2FHvOJa5pmDVXFc2s4hSkyxrXfw7W5OoBxdjKPU5TcdamZy7uJgLElZec%2F7PO99okNwIYQDS0RKKpcdZs3VbBiceXeb8ApDIcDWonMrnmz18Gz9wG%2B6ERrM6Av31UXID875c6DqfbqxCxpGpVXBlSy6jQENn%2Bl%2Bc5xewwhY4mTq90CcCZXnebCyoqkr2mt0S3lkZSBxdOI8qnoojCmg7yy%2BFII63h4NKQbEbhm2u1u%2Fb1Ar5UfD4wHzsalhZp83Xej5Lsg0uXvpRCaYoR6mQgvnmVmS1bIFe0StzTHhJHViwEb4XbSK3u5Z%2FniVcBbVKsidNN9%2FA33okRPz7FMjpEaOB3lsbeTpmBcC86GlnwFxarYEvWY6eN7uxE0pzuK2asYgat5JqaNj%2FbRMaW1hi7ivGAj9uFZjMteTdrsNAq6lbLaiL1POhB98D0eJumvA1xu%2FbxoE7VrW%2BikA2LOGwni5EAZ9LIzywxOHx9a5iiC%2BAFjwUGEzswdmzo0mAq0llNp1twfG5Bn47DHrUfF3NubD3aCA01mQ%2FSbKKBv%2BnMD6FK2yo9f8y2Ol%2F12%2FRLQMZbkA6i7TpaE7HNvj3ElWgwUp8OddeMPaD1ZQGOqgB4vDMx4xDOedv0RjNjZikdYtR2dHU3V4K9Ls2qUqF6NJ%2FrbvgwL1s4%2Bm3ZMeOUmLfJDMazkWg8jSNRfKBWFParp2R0%2Fg8TDUEOecwrbmN7cKG3vtnOpZIcFCD46bWvKm9czEun5zbNg6Q1rCLob5RTkEG6H0A729wvomQRldlb6QBtwAC0B7mfnRGgNZrEN3z0SSauZJS3mabSGhxwc0Oem6mFKK6s9Qh&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220530T225422Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIA2V7SDHXNTJG3BKDZ%2F20220530%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=8ccd9823cd6d2fe0e386b843a700bd05cc3a694c6986a55b75c797cbf846b7c6",
                    "custom_metadata": null,
                    "is_e2ee": false,
                    "manifest_id": null,
                    "file_type": "THUMBNAIL",
                    "file_family": "THUMBNAIL",
                    "preroll_duration_ms": 0,
                    "playback_duration": 14000,
                    "source": "Apsara"
                }]
            },
            "local_media_visualization": {
                "schema": "LocalMediaVisualization",
                "media": []
            },
            "radar_visualization": null,
            "single_coordinate_visualization": null,
            "map_visualization": null
        },
        "device": {
            "id": 99558834,
            "description": "Front",
            "type": "cocoa_camera"
        },
        "owner_id": "71616327"
    }]
}

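I think it may be easier to turn your data into valid JSON and then use the object_hook parameter that json.loads() supports. For more details, see my answer to "How to find a particular JSON value by key?".

Here's how to apply it to your data: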

import json

def find_values(id, json_repr):
    results = []

    def _decode_dict(a_dict):
        try:
            results.append(a_dict[id])
        except KeyError:
            pass
        return a_dict

    json.loads(json_repr, object_hook=_decode_dict) # Return value ignored.
    return results

with open('filename.json') as file:
    jstr = file.read()

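# Only needed when the file lacks its closing ']}'; omit this step if the file is already valid JSON.
json_repr = jstr + ']}'  # Make jstr valid JSON.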

results = find_values('url', json_repr)
print(f'{len(results)} URLs found')
for i, url in enumerate(results, start=1):
    print(f'{i}: {url}')

Output:

3 URLs found
1: https://filestore-086356611853-us-west-2-prod-data.s3.us-west-2.amazonaws.com/8cbfaccd-9b1a-458b-88b9-5d12976f4293.mp4?X-Amz-Security-Token=IQoJb3JpZ2luX2VjECcaCXVzLWVhc3QtMSJIMEYCIQCd%2FiqSm%2BFneYZ1sRxM1yNyc3Cr8bVV92jQRo6k%2B4A7pwIhAO4ufSc2Ol8wevIQBjAUZz%2B7%2B%2FZrSgGpNtDhBH6hWlikKtIECB8QABoMNzM0NDEwMjU5OTMxIgyxJGK4nrZlY0QIGNQqrwTjz9YEN9G7vRk%2Bu9qUDpVIrwzd2jNXuCJ92K%2BHVCpSQb8wFqg6%2Bh521Ukotxvl9HXThrBDfgK4madk3%2FJ1Gynn3M%2BZ7MJnpLu0uA9tUperBazYvaNzPgFWBS2kWSUObSO5Jfwn6L9VoB4D%2F%2FHvOJa5pmDVXFc2s4hSkyxrXfw7W5OoBxdjKPU5TcdamZy7uJgLElZec%2F7PO99okNwIYQDS0RKKpcdZs3VbBiceXeb8ApDIcDWonMrnmz18Gz9wG%2B6ERrM6Av31UXID875c6DqfbqxCxpGpVXBlSy6jQENn%2Bl%2Bc5xewwhY4mTq90CcCZXnebCyoqkr2mt0S3lkZSBxdOI8qnoojCmg7yy%2BFII63h4NKQbEbhm2u1u%2Fb1Ar5UfD4wHzsalhZp83Xej5Lsg0uXvpRCaYoR6mQgvnmVmS1bIFe0StzTHhJHViwEb4XbSK3u5Z%2FniVcBbVKsidNN9%2FA33okRPz7FMjpEaOB3lsbeTpmBcC86GlnwFxarYEvWY6eN7uxE0pzuK2asYgat5JqaNj%2FbRMaW1hi7ivGAj9uFZjMteTdrsNAq6lbLaiL1POhB98D0eJumvA1xu%2FbxoE7VrW%2BikA2LOGwni5EAZ9LIzywxOHx9a5iiC%2BAFjwUGEzswdmzo0mAq0llNp1twfG5Bn47DHrUfF3NubD3aCA01mQ%2FSbKKBv%2BnMD6FK2yo9f8y2Ol%2F12%2FRLQMZbkA6i7TpaE7HNvj3ElWgwUp8OddeMPaD1ZQGOqgB4vDMx4xDOedv0RjNjZikdYtR2dHU3V4K9Ls2qUqF6NJ%2FrbvgwL1s4%2Bm3ZMeOUmLfJDMazkWg8jSNRfKBWFParp2R0%2Fg8TDUEOecwrbmN7cKG3vtnOpZIcFCD46bWvKm9czEun5zbNg6Q1rCLob5RTkEG6H0A729wvomQRldlb6QBtwAC0B7mfnRGgNZrEN3z0SSauZJS3mabSGhxwc0Oem6mFKK6s9Qh&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220530T225422Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIA2V7SDHXNTJG3BKDZ%2F20220530%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=b42e734ab24ce1c8057038a171c995326de1a8cf219810c33c0d0883e2ea38b2
2: https://filestore-086356611853-us-west-2-prod-data.s3.us-west-2.amazonaws.com/b22b3c85-5de3-4e91-92b5-d91db479df55.mp4?X-Amz-Security-Token=IQoJb3JpZ2luX2VjECcaCXVzLWVhc3QtMSJIMEYCIQCd%2FiqSm%2BFneYZ1sRxM1yNyc3Cr8bVV92jQRo6k%2B4A7pwIhAO4ufSc2Ol8wevIQBjAUZz%2B7%2B%2FZrSgGpNtDhBH6hWlikKtIECB8QABoMNzM0NDEwMjU5OTMxIgyxJGK4nrZlY0QIGNQqrwTjz9YEN9G7vRk%2Bu9qUDpVIrwzd2jNXuCJ92K%2BHVCpSQb8wFqg6%2Bh521Ukotxvl9HXThrBDfgK4madk3%2FJ1Gynn3M%2BZ7MJnpLu0uA9tUperBazYvaNzPgFWBS2kWSUObSO5Jfwn6L9VoB4D%2F%2FHvOJa5pmDVXFc2s4hSkyxrXfw7W5OoBxdjKPU5TcdamZy7uJgLElZec%2F7PO99okNwIYQDS0RKKpcdZs3VbBiceXeb8ApDIcDWonMrnmz18Gz9wG%2B6ERrM6Av31UXID875c6DqfbqxCxpGpVXBlSy6jQENn%2Bl%2Bc5xewwhY4mTq90CcCZXnebCyoqkr2mt0S3lkZSBxdOI8qnoojCmg7yy%2BFII63h4NKQbEbhm2u1u%2Fb1Ar5UfD4wHzsalhZp83Xej5Lsg0uXvpRCaYoR6mQgvnmVmS1bIFe0StzTHhJHViwEb4XbSK3u5Z%2FniVcBbVKsidNN9%2FA33okRPz7FMjpEaOB3lsbeTpmBcC86GlnwFxarYEvWY6eN7uxE0pzuK2asYgat5JqaNj%2FbRMaW1hi7ivGAj9uFZjMteTdrsNAq6lbLaiL1POhB98D0eJumvA1xu%2FbxoE7VrW%2BikA2LOGwni5EAZ9LIzywxOHx9a5iiC%2BAFjwUGEzswdmzo0mAq0llNp1twfG5Bn47DHrUfF3NubD3aCA01mQ%2FSbKKBv%2BnMD6FK2yo9f8y2Ol%2F12%2FRLQMZbkA6i7TpaE7HNvj3ElWgwUp8OddeMPaD1ZQGOqgB4vDMx4xDOedv0RjNjZikdYtR2dHU3V4K9Ls2qUqF6NJ%2FrbvgwL1s4%2Bm3ZMeOUmLfJDMazkWg8jSNRfKBWFParp2R0%2Fg8TDUEOecwrbmN7cKG3vtnOpZIcFCD46bWvKm9czEun5zbNg6Q1rCLob5RTkEG6H0A729wvomQRldlb6QBtwAC0B7mfnRGgNZrEN3z0SSauZJS3mabSGhxwc0Oem6mFKK6s9Qh&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220530T225422Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIA2V7SDHXNTJG3BKDZ%2F20220530%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=7724f211de257f1f13fb585158f0f241e47daa1f1f67a3e48527e45883889a8b
3: https://filestore-086356611853-us-west-2-prod-data.s3.us-west-2.amazonaws.com/564fb900-0d78-4521-8a3d-b760fff7ee8d.iframe?X-Amz-Security-Token=IQoJb3JpZ2luX2VjECcaCXVzLWVhc3QtMSJIMEYCIQCd%2FiqSm%2BFneYZ1sRxM1yNyc3Cr8bVV92jQRo6k%2B4A7pwIhAO4ufSc2Ol8wevIQBjAUZz%2B7%2B%2FZrSgGpNtDhBH6hWlikKtIECB8QABoMNzM0NDEwMjU5OTMxIgyxJGK4nrZlY0QIGNQqrwTjz9YEN9G7vRk%2Bu9qUDpVIrwzd2jNXuCJ92K%2BHVCpSQb8wFqg6%2Bh521Ukotxvl9HXThrBDfgK4madk3%2FJ1Gynn3M%2BZ7MJnpLu0uA9tUperBazYvaNzPgFWBS2kWSUObSO5Jfwn6L9VoB4D%2F%2FHvOJa5pmDVXFc2s4hSkyxrXfw7W5OoBxdjKPU5TcdamZy7uJgLElZec%2F7PO99okNwIYQDS0RKKpcdZs3VbBiceXeb8ApDIcDWonMrnmz18Gz9wG%2B6ERrM6Av31UXID875c6DqfbqxCxpGpVXBlSy6jQENn%2Bl%2Bc5xewwhY4mTq90CcCZXnebCyoqkr2mt0S3lkZSBxdOI8qnoojCmg7yy%2BFII63h4NKQbEbhm2u1u%2Fb1Ar5UfD4wHzsalhZp83Xej5Lsg0uXvpRCaYoR6mQgvnmVmS1bIFe0StzTHhJHViwEb4XbSK3u5Z%2FniVcBbVKsidNN9%2FA33okRPz7FMjpEaOB3lsbeTpmBcC86GlnwFxarYEvWY6eN7uxE0pzuK2asYgat5JqaNj%2FbRMaW1hi7ivGAj9uFZjMteTdrsNAq6lbLaiL1POhB98D0eJumvA1xu%2FbxoE7VrW%2BikA2LOGwni5EAZ9LIzywxOHx9a5iiC%2BAFjwUGEzswdmzo0mAq0llNp1twfG5Bn47DHrUfF3NubD3aCA01mQ%2FSbKKBv%2BnMD6FK2yo9f8y2Ol%2F12%2FRLQMZbkA6i7TpaE7HNvj3ElWgwUp8OddeMPaD1ZQGOqgB4vDMx4xDOedv0RjNjZikdYtR2dHU3V4K9Ls2qUqF6NJ%2FrbvgwL1s4%2Bm3ZMeOUmLfJDMazkWg8jSNRfKBWFParp2R0%2Fg8TDUEOecwrbmN7cKG3vtnOpZIcFCD46bWvKm9czEun5zbNg6Q1rCLob5RTkEG6H0A729wvomQRldlb6QBtwAC0B7mfnRGgNZrEN3z0SSauZJS3mabSGhxwc0Oem6mFKK6s9Qh&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220530T225422Z&X-Amz-SignedHeaders=host&X-Amz-Expires=900&X-Amz-Credential=ASIA2V7SDHXNTJG3BKDZ%2F20220530%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=8ccd9823cd6d2fe0e386b843a700bd05cc3a694c6986a55b75c797cbf846b7c6
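If the goal is the urls.txt file from the question rather than printed output, the same object_hook idea can feed a file writer. The following is only a minimal sketch under stated assumptions: the names collect_values and save_urls, the shortened sample string, and the example.com URLs are made up for illustration and are not part of the answer above.

```python
import json
import os
import tempfile


def collect_values(key, json_text):
    """Return every value stored under `key`, at any nesting depth."""
    found = []

    def hook(obj):
        # json.loads calls this once for each decoded JSON object (dict).
        if key in obj:
            found.append(obj[key])
        return obj

    json.loads(json_text, object_hook=hook)  # Return value ignored.
    return found


def save_urls(urls, path):
    """Write one URL per line."""
    with open(path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(url + "\n")


# Hypothetical, shortened stand-in for the one-line file in the question.
sample = (
    '{"items": [{"media": ['
    '{"url": "https://example.com/a.mp4?x=1"}, '
    '{"url": "https://example.com/b.mp4?x=2"}], '
    '"device": {"id": 1}}]}'
)

urls = collect_values("url", sample)

# A temporary directory is used here so the sketch has no side effects.
out_path = os.path.join(tempfile.mkdtemp(), "urls.txt")
save_urls(urls, out_path)
```

Because the JSON parser handles the quoting and escaping, this approach does not depend on a URL regex matching every character that may appear in a signed link.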

As others have mentioned, there are better ways to parse/read JSON, but given your code, a small tweak will make it do what you want.

import glob, re

with open('urls.txt', 'a') as output:
    for file in glob.glob('json.txt'):
        with open(file, 'r') as f:
            for line in f.readlines():
                pattern = r"(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])"
                find = re.findall(pattern, line)
                if find:
                    try:
                        for result in find:
                            # findall() returns (scheme, host, path) tuples here; rejoin them into the URL.
                            output.write("{}://{}{}\n".format(*result))
                    except UnicodeEncodeError:
                        pass

You were only asking for the first matched result (find[0]). You want all of them, so iterate over the matches and output each one.
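One detail is worth checking when looping over the matches: because the pattern contains capture groups, re.findall returns a tuple of groups for each match, not the full URL string. The sketch below illustrates the difference on a shortened, made-up line (the example.com URLs are assumptions, not data from the question) and shows re.finditer with group(0) as one way to get each whole match.

```python
import re

# Same pattern as in the question, compiled once.
pattern = re.compile(
    r"(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))"
    r"([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])"
)

# Hypothetical one-line input containing two URLs.
line = (
    '{"a": {"url": "https://example.com/a.mp4?x=1"}, '
    '"b": {"url": "https://example.com/b.mp4?x=2"}}'
)

# With capture groups, findall() yields one tuple of groups per match.
groups = pattern.findall(line)

# finditer() yields match objects; group(0) is the entire matched URL.
full_urls = [m.group(0) for m in pattern.finditer(line)]

for url in full_urls:
    print(url)
```

Either form visits every match on the line; the choice only affects whether each result arrives as pieces or as one string.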
