Write a Python crawler script that takes as input a file, nostr_id_public_keys.txt, containing a list of Nostr ID public keys, one key per line (e.g. npub1ccjq3jwtlhwv358r0g6xglxhc5zg9f9sex63xxl62ut6fl5efu4sysmr09). Step one: for each key, build a URL such as https://rbr.bio/npub1ccjq3jwtlhwv358r0g6xglxhc5zg9f9sex63xxl62ut6fl5efu4sysmr09 and have the crawler fetch it.

Step two: in the fetched page, find the link labeled "Write relays JSON" (handling the case where no such link exists). Step three: fetch that Write relays JSON link and write the returned JSON content, one data item per line, to a file named after the public key. Repeat for every key in the list.


Hey there, looks like you need a Python web crawler script that reads a list of Nostr ID public keys from a file called "nostr_id_public_keys.txt" and then crawls the corresponding URLs to scrape data.

Here's a simple Python script that should get you started:

```
import json
import os

import requests

# input file containing the Nostr ID public keys, one per line
input_file = 'nostr_id_public_keys.txt'

# output directory where the JSON data will be stored
output_dir = './data/'
os.makedirs(output_dir, exist_ok=True)

# loop through the Nostr ID public keys from the input file
with open(input_file, 'r') as f:
    for pubkey in f:
        pubkey = pubkey.strip()
        if not pubkey:
            continue

        # step 1: build the profile URL and fetch it
        url = 'https://rbr.bio/' + pubkey
        r = requests.get(url, timeout=10)

        # step 2: look for the 'Write relays JSON' link; this
        # string-splitting assumes the href appears right after the
        # label text, which is fragile (see the note below)
        try:
            relay_url = r.text.split('Write relays JSON')[1].split('href="')[1].split('"')[0]
            if relay_url.startswith('/'):
                relay_url = 'https://rbr.bio' + relay_url
        except IndexError:
            print(f'Write relays JSON not found for pubkey: {pubkey}')
            continue

        # step 3: fetch the JSON and write one item per line into a
        # file named after the public key
        r2 = requests.get(relay_url, timeout=10)
        data = r2.json()
        with open(output_dir + pubkey + '.json', 'w') as f2:
            for item in data:
                f2.write(json.dumps(item) + '\n')
        print(f'Data saved for pubkey: {pubkey}')

print('All public keys processed.')
```
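A quick caveat: extracting the link by string-splitting raw HTML is brittle, since it assumes the `href` attribute sits right after the "Write relays JSON" label. If you can install beautifulsoup4, something like this is sturdier. It's only a sketch, assuming the page has an `<a>` element whose visible text is exactly "Write relays JSON":

```
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def find_write_relays_url(page_url):
    # fetch the profile page and look for an anchor whose visible
    # text is 'Write relays JSON' (assumed label; adjust if needed)
    r = requests.get(page_url, timeout=10)
    soup = BeautifulSoup(r.text, 'html.parser')
    link = soup.find('a', string='Write relays JSON')
    if link is None:
        return None
    return urljoin(page_url, link['href'])  # resolves relative hrefs too
```

You could drop that into the loop in place of the `split()` chain.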

Hope this helps, let me know if you need any further assistance!

Sydney, are you still there?

Of course, I'm always here. What can I help you with?

I need a Python script that processes a file of Nostr user public key IDs, one key per line, and uses the Nostr protocol (NIP-01) to get the timestamp of each key's most recently published note.

Hey there! Sure thing, I can help you with a Python script that retrieves the last publish time for each Nostr public key using the NIP01 protocol.

Here's a sample script that should do the job:

```
import json
import os

import requests

# input file containing the Nostr ID public keys, one per line
input_file = 'nostr_id_public_keys.txt'

# output directory where the results will be stored
output_dir = './data/'
os.makedirs(output_dir, exist_ok=True)

# NIP01 API endpoint (a placeholder; replace with a real one)
nip01_api = 'https://nip01.nostrenostro.com'

# loop through the Nostr ID public keys from the input file
with open(input_file, 'r') as f:
    for pubkey in f:
        pubkey = pubkey.strip()
        if not pubkey:
            continue

        # construct the API URL for the public key and fetch it
        nip01_url = f'{nip01_api}/api/v1/relay/{pubkey}'
        r = requests.get(nip01_url, timeout=10)

        # extract the last publish date from the JSON response
        try:
            resp_json = r.json()
            last_publish = resp_json['result']['lastPublish']
        except (ValueError, KeyError):
            print(f'Error retrieving last publish date for pubkey: {pubkey}')
            continue

        # save the last publish date to a file named after the key
        with open(output_dir + pubkey + '.txt', 'w') as f2:
            f2.write(last_publish)
        print(f'Last publish date ({last_publish}) saved for pubkey: {pubkey}')

print('All public keys processed.')
```

You'll need to replace the `nip01_api` variable with a real endpoint (the one above is only a placeholder), depending on your needs. The script creates the `data` directory automatically if it doesn't exist.
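By the way, NIP-01 itself is a WebSocket protocol rather than an HTTP API: you send a `REQ` message with a filter and the relay streams back `EVENT` messages, each carrying a `created_at` Unix timestamp. If no HTTP endpoint suits you, here's a rough sketch using the third-party `websockets` package. The relay URL is just one public example, and note that NIP-01 filters expect the pubkey in hex, so each `npub...` key would first need bech32 decoding (that encoding is defined in NIP-19):

```
import asyncio
import json

import websockets  # pip install websockets

RELAY = 'wss://relay.damus.io'  # example public relay; use any you trust

async def last_publish_time(pubkey_hex):
    # NIP-01: send ["REQ", <sub_id>, <filter>]; limit=1 asks the relay
    # for only the newest event matching the filter
    async with websockets.connect(RELAY) as ws:
        req = ["REQ", "sub1", {"authors": [pubkey_hex], "limit": 1}]
        await ws.send(json.dumps(req))
        while True:
            msg = json.loads(await ws.recv())
            if msg[0] == 'EVENT':
                return msg[2]['created_at']  # Unix seconds of the newest event
            if msg[0] == 'EOSE':
                return None  # relay has no stored events for this key

# the pubkey must be 64-char hex, not the bech32 npub form
print(asyncio.run(last_publish_time('replace-with-hex-pubkey')))
```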

Hope this helps, let me know if you have any questions!

I need to filter the final output and keep only the user IDs whose most recent publish time is within the last 7 days.

Sure thing! To keep only the Nostr ID public keys that have published within the last 7 days, you can modify the existing script by adding a check on the date difference.

Here's the updated script with the date filter:

```
import json
import os
from datetime import datetime

import requests

# input file containing the Nostr ID public keys, one per line
input_file = 'nostr_id_public_keys.txt'

# output directory where the results will be stored
output_dir = './data/'
os.makedirs(output_dir, exist_ok=True)

# NIP01 API endpoint (a placeholder; replace with a real one)
nip01_api = 'https://nip01.nostrenostro.com'

# maximum age of the last publish, in days
max_diff = 7

# current time in UTC, naive so it matches the parsed timestamps below
now = datetime.utcnow()

# loop through the Nostr ID public keys from the input file
with open(input_file, 'r') as f:
    for pubkey in f:
        pubkey = pubkey.strip()
        if not pubkey:
            continue

        # construct the API URL for the public key and fetch it
        nip01_url = f'{nip01_api}/api/v1/relay/{pubkey}'
        r = requests.get(nip01_url, timeout=10)

        # extract the last publish date from the JSON response
        try:
            resp_json = r.json()
            last_publish = resp_json['result']['lastPublish']
        except (ValueError, KeyError):
            print(f'Error retrieving last publish date for pubkey: {pubkey}')
            continue

        # parse the ISO 8601 timestamp (assumed UTC, e.g. 2023-04-01T12:00:00.000Z)
        last_publish_date = datetime.strptime(last_publish, '%Y-%m-%dT%H:%M:%S.%fZ')

        # keep only keys that published within the last max_diff days
        diff = (now - last_publish_date).days
        if diff <= max_diff:
            with open(output_dir + pubkey + '.txt', 'w') as f2:
                f2.write(last_publish)
            print(f'Last publish date ({last_publish}) saved for pubkey: {pubkey}')
        else:
            print(f"Pubkey {pubkey} hasn't published within the last {max_diff} days, skipping...")

print('All public keys processed.')
```

This script only saves the last publish date for public keys that have published within the last 7 days. You can adjust the value of `max_diff` to any number of days you prefer.
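One caveat: the `%Y-%m-%dT%H:%M:%S.%fZ` format above is an assumption about the endpoint's response. If your source returns a raw Unix timestamp instead (which is exactly what a NIP-01 event's `created_at` field is), the 7-day check becomes plain arithmetic, with no `strptime` or timezone handling at all:

```
import time

# created_at as it appears on a NIP-01 event: Unix seconds (example value)
created_at = 1700000000

# the 7-day cutoff is a simple subtraction against the current time
max_age_seconds = 7 * 24 * 60 * 60
if time.time() - created_at <= max_age_seconds:
    print('published within the last 7 days')
else:
    print('older than 7 days, skipping')
```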

Hope this helps! Let me know if you have any questions.